BITS WILP Information Retrieval Mid-Sem Exam 2016-H2 (Regular)




Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
First Semester 2016-2017
Mid-Semester Test (EC-2 Regular)

Course No.                  : SS ZG537 
Course Title                 : INFORMATION RETRIEVAL  
Nature of Exam           : Closed Book
Weightage                    : 30%
Duration                      : 2 Hours 
Date of Exam              : 24/09/2016    (FN)
No of pages: 1
No of questions: 6
Note:
1.       Please follow all the Instructions to Candidates given on the cover page of the answer book.
2.       All parts of a question should be answered consecutively. Each answer should start from a fresh page. 
3.       Assumptions made if any, should be stated clearly at the beginning of your answer.

Q.1.        Consider the table below showing how two users rated the relevance of a set of 12 documents to a particular information need (0 = non-relevant and 1 = relevant). Assume that you have developed an IR system that for this query returns the set of documents (4, 5, 6, 7, 8).
doc-id  1         2          3          4          5          6          7          8          9          10        11        12
user-1   0         0          1          1          1          1          1          1          0          0          0          0
user-2   0         0          1          1          0          0          0          0          1          1          1          1
(a)             Calculate the precision, recall and F-measure of your system if a document is considered relevant only when both users agree that it is relevant.   [3]
(b)             Calculate the precision, recall and F-measure of your system if a document is considered relevant when either user thinks it is relevant.   [3]
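
One way to check an answer to Q.1 is to treat the retrieved and relevant documents as sets. The sketch below is only an illustration: the helper name precision_recall_f1 is made up here, and it assumes that "F-measure" means the balanced F1 score (the harmonic mean of precision and recall).

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for two sets of doc-ids."""
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Relevance judgments read off the table in Q.1.
user1 = {3, 4, 5, 6, 7, 8}
user2 = {3, 4, 9, 10, 11, 12}
retrieved = {4, 5, 6, 7, 8}

both = precision_recall_f1(retrieved, user1 & user2)    # part (a)
either = precision_recall_f1(retrieved, user1 | user2)  # part (b)
print("both agree:", both)
print("either user:", either)
```

Intersecting the two judgment sets models part (a) and taking their union models part (b); the retrieved set is the same in both cases.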

Q.2.        Compute the purity of the clustering shown in the following table, both for each cluster and for the overall clustering. The dataset has four categories.   [4]

             arts       business   computer   home
cluster 1    10         05         15         20
cluster 2    10         10         10         10
cluster 3    10         10         10         20
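
The purity computation for Q.2 can be sketched as follows. This is a minimal illustration that assumes the usual definition: the purity of a cluster is the share of its documents that belong to its majority category, and overall purity is the sum of the majority counts divided by the total number of documents. The variable names are hypothetical, and the counts assume the rows of the table are clusters and the columns are categories.

```python
# Counts per cluster in the order: arts, business, computer, home.
table = {
    "cluster 1": [10, 5, 15, 20],
    "cluster 2": [10, 10, 10, 10],
    "cluster 3": [10, 10, 10, 20],
}

# Purity of one cluster: size of its majority category over its total size.
cluster_purity = {name: max(counts) / sum(counts) for name, counts in table.items()}

# Overall purity: sum of majority counts over the total number of documents.
majority_total = sum(max(counts) for counts in table.values())
document_total = sum(sum(counts) for counts in table.values())
overall_purity = majority_total / document_total

for name, value in cluster_purity.items():
    print(name, round(value, 3))
print("overall", round(overall_purity, 3))
```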

Q.3.        Compute the Rand coefficient (a similarity-oriented measure) for the following clustering against the given classes, where d1, d2, d3, d4 and d5 are the documents. Clusters: C1 = {d3, d4}, C2 = {d1, d2, d5}. Classes: K1 = {d1, d2, d4}, K2 = {d3, d5}.   [4]
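
A pair-counting sketch for Q.3 is given below. It assumes the Rand coefficient is defined as the fraction of document pairs on which the clustering and the classes agree, meaning the pair is together in both or separated in both. The function name rand_index is chosen here for illustration only.

```python
from itertools import combinations


def rand_index(clusters, classes):
    """Fraction of document pairs treated the same way by clusters and classes."""
    cluster_of = {doc: i for i, group in enumerate(clusters) for doc in group}
    class_of = {doc: i for i, group in enumerate(classes) for doc in group}
    agreements = 0
    pairs = 0
    for a, b in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[a] == cluster_of[b]
        same_class = class_of[a] == class_of[b]
        if same_cluster == same_class:  # both "together" or both "apart"
            agreements += 1
        pairs += 1
    return agreements / pairs


clusters = [{"d3", "d4"}, {"d1", "d2", "d5"}]
classes = [{"d1", "d2", "d4"}, {"d3", "d5"}]
ri = rand_index(clusters, classes)
print(ri)
```

With five documents there are ten pairs, so the result can also be checked by hand by listing each pair.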

Q.4 (a)          What is the postings list that can be decoded from the following variable-byte code?   [3]
                 10001001 00000001 10000010 11111111   
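
The decoding asked for in Q.4 (a) can be sketched as below. Two assumptions are made, neither of which the question states: the high bit of a byte is set to 1 on the last byte of each number (the common textbook convention), and the encoded numbers are gaps between successive doc-ids, so the postings list is their running sum. The function name vb_decode is illustrative.

```python
from itertools import accumulate


def vb_decode(byte_values):
    """Decode variable-byte numbers; a set high bit marks the final byte."""
    numbers = []
    n = 0
    for b in byte_values:
        if b < 128:              # high bit 0: more bytes follow
            n = 128 * n + b
        else:                    # high bit 1: last byte of this number
            n = 128 * n + (b - 128)
            numbers.append(n)
            n = 0
    return numbers


bits = "10001001 00000001 10000010 11111111"
byte_values = [int(chunk, 2) for chunk in bits.split()]
gaps = vb_decode(byte_values)
doc_ids = list(accumulate(gaps))  # assumes the numbers are gaps
print(gaps, doc_ids)
```

If the numbers were meant as doc-ids rather than gaps, the decoded list itself would be the answer and the running sum would not apply.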
Q.4 (b)          What would be the encoding of the same postings list using a gamma code?   [3]
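
A gamma code for Q.4 (b) can be sketched as follows, assuming the textbook form: the offset is the binary representation of the number with its leading 1 removed, and it is preceded by the offset length written in unary (that many 1s followed by a 0). The gaps 9, 130 and 127 used here are an assumption: they are the values obtained when the bytes in part (a) are read with the high bit marking the last byte of each number. The name gamma_encode is hypothetical.

```python
def gamma_encode(n):
    """Gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    offset = bin(n)[3:]                  # binary digits without the leading 1
    length = "1" * len(offset) + "0"     # unary code of the offset length
    return length + offset


gaps = [9, 130, 127]                     # assumed gaps from part (a)
codes = [gamma_encode(g) for g in gaps]
encoded = "".join(codes)
print(codes)
print(encoded)
```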

Q.5 (a)          Suppose a program for recognizing dogs in scenes from a video identifies 9 dogs in a scene containing 11 dogs and some cats. If 4 of the identifications are correct, but 5 are actually cats, then compute the precision and recall of the program.                                                           [4]
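
The quantities in Q.5 (a) map directly onto the definitions of precision and recall. The short sketch below keeps the values as exact fractions; the variable names are chosen for illustration.

```python
from fractions import Fraction

identified = 9     # objects the program labelled as dogs
correct = 4        # of those, how many really are dogs
actual_dogs = 11   # dogs present in the scene

precision = Fraction(correct, identified)   # correct / everything identified
recall = Fraction(correct, actual_dogs)     # correct / everything that should be found
print(precision, recall)
```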
Q.5 (b)          What are the differences between single-pass in-memory indexing (SPIMI) and blocked sort-based indexing (BSBI)?   [2]

Q.6.        Consider a collection of 800 documents in which the number of unique words is estimated to be 800. Assume that all the terms are stored as a single string and that dictionary storage requires 4 bytes per term frequency, 4 bytes per pointer to postings, 3 bytes per term pointer, and an average of 8 bytes per term in the term string. Estimate the space usage for the dictionary without blocking and with a block size of k = 8.   [4]
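
The estimate in Q.6 can be sketched as simple arithmetic. One assumption goes beyond what the question states: with blocking, only one term pointer is kept per block of k terms, and each term needs one extra byte in the string to store its length (the usual textbook scheme). The function name and the length_byte parameter are hypothetical.

```python
def dictionary_bytes(n_terms, k=1, freq=4, postings_ptr=4, term_ptr=3,
                     avg_term=8, length_byte=1):
    """Estimated size of a dictionary-as-a-string, optionally with blocking."""
    fixed = n_terms * (freq + postings_ptr)   # per-term fixed-width fields
    string = n_terms * avg_term               # characters of the term string
    if k == 1:                                # no blocking: one pointer per term
        return fixed + string + n_terms * term_ptr
    blocks = -(-n_terms // k)                 # ceiling division
    # Blocking: one pointer per block, plus an assumed length byte per term.
    return fixed + string + blocks * term_ptr + n_terms * length_byte


without_blocking = dictionary_bytes(800)
with_blocking = dictionary_bytes(800, k=8)
print(without_blocking, with_blocking, without_blocking - with_blocking)
```

Under these assumptions each block of 8 terms drops 7 pointers of 3 bytes and adds 8 length bytes, a net saving of 13 bytes per block.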

**********
