Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
First Semester 2016-2017
Mid-Semester Test (EC-2 Regular)
Course No. : SS ZG537
Course Title : INFORMATION RETRIEVAL
Nature of Exam : Closed Book
Weightage : 30%
Duration : 2 Hours
Date of Exam : 24/09/2016 (FN)
No. of pages : 1
No. of questions : 6
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
Q.1. Consider the table below showing how two users rated the relevance of a set of 12 documents to a particular information need (0 = non-relevant, 1 = relevant). Assume that you have developed an IR system that, for this query, returns the set of documents {4, 5, 6, 7, 8}.
doc-id  1  2  3  4  5  6  7  8  9 10 11 12
user-1  0  0  1  1  1  1  1  1  0  0  0  0
user-2  0  0  1  1  0  0  0  0  1  1  1  1
(a) Calculate the precision, recall and F-measure of your system if a document is considered relevant only when both users agree that it is relevant. [3]
(b) Calculate the precision, recall and F-measure of your system if a document is considered relevant when either user thinks it is relevant. [3]
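A minimal sketch for checking Q.1, assuming unranked (set-based) evaluation and the balanced F-measure F1 (the harmonic mean of precision and recall). The helper name `prf` is hypothetical.

```python
# Relevance judgments from the Q.1 table (index 0 is doc-id 1).
user1 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
user2 = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
retrieved = {4, 5, 6, 7, 8}

def prf(retrieved, relevant):
    """Precision, recall and balanced F1 for an unranked result set (F1 is an assumption)."""
    hits = len(retrieved & relevant)
    p = hits / len(retrieved)
    r = hits / len(relevant)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

docs = range(1, 13)
# (a) relevant only if both users marked the document relevant
both = {d for d in docs if user1[d - 1] and user2[d - 1]}
# (b) relevant if either user marked the document relevant
either = {d for d in docs if user1[d - 1] or user2[d - 1]}

print("(a)", prf(retrieved, both))
print("(b)", prf(retrieved, either))
```

Under these assumptions, (a) has relevant set {3, 4}, giving P = 1/5 = 0.2, R = 1/2 = 0.5 and F1 = 2/7 ≈ 0.286; (b) has relevant set {3, ..., 12}, giving P = 5/5 = 1.0, R = 5/10 = 0.5 and F1 = 2/3 ≈ 0.667.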
Q.2. Compute the purity of the following clustering (for each cluster and for the overall clustering) of a dataset with four categories. [4]
          | arts | business | computer | home |
cluster 1 |  10  |     5    |    15    |  20  |
cluster 2 |  10  |    10    |    10    |  10  |
cluster 3 |  10  |    10    |    10    |  20  |
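A short sketch of the purity computation for Q.2, assuming the standard definition: a cluster's purity is its majority-category count divided by its size, and overall purity is the sum of majority counts divided by the total number of documents. The name `cluster_purity` is hypothetical.

```python
# Contingency table from Q.2: rows are clusters,
# columns are categories (arts, business, computer, home).
table = {
    "cluster 1": [10, 5, 15, 20],
    "cluster 2": [10, 10, 10, 10],
    "cluster 3": [10, 10, 10, 20],
}

def cluster_purity(counts):
    """Fraction of a cluster's documents that belong to its majority category."""
    return max(counts) / sum(counts)

for name, counts in table.items():
    print(name, cluster_purity(counts))

# Overall purity: sum of per-cluster majority counts over all documents.
total = sum(sum(c) for c in table.values())
overall = sum(max(c) for c in table.values()) / total
print("overall", overall)
```

This yields 20/50 = 0.4, 10/40 = 0.25 and 20/50 = 0.4 for the three clusters, and 50/140 ≈ 0.357 overall, which equals the size-weighted average of the per-cluster purities.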
Q.3. Compute the similarity-oriented measure using the Rand coefficient for the following clusters and classes over the documents d1, d2, d3, d4 and d5:
Clusters: C1 = {d3, d4}, C2 = {d1, d2, d5}
Classes: K1 = {d1, d2, d4}, K2 = {d3, d5} [4]
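One way to sketch Q.3 is to enumerate all document pairs and count the agreements between the clustering and the classes. This assumes the Rand statistic is defined as (agreeing pairs) / (all pairs), where a pair agrees if it is in the same cluster and the same class, or in different clusters and different classes. The function name `rand_statistic` is hypothetical.

```python
from itertools import combinations

# Cluster and class assignments from Q.3.
cluster = {"d1": "C2", "d2": "C2", "d3": "C1", "d4": "C1", "d5": "C2"}
klass = {"d1": "K1", "d2": "K1", "d3": "K2", "d4": "K1", "d5": "K2"}

def rand_statistic(cluster, klass):
    """Return (Rand statistic, (tp, tn, fp, fn)) over all unordered document pairs."""
    tp = tn = fp = fn = 0
    for a, b in combinations(sorted(cluster), 2):
        same_cluster = cluster[a] == cluster[b]
        same_class = klass[a] == klass[b]
        if same_cluster and same_class:
            tp += 1   # agree: same cluster, same class
        elif not same_cluster and not same_class:
            tn += 1   # agree: different clusters, different classes
        elif same_cluster:
            fp += 1   # same cluster, different classes
        else:
            fn += 1   # different clusters, same class
    return (tp + tn) / (tp + tn + fp + fn), (tp, tn, fp, fn)

print(rand_statistic(cluster, klass))
```

Of the 10 pairs, 1 agrees as same/same (d1, d2) and 3 agree as different/different, so the Rand statistic comes out as 4/10 = 0.4.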
Q.4 (a) What is the posting list that can be decoded from the following variable byte code? [3]
10001001 00000001 10000010 11111111
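A minimal sketch of variable byte decoding for Q.4 (a). It assumes the convention in which the high bit is 1 on the last byte of each number and 0 on the preceding bytes, and that the encoded numbers are docID gaps, so a running sum recovers the docIDs. The name `vb_decode` is hypothetical.

```python
from itertools import accumulate

def vb_decode(byte_values):
    """Decode VB codes; a byte with the high bit set ends the current number."""
    numbers, n = [], 0
    for byte in byte_values:
        if byte < 128:                  # high bit 0: more bytes follow
            n = 128 * n + byte
        else:                           # high bit 1: final byte of this number
            n = 128 * n + (byte - 128)
            numbers.append(n)
            n = 0
    return numbers

code = "10001001 00000001 10000010 11111111"
gaps = vb_decode(int(b, 2) for b in code.split())
# Assumption: the values are gaps, so cumulative sums give the docIDs.
doc_ids = list(accumulate(gaps))
print("gaps:", gaps)
print("docIDs:", doc_ids)
```

Under these assumptions the gaps are 9, 130 (= 1 × 128 + 2) and 127, giving the posting list 9, 139, 266.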
Q.4 (b) What would be the encoding of the same posting list using a gamma code? [3]
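A sketch of gamma encoding for Q.4 (b). It assumes the formulation in which the length of the offset is written in unary as 1s terminated by a 0, the offset is the binary representation with its leading 1 removed, and the gaps 9, 130 and 127 from the VB code (not the docIDs) are what get encoded. Some texts write the unary part with 0s instead. The name `gamma_encode` is hypothetical.

```python
def gamma_encode(n):
    """Gamma code: unary length of the offset, then the offset itself."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    offset = bin(n)[3:]                 # drop '0b' and the leading 1: 9 -> '001'
    length = "1" * len(offset) + "0"    # unary length, terminated by 0
    return length + offset

gaps = [9, 130, 127]   # assumption: gaps decoded from the VB code in Q.4 (a)
codes = [gamma_encode(g) for g in gaps]
for g, c in zip(gaps, codes):
    print(g, c)
print("".join(codes))
```

This gives 1110001 for 9, 111111100000010 for 130 and 1111110111111 for 127, concatenated into a single 35-bit string.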
Q.5 (a) Suppose a program for recognizing dogs in scenes from a video identifies 9 dogs in a scene containing 11 dogs and some cats. If 4 of the identifications are correct but 5 are actually cats, compute the precision and recall of the program. [4]
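The counts in Q.5 (a) map directly onto true positives, false positives and false negatives. A small sketch, assuming "dog" is the positive class:

```python
identified = 9        # items the program labelled "dog"
correct = 4           # of those, actually dogs (true positives)
dogs_in_scene = 11    # all dogs present (true positives + missed dogs)

false_positives = identified - correct        # cats labelled as dogs
false_negatives = dogs_in_scene - correct     # dogs the program missed

precision = correct / (correct + false_positives)
recall = correct / (correct + false_negatives)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

With 5 false positives and 7 false negatives, precision is 4/9 ≈ 0.444 and recall is 4/11 ≈ 0.364.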
Q.5 (b) What are the differences between single-pass in-memory indexing (SPIMI) and blocked sort-based indexing (BSBI)? [2]
Q.6. Consider a collection of 800 documents in which the number of unique words is estimated at 800. Assume that the dictionary stores all terms as a single string, with the following space requirements: 4 bytes for the term frequency, 4 bytes for the pointer to the postings list, 3 bytes for the term pointer into the string, and on average 8 bytes per term in the term string. Estimate the space usage of the dictionary without blocking and with a block size of k = 8. [4]
**********