Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
Second Semester 2016-2017
Mid-Semester Test (EC-2 Regular)
Course No. : SS ZG537
Course Title : INFORMATION RETRIEVAL
Nature of Exam : Closed Book
Weightage : 30%
Duration : 2 Hours
Date of Exam : 25/02/2017 (FN)
No. of pages : 2
No. of questions : 7
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
Q.1. Discuss in brief the limitations of the Boolean retrieval model. [2]
Q.2. Give the name of the index we need to use if: [1 + 1 + 2 = 4]
(a) We want to consider word order in the queries and the documents for an arbitrary number of words?
(b) What kind of index can we use if we assume that word order is only important for two consecutive terms?
(c) What is the Soundex code for the following two names, Robert and Rupert? Assume that the letters are mapped to numbers as follows: (B, F, P, V → 1), (C, G, J, K, Q, S, X, Z → 2), (D, T → 3), (L → 4), (M, N → 5) and (R → 6).
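The letter-to-digit mapping in part (c) can be exercised with a minimal Python sketch. It assumes the common textbook variant of Soundex: keep the first letter, treat every unmapped letter (vowels, H, W, Y) as 0, collapse runs of identical digits, drop the zeros, and pad to four characters. The names `CODES` and `soundex` are illustrative only, and other Soundex variants treat H and W differently.

```python
# Letter groups and digits exactly as given in the question.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code (assumed textbook variant)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    # Unmapped letters (vowels, H, W, Y) become "0".
    digits = [CODES.get(c, "0") for c in letters]
    # Collapse runs of identical consecutive digits into one.
    collapsed = [digits[0]]
    for d in digits[1:]:
        if d != collapsed[-1]:
            collapsed.append(d)
    # Keep the first letter itself, then drop zeros from the rest.
    tail = [d for d in collapsed[1:] if d != "0"]
    # Pad with trailing zeros and keep the first four positions.
    return (letters[0] + "".join(tail) + "000")[:4]


print(soundex("Robert"))  # R163
print(soundex("Rupert"))  # R163
```

Under these assumptions both names map to the same code, which is the behaviour Soundex is designed to give for similar-sounding names.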
Q.3. Discuss briefly the index construction algorithm used in Distributed Indexing with a suitable diagram. [5]
Q.4 (a) An IR system returns 8 relevant documents and 10 non-relevant documents. There are a total of 20 relevant documents in the collection. What is the precision of the system on this search, and what is its recall?
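The arithmetic behind part (a) can be sketched in Python using the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The function name `precision_recall` and its parameter names are illustrative only.

```python
from fractions import Fraction


def precision_recall(relevant_retrieved, nonrelevant_retrieved, total_relevant):
    """Return (precision, recall) as exact fractions."""
    retrieved = relevant_retrieved + nonrelevant_retrieved
    precision = Fraction(relevant_retrieved, retrieved)
    recall = Fraction(relevant_retrieved, total_relevant)
    return precision, recall


# Figures from the question: 8 relevant and 10 non-relevant documents
# returned, 20 relevant documents in the whole collection.
p, r = precision_recall(8, 10, 20)
print(p, round(float(p), 3))  # 4/9 0.444
print(r, round(float(r), 3))  # 2/5 0.4
```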
Q.4 (b) What is the likely effect of ‘Stemming’ and ‘Lemmatization’ on
i. Vocabulary size: Increase, Decrease, Unpredictable?
ii. Precision: Increase, Decrease, Unpredictable?
iii. Recall: Increase, Decrease, Unpredictable? [2 + 3 = 5]
Q.5. Consider the following documents: [1 + 2 = 3]
Doc1: catholic church in brisbane
Doc2: garden city church brisbane
Doc3: brisbane courier garden city
Doc4: where in brisbane catholic church
(a) Draw a term-document incidence matrix for this document collection.
(b) Draw the positional inverted index representation for this collection.
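One possible way to construct the positional index of part (b) programmatically is sketched below. It assumes whitespace tokenization, no stop-word removal, and positions counted from 1 within each document. The names `docs` and `build_positional_index` are illustrative only.

```python
from collections import defaultdict

# The four documents from the question, keyed by document number.
docs = {
    1: "catholic church in brisbane",
    2: "garden city church brisbane",
    3: "brisbane courier garden city",
    4: "where in brisbane catholic church",
}


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}, positions starting at 1."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split(), start=1):
            index[term].setdefault(doc_id, []).append(pos)
    return index


index = build_positional_index(docs)
for term in sorted(index):
    print(term, index[term])
```

The term-document incidence matrix of part (a) follows from the same structure: the entry for a term and a document is 1 exactly when that document number appears among the term's postings.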
Q.6. Consider the following document: “The universe contains many different universities” [1 + 2 + 3 + 2 = 8]
(a) How many entries would a bigram index contain?
(b) If a Boolean query is used on this index to answer the initial query uni*, what terms would you search for in this permuterm index?
(c) How do you process queries such as univ*, uni*rse, uni*e*se by using the permuterm index? Show what terms you will search for and how.
(d) Use the 2-gram index and 3-gram index for processing the following wildcard queries: tol* and rea*. Is "tool" a result for the wildcard query tol*? If the answer is yes, solve this problem.
Q.7. Assume that simple term frequency weights are used (with no IDF factor), and the stop words “is”, “am” and “are” are removed. Compute the cosine similarity of the following two documents: [Show the term frequency matrix] [3]
Doc1: “Precision is very very high”
Doc2: “high precision is very very very important”
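The computation asked for here can be sketched in Python under the stated conditions: raw term frequencies, no IDF, and the three listed stop words removed. Lowercasing the tokens (so that "Precision" and "precision" count as one term) is an assumption not stated in the question, and the names `STOP_WORDS`, `tf_vector` and `cosine` are illustrative only.

```python
import math
from collections import Counter

STOP_WORDS = {"is", "am", "are"}


def tf_vector(text):
    """Raw term-frequency vector after lowercasing and stop-word removal."""
    return Counter(t for t in text.lower().split() if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    # A Counter returns 0 for a missing term, so unshared terms add nothing.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


d1 = tf_vector("Precision is very very high")
d2 = tf_vector("high precision is very very very important")
print(dict(d1))  # {'precision': 1, 'very': 2, 'high': 1}
print(dict(d2))  # {'high': 1, 'precision': 1, 'very': 3, 'important': 1}
print(round(cosine(d1, d2), 4))  # 0.9428
```

With these vectors the dot product is 8 and the vector lengths are √6 and √12, so the similarity is 8/√72 ≈ 0.9428.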
***********