BITS WILP
Information Retrieval Handout 2017-H2
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital Learning
Part A: Content Design
This part (Part A: Content Design) documents the high-level design of the
course. It will evolve iteratively under the experienced hands of the course
authors and will become the WILP standard document for this course across all
instances of its offering.
Course Title | Information Retrieval
Course No(s) | SS ZG537
Credit Units | 4
Credit Model | Unit split between Class Hours + Lab/Design/Fieldwork + Student preparation. E.g. 1-1-2 (total 4 units or credits), i.e. 1 unit for classroom hours, 1 unit for lab hours, 2 units for student preparation. Typically 1 unit translates to 32 hours.
Course Author | Rajendra Kumar Roul
Version No | V1.0
Date |
Course Objectives
No | Course Objective
CO1 | To understand structure and organization of various components of an IR system
CO2 | To understand information representation models, term scoring mechanisms, etc. in the complete search system
CO3 | To understand architecture of search engines, crawlers and the web search
CO4 | To understand cross lingual retrieval and multimedia information retrieval
CO5 | To understand the concepts of Recommender Systems.
Text Book(s)
T1 | C. D. Manning, P. Raghavan and H. Schutze. Introduction to Information Retrieval,
Reference Book(s) & other resources
R1 | Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley, 2000. http://people.ischool.berkeley.edu/~hearst/irbook/
R2 | Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B. (Eds.), Recommender Systems Handbook. 1st Edition, 2011, 845 p. 20 illus., Hardcover, ISBN: 978-0-387-85819-7
R3 | Cross-Language Information Retrieval by Jian-Yun Nie, Morgan & Claypool Publisher series, 2010.
R4 | Multimedia Information Retrieval by Stefan M. Rüger, Morgan & Claypool Publisher series, 2010.
R5 | Information Retrieval: Implementing and Evaluating Search Engines by S. Buttcher, C. Clarke and G. Cormack, MIT Press, 2010.
R6 | Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data by B. Liu, Springer, Second Edition, 2011.
R7 | Search Engines: Information Retrieval in Practice by Bruce Croft, Donald Metzler, and Trevor Strohman, Addison-Wesley, 2009.
R8 | Koehn P., “Statistical Machine Translation”,
Modular Content Structure
1. Introduction
1.1. Information Retrieval
1.2. Basic Search Model
2. Basic Information Retrieval Concepts
2.1. Boolean Retrieval
2.2. Dictionaries and Tolerant Retrieval
2.3. Index Construction and Compression
3. Vector Space Model
3.1. Scoring, Term Weighting
3.2. The Vector Space Model for Scoring
4. Text Mining
4.1. Text Classification
4.2. Vector Space Classification
4.3. Text Clustering
5. Cross Lingual Retrieval
5.1. Language Problems in IR
5.2. Translation Approaches for CLIR
6. Recommender Systems
6.1. Collaborative recommendation
6.2. Content based recommendation
7. Multimedia Information Retrieval
7.1. Multimedia search technologies
7.2. Content based retrieval
7.3. Image and Audio data challenges
7.4. Multimedia IR Research
8. Web Search
8.1. Web Search Basics
8.2. Web Crawlers and Indexes
8.3. Link Analysis
Learning Outcomes:
No | Learning Outcomes
LO1 | Students will gain an understanding of an information retrieval system as a whole and of its components.
LO2 | Students will know the design issues of different types of models, including Boolean and vector space models, and their solutions.
LO3 | Students will have a detailed understanding of text indexing, mining, weighting schemes, etc.
LO4 | Students will acquire knowledge about cross lingual and multimedia information retrieval.
LO5 | Students will be able to differentiate between different recommender systems and suggest a suitable system based on the problem and the data available.
LO6 | With the acquired knowledge, students will be able to design and build different kinds of information retrieval systems.
Experiential learning components
Additional documentation
Part B: Course Handout
Academic Term | First Semester 2017-2018
Course Title | Information Retrieval
Course No | SS ZG537
Lead Instructor | Maheshwari K
Contact Hour | List of Topic Title (from content structure in Part A) | Topic # (from content structure in Part A) | Text/Ref Book/external resource
1-2 | Introduction: Information vs Data Retrieval; Basic Concepts; The retrieval process; Taxonomy of IR; Classic IR and Alternative models | 1.1, 1.2 | R1 Ch1, Ch2
3-5 | Boolean Retrieval: Inverted index; Processing Boolean queries; Term vocabulary and postings lists; Phrase queries; Positional indexes | 2.1 | T Ch2, R1 Ch2 section 5
6-7 | Dictionary and Tolerant Retrieval: Search structures for dictionaries; Wildcard queries; Spelling correction; Edit distances; Phonetic correction | 2.2 | T Ch3
8-10 | Index Construction and Compression: Blocked sort-based indexing; Single-pass in-memory indexing; Distributed and dynamic indexing; Dictionary compression; Postings file compression; Weighted zone scoring | 2.3 | T Ch4
11-12 | Vector Space Model: Term frequency and weighting; The vector space model for scoring; Tf-idf functions; Dot products; Queries as vectors; Variant tf-idf functions; Document and query weighting schemes | 3.1, 3.2 | T Ch6
13-15 | Text Mining: Classification (Naïve Bayes, Vector space classification, Evaluating classification); Clustering (Flat clustering, Hierarchical clustering) | 4.1, 4.2, 4.3 | T Ch13, 14, 16, 17
16-19 | Cross Lingual IR (CLIR): Language problems in IR; Translation approaches; Handling many languages; Resources for CLIR | 5.1, 5.2 | R3 Ch2, R8 Ch4, 5, 6
20-25 | Recommender System: Collaborative recommendation; Content based recommendation; Other types & hybrid recommendations | 6.1, 6.2 | R2 Ch1-5
26-29 | Multimedia IR: Basic multimedia search technologies; Content based retrieval; Image and Audio data challenges; Multimedia IR Research | 7.1, 7.2, 7.3, 7.4 | R4 Ch1, 2, 3
30-31 | Web Search (Web characteristics, The search user experience, Index size and estimation); Web Crawling and Indexes (Crawling, Crawler architecture, Distributed indexes); Link Analysis (The web as a graph, Google’s PageRank, Hubs and Authorities (HITS)) | 8.1, 8.2, 8.3 | T Ch 19, 20, 21
32 | Review | |
Detailed Plan for Lab work/Design work
Lab No | Lab Objective | Lab Sheet Access URL | Content Reference
1 | | |
2 | | |
3 | | |
4 | | |
5 | | |
6 | | |
Case studies: Detailed Plan
Case study No | Case study Objective | Case study Sheet Access URL
1 | |
2 | |
Work integration: Detailed plan
No | Activity description (Examples are given below)
1 | Apply the domain modelling concept to the work you are doing in the workplace
2 | Present the architecture of the software you are working on
3 | Analyse the test plan of the software project you are working on and identify areas where it can be further improved
4 | Seminar / talk by a project manager in the company on a topic of relevance to the course
Project work: Detailed Plan
1. Objective of the project:
2. Project scenario description:
3. Tasks to be performed by the students:
4. Expected deliverables:
5. Duration of the project:
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Quiz-I | Online | - | 5% | August 26 to September 4, 2017
EC-1 | Quiz-II | Online | - | 5% | September 26 to October 4, 2017
EC-1 | Lab | Online | - | 10% | October 20 to 30, 2017
EC-2 | Mid-Semester Test | Closed Book | 2 hours | 30% | 23/09/2017 (FN) 10 AM – 12 Noon
EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 04/11/2017 (FN) 9 AM – 12 Noon
Syllabus for Mid-Semester Test (Closed Book): Topics in Contact Hours: 1 to 16
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 32)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam, which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on dates to be announced later.
It shall be the responsibility of the individual student to maintain the self-study schedule given in the course handout regularly, attend the online lectures, and take all the prescribed evaluation components, such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam, according to the evaluation scheme provided in the handout.