BITS WILP Information Retrieval Handout 2017-H2


BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital Learning
Part A:  Content Design
This Part A: Content Design documents a high-level design of the course. It will evolve iteratively under the experienced hands of the course authors and will become a WILP standard document for this course across all instances of its offerings.
Course Title
Information Retrieval
Course No(s)
SS ZG537
Credit Units
4
Credit Model
Unit split between Class Hours + Lab/Design/Fieldwork + Student preparation
Example: 1-1-2 (4 units or credits in total), i.e. 1 unit for classroom hours, 1 unit for lab hours, and 2 units for student preparation. Typically, 1 unit translates to 32 hours.
Course Author
Rajendra Kumar Roul
Version No
V1.0
Date


Course Objectives
No
Course Objective
CO1
To understand structure and organization of various components of an IR system
CO2
To understand information representation models, term scoring mechanisms, etc. in the complete search system
CO3
To understand architecture of search engines, crawlers and the web search
CO4
To understand cross lingual retrieval and multimedia information retrieval
CO5
To understand the concepts of Recommender Systems.

Text Book(s)
T1
C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008. http://nlp.stanford.edu/IR-book/

Reference Book(s) & other resources
R1
Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley, 2000. http://people.ischool.berkeley.edu/~hearst/irbook/
R2
Ricci, F.; Rokach, L.; Shapira, B.; Kantor, P.B. (Eds.), Recommender Systems Handbook, 1st Edition, 2011, 845 p., 20 illus., Hardcover, ISBN: 978-0-387-85819-7
R3
Cross-Language Information Retrieval by Jian-Yun Nie, Morgan & Claypool Publishers, 2010.
R4
Multimedia Information Retrieval by Stefan M. Rüger, Morgan & Claypool Publishers, 2010.
R5
Information Retrieval: Implementing and Evaluating Search Engines by S. Büttcher, C. Clarke and G. Cormack, MIT Press, 2010.
R6
Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data by B. Liu, Springer, Second Edition, 2011.
R7
Search Engines: Information Retrieval in Practice by Bruce Croft, Donald Metzler, and Trevor Strohman, Addison-Wesley, 2009.
R8
Koehn P., “Statistical Machine Translation”, Cambridge University Press, 2010.


Modular Content Structure
      1.            Introduction
                        1.1.            Information Retrieval
                        1.2.            Basic Search Model
      2.            Basic Information Retrieval Concepts
                        2.1.            Boolean Retrieval
                        2.2.            Dictionaries and Tolerant Retrieval
                        2.3.            Index Construction and Compression
      3.            Vector Space Model
                        3.1.            Scoring, Term Weighting
                        3.2.            The Vector Space Model for Scoring
      4.            Text Mining
                        4.1.            Text Classification
                        4.2.            Vector Space Classification
                        4.3.            Text Clustering
      5.            Cross Lingual Retrieval
                        5.1.            Language Problems in IR
                        5.2.            Translation Approaches for CLIR
      6.            Recommender Systems
                        6.1.            Collaborative recommendation
                        6.2.            Content based recommendation
      7.            Multimedia Information Retrieval
                        7.1.            Multimedia search technologies
                        7.2.            Content based retrieval
                        7.3.            Image and Audio data challenges
                        7.4.            Multimedia IR Research
      8.            Web Search
                        8.1.            Web Search Basics
                        8.2.            Web Crawlers and Indexes
                        8.3.            Link Analysis


Learning Outcomes:
No
Learning Outcomes
LO1
Students will gain an understanding of an information retrieval system as a whole and of its components.
LO2
Students will know the design issues, and their solutions, for different types of retrieval models, including Boolean and vector space models.
LO3
Students will have a detailed understanding of text indexing, text mining, term weighting schemes, etc.
LO4
Students will acquire knowledge about cross lingual and multimedia information retrieval.
LO5
Students will be able to differentiate between recommender systems and suggest a suitable system based on the problem and the data available.
LO6
With the acquired knowledge, students will be able to design and build different kinds of information retrieval systems.

Experiential learning components
Additional documentation
Part B: Course Handout
Academic Term
First Semester 2017-2018
Course Title
Information Retrieval
Course No
SS ZG537
Lead Instructor
Maheshwari K


Contact Hour
List of Topic Title
(from content structure in Part A)
Topic #
(from content structure in Part A)
Text/Ref Book/external resource
1-2
     Introduction
o      Information Vs Data Retrieval
o      Basic Concepts
o      The retrieval process
o      Taxonomy of IR
o      Classic IR and Alternative models
1.1, 1.2
R1 Ch1, Ch2
3-5
     Boolean Retrieval
o      Inverted index
o      Processing Boolean queries
o      Term vocabulary and postings lists
o      Phrase queries
o      Positional indexes
2.1
T Ch2, R1 Ch2 section 5
6-7
     Dictionary and Tolerant Retrieval
o      Search Structures for dictionaries
o      Wildcard queries
o      Spelling correction
o      Edit distances
o      Phonetic Correction
2.2
T Ch3
8-10
     Index Construction and Compression
o      Blocked sort-based Indexing
o      Single pass in-memory indexing
o      Distributed and dynamic indexing
o      Dictionary compression
o      Postings file compression
o      Weighted zone scoring
2.3
T Ch4
11-12
     Vector Space Model
o      Term frequency and weighting
o      The vector space model for scoring
o      Tf-idf functions
o      Dot products
o      Queries as vectors
o      Variant tf-idf functions
o      Document and query weighting schemes
3.1, 3.2
T Ch6
13-15
     Text Mining
o      Classification
     Naïve Bayes
     Vector space classification
     Evaluating Classification
o      Clustering
     Flat clustering
     Hierarchical clustering
4.1, 4.2, 4.3
T Ch13, 14, 16, 17
16-19
     Cross Lingual IR (CLIR)
o      Language problems in IR
o      Translation Approaches
o      Handling Many Languages
o      Resources for CLIR
5.1, 5.2
R3 Ch2, R8 Ch4, 5, 6
20-25
     Recommender System
o      Collaborative recommendation
o      Content based recommendation
o      Other types and hybrid recommendations
6.1, 6.2
R2 Ch1-5
26-29
     Multimedia IR
o      Basic Multimedia search technologies
o      Content Based Retrieval
o      Image and Audio data challenges
o      Multimedia IR Research
7.1, 7.2, 7.3, 7.4
R4 Ch1, 2, 3
30-31
     Web Search
o      Web characteristics
o      The search user experience
o      Index size and estimation
     Web Crawling and Indexes
o      Crawling
o      Crawler Architecture
o      Distributed Indexes
     Link Analysis
o      The web as a graph
o      Google’s PageRank
o      Hubs and Authorities (HITS)
8.1, 8.2, 8.3
T Ch 19, 20, 21
32
     Review





Detailed Plan for Lab work/Design work
Lab No
Lab Objective
Lab Sheet Access URL
Content Reference
1



2



3



4



5



6





Case studies: Detailed Plan
Case study No
Case study Objective
Case study Sheet Access URL
1


2



Work integration: Detailed plan
No
Activity description

(Examples are given below)
1
Apply Domain modelling concept to the work you are doing in the work place
2
Present the architecture of the software you are working on
3
Analyse the test plan of the software project you are working on and identify areas where it can be further improved
4
Seminar / talk by Project manager in the company on a topic of relevance to the course

Project work: Detailed Plan
1.      Objective of the project:
2.      Project scenario description:
3.      Tasks to be performed by the students:
4.      Expected deliverables:
5.      Duration of the project:

Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Quiz-I | Online | - | 5% | August 26 to September 4, 2017
EC-1 | Quiz-II | Online | - | 5% | September 26 to October 4, 2017
EC-1 | Lab | Online | - | 10% | October 20 to 30, 2017
EC-2 | Mid-Semester Test | Closed Book | 2 hours | 30% | 23/09/2017 (FN) 10 AM – 12 Noon
EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 04/11/2017 (FN) 9 AM – 12 Noon


Syllabus for Mid-Semester Test (Closed Book): Topics in Contact Hours: 1 to 16
Syllabus for Comprehensive Exam (Open Book): All topics (Contact Hours: 1 to 32)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1.      EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal in a timely manner.
2.      For Closed Book tests: No books or reference material of any kind will be permitted.
3.      For Open Book exams: Use of books and any printed / written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4.      If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to maintain the self-study schedule given in the course handout regularly, attend the online lectures, and take all the prescribed evaluation components, such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam, according to the evaluation scheme provided in the handout.

