BITS WILP Advanced Data Mining Handout 2018-H1



BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES

 

 

 

Part A: Content Design

Course Title: Advanced Data Mining
Course No(s): SS ZG548
Credit Units: 4
Credit Model:
Content Authors: Dr. Kamlesh Tiwari

 

 

 

Course Objectives

| No | Course Objective |
|---|---|
| CO1 | To learn how to mine complex data (beyond conventional record data) and complex structures such as trees and graphs, sequence data, web and text data, stream data, multivariate time series data, and high-dimensional data. |
| CO2 | To learn how to apply these techniques to specific applications such as web search, information retrieval, and social networks. |
| CO3 | To learn about distributed computing solutions for data-intensive applications in data mining. |

 

 

 

Text Book(s)

T1
 
T2
 

 

 

Reference Book(s) & Other resources

R1: Tan P. N., Steinbach M. & Kumar V., “Introduction to Data Mining”, Pearson Education, 2006
R2: Baeza-Yates R. & Ribeiro-Neto B., “Modern Information Retrieval”, Pearson Education, 2005
R3: Han J. & Kamber M., “Data Mining: Concepts and Techniques”, Second Edition, Morgan Kaufmann Publishers, 2006
R4: Manning C. D., Raghavan P. & Schütze H., “Introduction to Information Retrieval”, Cambridge University Press, Online edition, 2009
R5: Hadzic F., Tan H. & Dillon T. S., “Mining Data with Complex Structures”, Springer, 20
R6: Aggarwal C. C. (Ed.), “Data Streams: Models and Algorithms”, Springer, 2007
R7:
R8:
R9: Azure Cosmos DB, https://docs.microsoft.com/en-us/azure/cosmos-db/introduction

 

 

 

 

 

Content Structure

 

 

      1.            Introduction

                        1.1.            Review of data mining

                        1.2.            Objectives

                        1.3.            Overview

 

      2.            Incremental & Stream Data Mining

                        2.1.            Incremental Algorithms for Data Mining

                        2.2.            Characteristics of Streaming Data

                        2.3.            Issues and Challenges

                        2.4.            Streaming Data Mining Algorithms

                        2.5.            Executing Streaming Data Mining on HDInsight/Data Science VM

 

      3.            Distributed computing solutions for data mining

                        3.1.            MapReduce/Hadoop

                        3.2.            Spark

                        3.3.            Setting up Hadoop and Spark cluster on Azure HDInsight to perform Data mining tasks

 

      4.            Sequence Mining

                        4.1.            Characteristics of Sequence Data

                        4.2.            Problem Modeling

                        4.3.            Sequence Pattern Discovery

                        4.4.            Timing Constraints

                        4.5.            Running Data Mining Algorithms on Azure Data Science VM

 

      5.            Text Mining

                        5.1.            Text Classification

                        5.2.            Vector Space Model

                        5.3.            Flat and Hierarchical Clustering

                        5.4.            Streaming Data Mining Algorithms

                        5.5.            Text Classification on HDInsight

 

      6.            Web Search

                        6.1.            Crawling & Indexing

                        6.2.            Hyperlink analysis

                        6.3.            HITS and PageRank Algorithms

                        6.4.            Building Web Search using a built-in library

 

      7.            Mining Complex Structures

                        7.1.            Mining Trees

                                    7.1.1.                  Tree Miner

                                    7.1.2.                  Tree Model Guided Framework

                                    7.1.3.                  TMG framework for mining ordered & unordered subtrees

                        7.2.            Mining Graphs

                                    7.2.1.                  Approaches to graph mining

                                    7.2.2.                  Building and Traversing a Graph Database using Azure Cosmos DB

                        7.3.            Case Study: Information Retrieval

                        7.4.            Case Study: Mining Social Networks

 

 

 

 

Learning Outcomes:

 

| No | Learning Outcomes |
|---|---|
| LO1 | To understand how to update patterns incrementally as data arrives continuously |
| LO2 | To understand the role of distributed computing in data-intensive data mining |
| LO3 | To study how to analyze sequence data |
| LO4 | To understand how text mining differs from conventional data mining and how to mine text data |
| LO5 | To understand what goes into web search and to study web search methods and their improvements |
| LO6 | To understand how to mine complex structures other than records while retaining the relations among the entities |
| LO7 | To gain familiarity with the tools used for data mining and advanced analytics on Azure |

 


 

Part B: Learning Plan

 

Academic Term: Second Semester 2017-2018
Course Title: Advanced Data Mining
Course No: SS ZG548
Lead Instructor: Dr. Kamlesh Tiwari

 

Contact Hour 1

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | | Introduction: Review and Overview | |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 2

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | | Incremental Data Mining: Revisiting traditional algorithms | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 3

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | | Incremental algorithms and their design and analysis | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 4

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | | Incremental algorithms and their design and analysis | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 5

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | | Incremental algorithms and their design and analysis | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 6

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Stream Data Mining Characteristics, Issues and Challenges | R6 Ch 1, 4 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 7

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Stream Data Mining Algorithms and their Comparison | R6 Ch 1, 4 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 8

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Stream Data Mining Algorithms and their Comparison | R6 Ch 1, 4 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 9

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Distributed computing solutions for data mining | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 10

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Distributed computing solutions for data mining | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 11

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Distributed computing solutions for data mining | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 12

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Distributed computing solutions for data mining | See class slides |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 13

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R1 7.4 | Sequence Mining: Characteristics and Problem Modeling | R1 7.4 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 14

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R1 7.4 | Sequence Pattern Discovery; Timing Constraints | R1 7.4 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 15

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 1, 13 | Text Mining: Data Representation and Characteristics | R4 Ch 1, 13; R2 Ch 7 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 16

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 14 | Text Classification: Feature Selection & Models | R4 Ch 14; R2 Ch 7 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 17

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 14 | Text Classification: Vector Space Model | R4 Ch 14; R2 Ch 7 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 18

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 13, 14 | Text Classification: Multiclass classifiers for text | R4 Ch 13, 14 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 19

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 16, 17 | Text Clustering: Flat and hierarchical | R4 Ch 16, 17 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 20

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 1, 6, 19 | Web Search | R4 Ch 1, 6, 19 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 21

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 20 | Crawling & Indexing | R4 Ch 20 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 22

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 20 | Crawling & Indexing | R4 Ch 20 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 23

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 20 | Crawling & Indexing | R4 Ch 20 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 24

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R4 Ch 21; see class slides | Link Analysis | R4 Ch 21 |
| During CH | | | |
| Post CH | | | |


 

Contact Hour 25

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R5 Ch 1; see class slides | Mining Complex Structures: Data Representation | R5 Ch 1 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 26

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R5 Ch 2, 3; see class slides | Tree Mining problem and Tree basics | R5 Ch 2, 3 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 27

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R5 Ch 3; see class slides | Tree Miner | R5 Ch 3 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 28

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R5 Ch 4, 5, 6 | Tree Model Guided (TMG) Framework | R5 Ch 4, 5, 6 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 29

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | R5 Ch 11; see class slides | Graph Mining: Introduction and applications | R5 Ch 11 |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 30

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Case Study: Information Retrieval | |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 31

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Case Study: Social Network Mining | |
| During CH | | | |
| Post CH | | | |

 

Contact Hour 32

| Type | Content Ref. | Topic Title | Study/HW Resource Reference |
|---|---|---|---|
| Pre CH | See class slides | Case Study: Social Network Mining | |
| During CH | | | |
| Post CH | | | |

 


 

 

 Evaluation Scheme:  

 

Legend: EC = Evaluation Component; AN = Afternoon Session; FN = Forenoon Session

 

| No | Name | Type | Duration | Weight | Day, Date, Session, Time |
|---|---|---|---|---|---|
| EC-1 | Quiz-I/ Assignment-I | Online | - | 5% | February 1 to 10, 2018 |
| | Quiz-II | | | 5% | March 1 to 10, 2018 |
| | Quiz-III/ Assignment-II | | | 5% | March 20 to 30, 2018 |
| EC-2 | Mid-Semester Test | Closed Book | 2 hours | 35% | 04/03/2018 (FN) 10 AM – 12 Noon |
| EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 22/04/2018 (FN) 9 AM – 12 Noon |

 

 

Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 16 

Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 32)

 

Important links and information:

Elearn portal: https://elearn.bits-pilani.ac.in

Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.

Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.

Evaluation Guidelines:

1.      EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal, in a timely manner.

2.      For Closed Book tests: No books or reference material of any kind will be permitted.

3.      For Open Book exams: Use of books and any printed / written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.

4.      If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be announced later.

It shall be the responsibility of the individual student to be regular in maintaining the self study schedule as given in the course handout, attend the online lectures, and take all the prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the handout.

 

 
