BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
WORK INTEGRATED LEARNING PROGRAMMES
Digital Learning
Part A: Course Design
Course Title | Data Mining
Course No(s) | IS ZC415
Credit Units | 3
Credit Model |
Content Authors | Surender Singh Samant
Course Objectives
No | Objective
CO1 | To introduce basic concepts of data mining.
CO2 | To familiarize students with practical technologies in data mining.
CO3 | To provide students with interesting problems in the field of data mining to solve.
Text Book(s)
T1 | Tan P. N., Steinbach M. & Kumar V., “Introduction to Data Mining”, Pearson Education, 2006
T2 | Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, Morgan Kaufmann Publishers
Reference Book(s) & other resources
R1 | Vijay Kotu and Bala Deshpande, “Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner”, Morgan Kaufmann Publishers, © 2015
R2 | http://www.scikit-learn.org
Content Structure
Modules
No. | Title of the Module
M1 | Introduction to Data Mining
M2 | Data Preprocessing: To understand the need for data preprocessing and various techniques used in the context of Data Mining
M3 | Data Exploration: A preliminary exploration of the data to better understand its characteristics
M4 | Classification and prediction: To learn different techniques and algorithms for classification, a major predictive and supervised Data Mining task
M5 | Association Analysis: To understand the descriptive relation between the entities by identifying associations among them and to learn various algorithms to find them
M6 | Clustering: To learn different techniques and algorithms for clustering, a major descriptive and unsupervised Data Mining task
M7 | Anomaly Detection: Detecting outliers and noise in data sets is an important Data Mining task. This module focuses on techniques needed for anomaly detection.
M8 | Data Mining on unstructured (Big) data: Graph Mining, Social Network Analysis, Multimedia Data Mining, Text Mining, Mining the World Wide Web
M9 | Data Mining Applications: Recommendation Systems, Fraud Detection, Sentiment Analysis
Glossary of Terms:
1. Contact Hour (CH) stands for an hour-long live session with students conducted either in a physical classroom or enabled through technology. In this model of instruction, instructor-led sessions will be for 20 CH.
   a. Pre CH = Self Learning done prior to a given contact hour
   b. During CH = Content to be discussed during the contact hour by the course instructor
   c. Post CH = Self Learning done post the contact hour
2. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an online portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
3. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference books. It could also include study of external resources.
4. LE stands for Lab Exercises.
5. HW stands for Home Work. It will consist of discussed/new problems and could be a selection of problems from the text.
M1: Introduction to Data Mining
Type | Description/Plan/Reference
RL1.1 | RL1.1.1 = Definition of Data Mining
      | RL1.1.2 = What type of data can be mined?
RL1.2 | RL1.2.1 = What kind of patterns can be mined?
      | RL1.2.2 = What kind of applications are targeted?
RL1.3 | DM Process (R1) & DM Challenges (T2)
      | RL1.3.1 = Process/Technologies used in DM
      | RL1.3.2 = Challenges in DM
CS1.1 | CS1.1.1 = Review of Data Mining basics; examples of patterns that can be mined
      | CS1.1.2 = Examples of technologies used in DM; approaches to overcome challenges; discussion of one example case study for data mining
LE1.1 | Exploration of Weka: operations, features, ARFF files
SS1.1 | T1, Chapter 1; T2, Chapter 1
HW1.1 | Exercises at the end of T2, Chapter 1
QZ1.1 |
M2: Data Preprocessing
Type | Description/Plan/Reference
RL2.1 | RL2.1.1 = Why does data need preprocessing?
      | RL2.1.2 = Major tasks in data preprocessing
RL2.2 | RL2.2.1 = Data cleaning techniques
      | RL2.2.2 = Data discretization, transformation, integration, reduction
CS2.1 | CS2.1.1 = Review of concepts of data preprocessing
      | CS2.1.2 = Examples of application of preprocessing techniques
LE2.1 | Experiments with Weka: filters, discretization
SS2.1 |
HW2.1 |
QZ2.1 |
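As a minimal sketch of the transformation and discretization topics listed under RL2.2.2, the plain-Python example below applies min-max scaling and equal-width binning to a small list of numbers. The function names and the sample ages are illustrative assumptions; they are not taken from Weka or from the textbooks, and other preprocessing choices are equally valid.

```python
def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value the 0-based index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample data, used only for illustration.
ages = [18, 22, 25, 31, 40, 58]
scaled = min_max_scale(ages)
bins = equal_width_bins(ages, 4)
```

With these sample values the bin width is 10, so the three youngest ages share the first bin and the oldest falls in the last one.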
M3: Data Exploration
Type | Description/Plan/Reference
RL3.1 | RL3.1.1 = Various types of data to be mined
      | RL3.1.2 = Statistical descriptions of data
RL3.2 | RL3.2.1 = Measuring data similarity & dissimilarity
      | RL3.2.2 = Data visualization
CS3.1 | CS3.1.1 = Review of concepts of data exploration
      | CS3.1.2 = Examples of similarities & dissimilarities
LE3.1 |
SS3.1 |
HW3.1 |
QZ3.1 |
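To make the similarity and dissimilarity topic in RL3.2.1 concrete, the sketch below implements three commonly used measures in plain Python: Euclidean distance, cosine similarity, and the Jaccard coefficient. The function names are illustrative, and the handling of zero vectors and empty sets is an assumed convention rather than something prescribed by the course.

```python
import math


def euclidean_distance(x, y):
    """Dissimilarity between two numeric vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Assumed convention: a zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)


def jaccard_similarity(s, t):
    """Share of items common to two sets, relative to their union."""
    s, t = set(s), set(t)
    if not s and not t:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(s & t) / len(s | t)
```

Euclidean distance suits dense numeric attributes, cosine similarity is often used for document vectors, and the Jaccard coefficient applies to sets or asymmetric binary attributes.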
M4: Classification and Prediction
Type | Description/Plan/Reference
RL4.1 | RL4.1.1 = Introduction to classification and prediction
      | RL4.1.2 = Decision trees for classification
      | RL4.1.3 = Rule-based classification, Bayesian classification, Support vector machines
RL4.2 | RL4.2.1 = Issues regarding classification and prediction
      | RL4.2.2 = Linear Regression, Nonlinear Regression
CS4.1 | CS4.1.1 = Review of concepts of recorded lectures, Algorithm for decision tree induction, Classification by backpropagation, Comparison of methods of classification
      | CS4.1.2 = Prediction: Other Regression-Based Methods
LE4.1 | Experiments with Weka: decision trees, rules, prediction
SS4.1 |
HW4.1 |
QZ4.1 |
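For the decision tree induction topic in RL4.1.2 and CS4.1.1, the sketch below computes entropy and information gain, one common attribute selection measure. It illustrates that measure under assumed data structures (rows as tuples, labels as a parallel list) and is not the specific algorithm variant used in the lectures or in Weka.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute_index):
    """Reduction in entropy obtained by splitting on one attribute."""
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 does not.
rows = [("sunny", "a"), ("sunny", "b"), ("rainy", "a"), ("rainy", "b")]
labels = ["no", "no", "yes", "yes"]
gain_first = information_gain(rows, labels, 0)
gain_second = information_gain(rows, labels, 1)
```

A tree induction procedure of this kind would split on the attribute with the larger gain, here the first one.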
M5: Association Analysis
Type | Description/Plan/Reference
RL5.1 | RL5.1.1 = What is association rule mining?
      | RL5.1.2 = Frequent Itemsets, Closed Itemsets, and Association Rules
RL5.2 | RL5.2.1 = What is the Apriori Algorithm?
      | RL5.2.2 = Finding Frequent Itemsets Using Candidate Generation, Generating Association Rules from Frequent Itemsets
CS5.1 | CS5.1.1 = Review of concepts of recorded lectures, Improving the Efficiency of Apriori
      | CS5.1.2 = Mining Frequent Itemsets without Candidate Generation
LE5.1 | Experiments with Weka: mining association rules
SS5.1 |
HW5.1 |
QZ5.1 |
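The candidate generation approach named in RL5.2.2 can be sketched as follows. This is a compact, unoptimized rendering of the Apriori idea (join, prune, count) in plain Python; the function names, the market-basket transactions, and the use of an absolute support count are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count = number of transactions that contain the candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([i]) for t in transactions for i in t}
    current = {c: n for c, n in count(items).items() if n >= min_support_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: unite pairs of frequent (k-1)-itemsets that give k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = {c: n for c, n in count(candidates).items()
                   if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical market-basket data, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support_count=3)
rule_conf = confidence(frequent, {"diapers"}, {"beer"})
```

The prune step is what distinguishes this from brute-force enumeration: a candidate is counted only if all of its smaller subsets were already found frequent.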
M6: Clustering
Type | Description/Plan/Reference
RL6.1 | RL6.1.1 = What is cluster analysis? Types of data in cluster analysis
      | RL6.1.2 = Partitioning methods: k-means
RL6.2 | RL6.2.1 = Hierarchical algorithms
      | RL6.2.2 = Introduction to density-based approach
CS6.1 | CS6.1.1 = Review of concepts of recorded lectures
      | CS6.1.2 = Density-based algorithm: DBSCAN
LE6.1 | Experiments with Weka: k-means
SS6.1 |
HW6.1 |
QZ6.1 |
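As an illustration of the k-means partitioning method listed in RL6.1.2, the sketch below alternates the assignment and update steps until the centroids stop changing. Passing the initial centroids explicitly (instead of choosing them at random) is an assumption made here to keep the example deterministic; the function name and sample points are likewise illustrative.

```python
def k_means(points, centroids, max_iter=100):
    """Basic k-means on tuples of numbers, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(1, 1), (9, 8)])
```

In practice the result depends on the initial centroids, which is why library implementations typically run several random initializations.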
M7: Anomaly Detection
Type | Description/Plan/Reference
RL7.1 | RL7.1.1 = Preliminaries
      | RL7.1.2 = Statistical approach
RL7.2 | RL7.2.1 = Proximity-based outlier detection
      | RL7.2.2 = Density-based outlier detection
CS7.1 | CS7.1.1 = Review of concepts of recorded lectures
      | CS7.1.2 = Clustering-based techniques
LE7.1 |
SS7.1 |
HW7.1 |
QZ7.1 |
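One simple instance of the statistical approach listed in RL7.1.2 is z-score thresholding, sketched below with Python's standard statistics module. It assumes roughly normally distributed data; the function name, the sample readings, and the threshold of 2.0 are illustrative choices, not values prescribed by the course.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        # All values are identical, so nothing can be flagged.
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical readings with one obviously unusual value.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = z_score_outliers(readings, threshold=2.0)
```

Because the outlier itself inflates the mean and standard deviation, small samples may need a lower threshold or a more robust statistic such as the median.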
M8: Data Mining on Unstructured (Big) Data
Type | Description/Plan/Reference
RL8.1 | RL8.1.1 = Graph Mining methods and applications: Graph Indexing, Similarity Search, Classification, and Clustering
      | RL8.1.2 = Multimedia Data Mining: Classification and Prediction Analysis of Multimedia Data, Mining Associations in Multimedia Data, Audio and Video Data Mining
RL8.2 | RL8.2.1 = Text Mining: Text Data Analysis and Information Retrieval
      | RL8.2.2 = Dimensionality Reduction for Text, Text Mining Approaches
CS8.1 | CS8.1.1 = Social Network Analysis
      | CS8.1.2 = Mining the World Wide Web
LE8.1 |
SS8.1 |
HW8.1 |
QZ8.1 |
M9: Data Mining Applications
Type | Description/Plan/Reference
RL9.1 | RL9.1.1 = Recommendation systems
      | RL9.1.2 = Case study for Recommendation systems
RL9.2 | RL9.2.1 = Fraud Detection
      | RL9.2.2 = Case study for Fraud Detection
CS9.1 | CS9.1.1 = Sentiment Analysis
      | CS9.1.2 = Case study for Sentiment Analysis
LE9.1 |
SS9.1 |
HW9.1 |
QZ9.1 |
Part B: Contact Session Plan
Academic Term | First Semester 2017-2018
Course Title | Data Mining
Course No | IS ZC415
Content Developer | Surender Singh Samant
Contact hour | Pre-contact hour prep | During Contact hour | Post-contact hour
1 | RL1.1, RL1.2 | CS1.1 |
2 | RL1.3 | CS1.2 | LE1.1, HW1.1, SS1.1
3 | RL2.1 | CS2.1 |
4 | RL2.2 | CS2.2 | LE2.1, SS2.1, HW2.1
5 | RL3.1 | CS3.1 |
6 | RL3.2, RL3.3 | CS3.2 | LE3.1, SS3.1, HW3.1
7 | RL4.1, RL4.2, RL4.3 | CS4.1 |
8 | RL4.4 | CS4.2 | LE4.1, SS4.1, HW4.1
9 | RL5.1, RL5.2 | CS5.1 |
10 | | Review |
11 | | Review |
12 | RL5.3, RL5.4 | CS5.2 | LE5.1, SS5.1, HW5.1
13 | RL6.1 | CS6.1 |
14 | RL6.2, RL6.3, RL6.4 | CS6.2 | LE6.1, SS6.1, HW6.1
15 | RL7.1 | CS7.1 |
16 | RL7.2, RL7.3 | CS7.2 | LE7.1, SS7.1, HW7.1
17 | RL8.1, RL8.2, RL8.3 | CS8.1, CS8.2 |
18 | RL9.1, RL9.2, RL9.3 | CS9.1 | SS8.1, SS9.1, HW8.1
19 | Python basics, scikit-learn | Class notes/case study |
20 | Earlier case study/Python basics | Class notes/case study |
21 | | Review |
22 | | Review |
Notes:
Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1 | Quiz-I/Assignment-I | Online | - | 5% | August 26 to September 4, 2017
     | Quiz-II | Online | | 5% | September 26 to October 4, 2017
     | Lab | Online | | 10% | October 20 to 30, 2017
EC-2 | Mid-Semester Test | Closed Book | 2 hours | 30% | 23/09/2017 (AN) 2 PM to 4 PM
EC-3 | Comprehensive Exam | Open Book | 3 hours | 50% | 04/11/2017 (AN) 2 PM to 5 PM
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 11
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 22)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the course pages on the Elearn portal. Announcements will be made on the portal in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed/written reference material (filed or bound) is permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student should follow the procedure to apply for the Make-Up Test/Exam, which will be made available on the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self-study schedule as given in the course handout, attend the online lectures, and take all the prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme provided in the handout.