BITS WILP Advanced Data Mining Mid-Sem Exam (Regular) 2017-H2




Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
First Semester 2017-2018

Mid-Semester Test
(EC-2 Regular)

Course No.                  : SS ZG548  
Course Title                 : ADVANCED DATA MINING  
Nature of Exam           : Closed Book
Weightage                    : 35%
Duration                      : 2 Hours  
Date of Exam              : 24/09/2017     (FN)
No. of pages: 2
No. of questions: 4
Note:
1.        Please follow all the Instructions to Candidates given on the cover page of the answer book.
2.        All parts of a question should be answered consecutively. Each answer should start from a fresh page.  
3.        Assumptions made, if any, should be stated clearly at the beginning of your answer.

Q1. Explain the following concepts with respect to data mining.                               [2 + 2 + 2 = 6]
a. Difference between error and noise in data
b. Predictive and descriptive data mining tasks, with examples
c. Overfitting and regularization.

Q2. Describe the incremental association mining setting. How does the Fast Update algorithm (FUP) differ from the Fast Update 2 algorithm (FUP2)? Explain how one can maintain, in an incremental way, the data structures required by FUP2.                                                 [1 + 1 + 2 = 4]

Q3. Consider association rule mining for incremental databases. What are the main advantages of using a Compact Pattern Stream tree (CPS Tree) over an FP-Tree? Assume that a pan contains two transactions (that is, restructuring happens after the arrival of two transactions) and that each window contains two pans. Consider the step-by-step arrival of the following six transactions:
1. A, B, C, D, E
2. B, C, E, F
3. E, C, D, F
4. B, F, C, A
5. D, E, C
6. F, B, D
Starting from a null CPS tree, draw all intermediate CPS trees on the incremental arrival of every transaction. Clearly explain the modifications to the CPS tree in each step.   [2 + 8 = 10]

Q4. Consider the problem of clustering an evolving database. Assume that database updates happen at regular intervals. Initially, let there be 15 data points P1, P2, …, P15 spread in 2D space. The distance between every pair of data points is provided in Table 1. The density-based spatial clustering algorithm DBSCAN is applied with parameters Eps=31 and MinPts=3.                                                                         [3 + 1 + 3 + 3 + 1 + 4 = 15]
i) Identify the core, border and noise points.
ii) How many clusters are there?
iii) Determine the cluster membership of each core or border point.
Consider the arrival of five more data points P16, P17, P18, P19, P20 in the database. We now have 20 data points. Refer to Table 1 to get the distance between any two data points of the database. Apply incremental DBSCAN on the updated database to report the following.

iv) Identify the core, border and noise points.
v) How many clusters are there?
vi) Determine the cluster membership of each core or border point.
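
The core/border/noise classification asked for in parts (i) and (iv) can be sketched as follows. This is only a minimal sketch under stated assumptions: the distances are available as a symmetric matrix, a point counts itself in its own Eps-neighbourhood, and the neighbourhood test is distance <= Eps. The function name `classify_points` and the five-point matrix `toy_dist` are hypothetical illustrations; the matrix is not Table 1.

```python
def classify_points(dist, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise' from a distance matrix."""
    n = len(dist)
    # Eps-neighbourhood of every point; it includes the point itself
    # because dist[i][i] == 0 (an assumed convention).
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    # A core point has at least min_pts points in its Eps-neighbourhood.
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbours[i]):
            # Not core, but within Eps of at least one core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Hypothetical distances for five points (NOT the exam's Table 1).
toy_dist = [
    [0, 10, 20, 30, 90],
    [10, 0, 12, 40, 95],
    [20, 12, 0, 45, 99],
    [30, 40, 45, 0, 80],
    [90, 95, 99, 80, 0],
]
labels = classify_points(toy_dist, eps=31, min_pts=3)
print(labels)
```

Clusters are then obtained by linking core points that lie within Eps of one another and attaching each border point to the cluster of a neighbouring core point.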

Table 1: Distance between every pair of data points
