BITS WILP Data Mining Mid-Sem Exam 2016-H2


Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
First Semester 2016-2017

Mid-Semester Test
(EC-2 Regular)

Course No.       : IS ZC415
Course Title     : DATA MINING
Nature of Exam   : Closed Book
Weightage        : 30%
Duration         : 2 Hours
Date of Exam     : 24/09/2016 (AN)
No. of Pages     : 2
No. of Questions : 6
Note:
1. Please follow all the instructions to candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start on a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.

Q1. The following set of numbers represents the percentages achieved by ten students:
63, 81, 64, 70, 73, 64, 77, 76, 81, 42
a) What is the IQR of the above data?                                                                        [2]
b) Draw a boxplot of the above data and mark any outliers, if present.                             [3]
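
A minimal Python sketch for checking Q1. Note that the IQR depends on the quartile convention: the median-split method used in most data-mining texts may differ slightly from numpy's default interpolation.

```python
import numpy as np

marks = [63, 81, 64, 70, 73, 64, 77, 76, 81, 42]

# Quartiles and IQR (numpy's default linear interpolation; the
# median-split textbook convention may give slightly different values).
q1, q3 = np.percentile(marks, [25, 75])
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")

# Boxplot outlier rule: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print("Outliers:", [x for x in marks if x < lower or x > upper])
```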

Q2. Given the following marks scored by a student in two subjects, compute z-scores to determine in which subject the student has performed comparatively better.

              Mark obtained    Mean mark      Standard deviation of
              by the student   of the class   marks of the class
  Subject 1        70               60                15
  Subject 2        65               60                 6
                                                                                                       [3]
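
A quick Python check for Q2, using z = (x - mean) / std on the table above; the higher z-score indicates the comparatively better performance.

```python
def z_score(x, mean, std):
    # Standardize a mark: how many standard deviations above the class mean.
    return (x - mean) / std

z1 = z_score(70, 60, 15)  # Subject 1
z2 = z_score(65, 60, 6)   # Subject 2
print(f"Subject 1: z = {z1:.3f}")   # 0.667
print(f"Subject 2: z = {z2:.3f}")   # 0.833
print("Comparatively better:", "Subject 1" if z1 > z2 else "Subject 2")
```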
Q3. Consider the set of data below:
     5, 10, 11, 13, 15, 35, 50, 55, 72, 150, 204, 215.
a) Partition it into two bins using equal-width partitioning.                                    [2]
b) Perform smoothing by bin boundaries.                                                                            [2]
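
A small Python sketch of Q3, assuming equal-width bins of width (max - min) / 2 = 105, with the maximum value clamped into the last bin.

```python
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 150, 204, 215]
k = 2
lo, hi = min(data), max(data)
width = (hi - lo) / k  # (215 - 5) / 2 = 105

# Equal-width partitioning: assign each value to a bin by its offset from lo.
bins = [[] for _ in range(k)]
for x in sorted(data):
    bins[min(int((x - lo) // width), k - 1)].append(x)

# Smoothing by bin boundaries: replace each value with the closer
# of its bin's minimum or maximum.
smoothed = [min(b) if x - min(b) <= max(b) - x else max(b)
            for b in bins for x in b]

print("Bins:", bins)
print("Smoothed:", smoothed)
```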

Q4. Consider the training examples shown in the table below for a binary classification problem.
a) What is the entropy of this collection of training examples?                                         [2]
b) What are the information gains (entropy-based) of splitting on a1 and a2 relative to these training examples? Compute each separately.                                                          [4]
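
Since the exam's table is not reproduced in this post, here is a generic Python sketch of the entropy and information-gain computations Q4 asks for; the a1 and label values at the bottom are placeholders, not the exam's data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    # Gain = H(parent) - weighted average entropy of the children.
    n = len(labels)
    groups = {}
    for v, y in zip(attr_values, labels):
        groups.setdefault(v, []).append(y)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children

# Placeholder data only; substitute the a1/a2 columns from the exam table.
a1     = ['T', 'T', 'F', 'F']
labels = ['+', '+', '-', '+']
print(f"Entropy = {entropy(labels):.3f}, Gain(a1) = {info_gain(a1, labels):.3f}")
```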


Q5. Consider the training examples shown in the table below for a binary classification problem.

a) Compute the Gini index for the overall collection of training examples.                     [2]
b) Compute the Gini index for the Gender attribute.                                                        [2]
c) Compute the Gini index for the Car Type attribute using multiway split.                    [2]
d) From your answers to (b) and (c), which attribute is better: Gender or Car Type?              [1]
e) Explain why Customer ID should not be used as the attribute test condition even though it has the lowest Gini index.                                                                                           [1]
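
Again, the table is not reproduced here, so this is a generic sketch of the Gini computations for Q5; the gender and label values are placeholders. Note, for part (e), that an attribute unique to each record (like Customer ID) produces pure singleton partitions and a split Gini of 0, yet generalizes to nothing, which is why the lowest Gini alone is not a sound selection criterion.

```python
from collections import Counter

def gini(labels):
    # Gini = 1 - sum(p_i^2) over the class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(attr_values, labels):
    # Weighted Gini of a multiway split on the given attribute.
    n = len(labels)
    groups = {}
    for v, y in zip(attr_values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * gini(g) for g in groups.values())

# Placeholder data only; substitute the Gender/Car Type columns from the exam.
gender = ['M', 'M', 'F', 'F']
labels = ['C0', 'C1', 'C1', 'C1']
print(f"Overall Gini = {gini(labels):.3f}, "
      f"Gini(Gender) = {gini_split(gender, labels):.3f}")
```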

Q6. In the following table, the third column gives the predicted probability (posterior) for the positive class in a binary classification problem.
Assume that any test instance whose posterior probability is greater than the threshold of 0.5 will be classified as a positive example. Compute the Precision, Recall, and F-measure of the model at this threshold.                                                                                  [4]
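
The probability table is not reproduced in this post, so below is a generic Python sketch of the threshold-based evaluation Q6 describes; the y_true and scores values are placeholders, not the exam's data.

```python
def precision_recall_f(y_true, scores, threshold=0.5):
    # Predict positive when the posterior probability exceeds the threshold.
    y_pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Placeholder data only; substitute the exam's true labels and posteriors.
y_true = [1, 1, 0, 1, 0]
scores = [0.95, 0.40, 0.60, 0.75, 0.30]
print(precision_recall_f(y_true, scores))  # (0.667, 0.667, 0.667)
```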



********
