Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
First Semester 2016-2017
Mid-Semester Test (EC-2 Regular)
Course No. : IS ZC415
Course Title : DATA MINING
Nature of Exam : Closed Book
Weightage : 30%
Duration : 2 Hours
Date of Exam : 24/09/2016 (AN)
No. of pages : 2
No. of questions : 6
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
Q1. The following numbers are the percentages achieved by ten students:
63, 81, 64, 70, 73, 64, 77, 76, 81, 42
a) What is the IQR of the above data? [2]
b) Draw a boxplot of the above data. Mention any outliers present. [3]
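A minimal sketch for checking the Q1 arithmetic. It assumes the median-of-halves convention for quartiles and the usual 1.5 x IQR fences for outliers. Other quartile conventions (for example, interpolated percentiles) can give slightly different values. The function name `quartiles_by_halves` is illustrative only.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumption: for an odd count the middle value is excluded from both halves.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return median(lower), median(s), median(upper)

marks = [63, 81, 64, 70, 73, 64, 77, 76, 81, 42]
q1, med, q3 = quartiles_by_halves(marks)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in marks if x < low_fence or x > high_fence]

print(q1, med, q3, iqr, outliers)
```

Under these assumptions the five-number summary needed for the boxplot is the minimum, Q1, median, Q3 and maximum of the sorted data, with any flagged value drawn as a separate point.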
Q2. Given the following marks scored by a student in two
subjects, compute z-scores to find out in which subject the student has done
better comparatively.
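|           | Mark obtained by the student | Mean mark of the class | Standard deviation of marks of the class |
|-----------|------------------------------|------------------------|------------------------------------------|
| Subject 1 | 70                           | 60                     | 15                                       |
| Subject 2 | 65                           | 60                     | 6                                        |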
[3]
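A short sketch of the z-score comparison asked for in Q2, using z = (x - mean) / standard deviation with the values from the table. The helper name `z_score` and the dictionary layout are illustrative assumptions.

```python
def z_score(x, mean, std):
    """Number of standard deviations by which x lies above the class mean."""
    return (x - mean) / std

# (mark obtained, class mean, class standard deviation) from the Q2 table
scores = {
    "Subject 1": (70, 60, 15),
    "Subject 2": (65, 60, 6),
}

z = {name: z_score(*vals) for name, vals in scores.items()}
best = max(z, key=z.get)  # subject with the larger z-score

for name, value in z.items():
    print(f"{name}: z = {value:.3f}")
print("Better relative performance:", best)
```

The larger z-score marks the subject in which the student stands further above the class average relative to the spread of marks.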
Q3. Consider the set of data below:
5, 10, 11, 13, 15, 35, 50, 55, 72, 150, 204, 215
a) Partition it into two bins using equal-width partitioning. [2]
b) Perform smoothing by bin boundaries. [2]
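One possible sketch of Q3, assuming the bin width is (max - min) / number of bins, that the maximum value falls in the last bin, and that a value equidistant from both boundaries is moved to the lower one. The function names are illustrative, and the input is assumed to contain at least two distinct values.

```python
def equal_width_bins(values, k):
    """Split values into k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins

def smooth_by_boundaries(bin_values):
    """Replace each value with the closer of the bin's observed min and max."""
    if not bin_values:
        return []
    lo, hi = min(bin_values), max(bin_values)
    # Assumption: ties go to the lower boundary.
    return [lo if v - lo <= hi - v else hi for v in bin_values]

data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 150, 204, 215]
bins = equal_width_bins(data, 2)
smoothed = [smooth_by_boundaries(b) for b in bins]

print(bins)
print(smoothed)
```

With a range of 210 and two bins, the width is 105, so the split point under these assumptions is 110.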
Q4. Consider the training examples shown in the table below for a binary classification problem.
a) What is the entropy of this collection of training examples? [2]
b) What are the information gains (entropy-based) of splitting by a1 and by a2 relative to these training examples? Compute each separately. [4]
Q5. Consider the training examples shown in the table below for a binary classification problem.
a) Compute the Gini index for the overall collection of training examples. [2]
b) Compute the Gini index for the Gender attribute. [2]
c) Compute the Gini index for the Car Type attribute using a multiway split. [2]
d) Based on your answers to b) and c), which attribute is better, Gender or Car Type? [1]
e) Explain why Customer ID should not be used as the attribute test condition even though it has the lowest Gini. [1]
Q6. In the following table, the third column is the predicted probability (posterior) for the positive class in a binary classification problem.
Assume that any test instance whose posterior probability is greater than the threshold of 0.5 is classified as a positive example. Compute the Precision, Recall, and F-measure for the model at this threshold value. [4]
********