Birla Institute of Technology & Science, Pilani
Work Integrated Learning Programmes Division
First Semester 2017-18
Mid Semester Test (EC2 Regular)
Course No: IS ZC415
Course Title: Data Mining
Nature of Exam: Closed Book
Weightage: 30%
Duration: 2 Hours
Date of Exam: 23/Sep/2017 (AN)
No of pages: 2
No of questions: 4
Solutions:
Answer 1(A):
Binning Methods for Data Smoothing
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means (rounded to the nearest integer):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
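The binning steps above can be reproduced with a short Python sketch. The function names are illustrative only, and the code assumes the number of values divides evenly into the bins and that a value equidistant from both boundaries goes to the lower one (no such tie occurs in this data).

```python
def equal_depth_bins(values, n_bins):
    """Split the sorted values into n_bins bins holding the same number of values."""
    data = sorted(values)
    size = len(data) // n_bins  # assumes len(data) is divisible by n_bins
    return [data[i * size:(i + 1) * size] for i in range(n_bins)]

def smooth_by_means(bins):
    # Replace every value with its bin mean, rounded to the nearest integer
    return [[round(sum(b) / len(b))] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    # Replace every value with the closer of the bin minimum and bin maximum
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equal_depth_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```

The unrounded bin means are 9, 22.75 and 29.25, which is why the answer shows 23 and 29 for bins 2 and 3.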
Answer 1(B):
Euclidean distance is widely used in geometry, where the shortest (straight-line) distance between two points is often required, as in calculating the distance between two celestial objects in space.
Manhattan distance is used in navigation systems to calculate the distance between two points along a grid of paths, going around the obstacles that lie between them. It is also known as ‘taxi cab’ distance.
Cosine distance is used in web search and information retrieval, where two documents are represented as vectors with terms as dimensions, and the similarity between the two documents is calculated from the cosine of the angle between their vectors.
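As a rough illustration of the three measures, here is a minimal Python sketch. The points and the term-count vectors are made-up examples, not part of the exam, and cosine distance is taken here as 1 minus cosine similarity, which is one common convention.

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Grid ("taxi cab") distance: sum of the absolute differences
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine_similarity(p, q):
    # Cosine of the angle between the two vectors
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7

# Hypothetical term-count vectors for two documents over three terms
doc1, doc2 = [1, 2, 0], [2, 4, 0]
cosine_distance = 1 - cosine_similarity(doc1, doc2)
print(cosine_distance)  # close to 0: same direction, different length
```

The two documents differ in length but not in term proportions, so their cosine distance is approximately zero even though their Euclidean distance is not.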
Answer 1(C):
...
A confusion matrix is a table that
is often used to describe the performance of a classification model (or
"classifier") on a set of test data for which the true values are
known.
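To make the definition concrete, the sketch below tallies a two-class confusion matrix in Python. The labels and predictions are invented for illustration and are not the exam's data; `confusion_matrix` here is a hypothetical helper written for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive="yes"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts

# Hypothetical test-set labels and classifier outputs
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(dict(cm), accuracy)
```

Measures such as accuracy, precision and recall are all derived from these four counts.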
...
…
In z-score normalization (or zero-mean normalization), the values for an attribute, A, are normalized based on the mean and standard deviation of A. A value, v, of A is normalized to v’ by computing:
v’ = (v - mean(A)) / stddev(A)
Subject   | Marks | Z-score
Subject 1 | 70    | (70 - 60)/15 = 0.667
Subject 2 | 65    | (65 - 60)/6 = 0.833
The student did better in Subject 2, because the z-score there is higher (0.833 > 0.667) even though the raw marks are lower.
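The comparison can be checked with a few lines of Python. The mean of 60 and the standard deviations of 15 and 6 are taken from the calculations above; the function name is just an illustrative choice.

```python
def z_score(v, mean, std):
    # Distance of v from the mean, measured in standard deviations
    return (v - mean) / std

z1 = z_score(70, 60, 15)  # Subject 1
z2 = z_score(65, 60, 6)   # Subject 2
print(round(z1, 3), round(z2, 3))  # 0.667 0.833
print("Subject 2" if z2 > z1 else "Subject 1")  # Subject 2
```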
Answer 2:
Use this:
…
x-mean = 1.5
y-mean = 3.5
W1 = ((1-1.5)*(2-3.5) + (2-1.5)*(5-3.5)) / ((1-1.5)^2 + (2-1.5)^2) = 3
W0 = 3.5 – 3*(1.5) = -1
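The same least-squares computation can be sketched in Python. The two data points (1, 2) and (2, 5) are read off the W1 expression above, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1 * x for a single predictor."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope: covariance of x and y divided by the variance of x
    w1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
    # Intercept: the fitted line passes through the point of means
    w0 = y_mean - w1 * x_mean
    return w0, w1

w0, w1 = fit_line([1, 2], [2, 5])
print(w0, w1)  # -1.0 3.0
```

With only two points, the fitted line y = -1 + 3x passes through both of them exactly.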
“What will be the class label for nodes with no training samples?”
Answer 3(A)
Info(D) = -(2/6)(log (2/6) / log (2)) - (4/6)(log (4/6) / log (2)) = 0.92
...
Intermediate calculation: [-(1/3)*(log(1/3) / log (2)) - (2/3)*(log(2/3) / log (2))] = 0.92
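The entropy value can be verified with a small Python sketch. It assumes the class counts are 2 and 4 out of 6 samples, as in the Info(D) expression above; `info` is an illustrative name.

```python
import math

def info(class_counts):
    """Expected information (entropy, in bits) of a class distribution."""
    total = sum(class_counts)
    # Classes with zero count contribute nothing and are skipped
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

print(round(info([2, 4]), 2))  # 0.92
```

The same function can be applied to each partition when computing the information gain of a candidate split.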
...
...
Question 3(B)
...
Answer 4:
• Support: usefulness of discovered rules
• Confidence: certainty of discovered rules
Example:
computer => antivirus software [support = 2%, confidence = 60%]
· A support of 2% means that 2% of all the transactions under analysis show that a computer and antivirus software are purchased together.
· A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.
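A minimal Python sketch of how the two measures are computed follows. The five transactions are invented for illustration, so the resulting numbers differ from the 2% and 60% in the example rule.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical transaction database
transactions = [
    {"computer", "antivirus"},
    {"computer", "antivirus", "printer"},
    {"computer"},
    {"printer"},
    {"antivirus", "printer"},
]

s = support(transactions, {"computer", "antivirus"})
c = confidence(transactions, {"computer"}, {"antivirus"})
print(s, c)
```

Here 2 of the 5 transactions contain both items (support 40%), and 2 of the 3 computer purchases include the software (confidence about 67%).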
...
Answer 4(B)
From Stackoverflow.com
Ques: I want to find out the maximal frequent item sets and the closed frequent item sets.
Frequent item set X ∈ F is maximal if it does not have any frequent supersets.
Frequent item set X ∈ F is closed if it has no superset with the same frequency.
So I counted the occurrence of each item set.
{A} = 4; {B} = 2; {C} = 5; {D} = 4; {E} = 6
{A,B} = 1; {A,C} = 3; {A,D} = 3; {A,E} = 4; {B,C} = 2; {B,D} = 0; {B,E} = 2; {C,D} = 3; {C,E} = 5; {D,E} = 3
{A,B,C} = 1; {A,B,D} = 0; {A,B,E} = 1; {A,C,D} = 2; {A,C,E} = 3; {A,D,E} = 3; {B,C,D} = 0; {B,C,E} = 2; {C,D,E} = 3
{A,B,C,D} = 0; {A,B,C,E} = 1; {B,C,D,E} = 0
Min_Support set to 50%
Does maximal = {A,B,C,E}?
Does closed = {A,B,C,D} and {B,C,D,E}?
…
Ans:
Note:
{A} = 4 ; not closed due to {A,E}
{B} = 2 ; not frequent => ignore
{C} = 5 ; not closed due to {C,E}
{D} = 4 ; not closed due to {D,E}, which has the same count of 4
{E} = 6 ; closed, but not maximal due to e.g. {D,E}
{A,B} = 1; not frequent => ignore
{A,C} = 3; not closed due to {A,C,E}
{A,D} = 3; not closed due to {A,D,E}
{A,E} = 4; closed, but not maximal due to {A,D,E}
{B,C} = 2; not frequent => ignore
{B,D} = 0; not frequent => ignore
{B,E} = 2; not frequent => ignore
{C,D} = 3; not closed due to {C,D,E}
{C,E} = 5; closed, but not maximal due to {C,D,E}
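{D,E} = 4; closed, but not maximal due to {A,D,E} (the question's count of 3 looks like a miscount: {E} = 6 appears to cover every transaction, so {D,E} should match {D} = 4)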
{A,B,C} = 1; not frequent => ignore
{A,B,D} = 0; not frequent => ignore
{A,B,E} = 1; not frequent => ignore
{A,C,D} = 2; not frequent => ignore
{A,C,E} = 3; maximal frequent
{A,D,E} = 3; maximal frequent
{B,C,D} = 0; not frequent => ignore
{B,C,E} = 2; not frequent => ignore
{C,D,E} = 3; maximal frequent
{A,B,C,D} = 0; not frequent => ignore
{A,B,C,E} = 1; not frequent => ignore
{B,C,D,E} = 0; not frequent => ignore
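The item-by-item check above can be automated with a short Python sketch. It assumes a minimum count of 3 (50% of what appear to be 6 transactions) and uses {D,E} = 4 as given in the answer's list rather than the 3 in the question. Only the item sets counted above are included, which is enough here because every omitted item set has a superset or subset relationship that keeps it below the threshold.

```python
# Support counts as listed above, written without braces for brevity
raw = {
    "A": 4, "B": 2, "C": 5, "D": 4, "E": 6,
    "AB": 1, "AC": 3, "AD": 3, "AE": 4, "BC": 2,
    "BD": 0, "BE": 2, "CD": 3, "CE": 5, "DE": 4,  # DE = 4 per the answer
    "ABC": 1, "ABD": 0, "ABE": 1, "ACD": 2, "ACE": 3,
    "ADE": 3, "BCD": 0, "BCE": 2, "CDE": 3,
    "ABCD": 0, "ABCE": 1, "BCDE": 0,
}
counts = {frozenset(k): v for k, v in raw.items()}
min_count = 3  # assumption: 50% of 6 transactions

frequent = {s: c for s, c in counts.items() if c >= min_count}

def is_closed(s):
    # Closed: no proper superset has the same count
    return not any(s < t and c == counts[s] for t, c in counts.items())

def is_maximal(s):
    # Maximal: no proper superset is frequent
    return not any(s < t for t in frequent)

closed = sorted("".join(sorted(s)) for s in frequent if is_closed(s))
maximal = sorted("".join(sorted(s)) for s in frequent if is_maximal(s))
print("closed: ", closed)
print("maximal:", maximal)
```

Every maximal frequent item set is also closed, so the maximal list is always a subset of the closed list.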
Answer to problem:
Frequent item set X ∈ F is maximal if it does not have any frequent supersets.
Frequent item set X ∈ F is closed if it has no superset with the same frequency.
Closed 2-itemsets: “ab, bc, bd”
Maximal 2-itemsets: “bd”