Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
First Semester 2017-2018
Comprehensive Examination (EC-3 Regular)
Course No. : SS ZG548
Course Title : ADVANCED DATA MINING
Nature of Exam : Open Book
Weightage : 50%
Duration : 3 Hours
Date of Exam : 05/11/2017 (FN)
No. of Pages : 2
No. of Questions : 7
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
Q.1. Consider clustering over an evolving data stream. Briefly describe the DenStream algorithm and compare its performance with that of incremental DBSCAN. [3 + 3 = 6]
Q.2. Describe the terms variety, veracity and viability from a Big Data perspective. Explain the major steps of a Big Data analytics process. [3 + 4 = 7]
Q.3. MR-DBSCAN is an efficient parallel density-based clustering algorithm that uses the MapReduce framework. Explain how this algorithm divides the work among the map processes, and explain the significance of the epsilon-extended neighborhood with an example. [3 + 3 = 6]
Q.4. Define a subsequence in a sequence database. Determine the percentage support for the sequence <{1,3}{1}> in the following database of 12 sequences.
A=<{1,2}{1,2,3}{2}{3}{1}>
B=<{2}{3,4,5}{1}{3,1}>
C=<{1}{2,3}{5}{1,3}{3,1}>
D=<{1,3,4}{4}{2,1}>
E=<{3}{1}{2,3}{1,2,3}>
F=<{1}{1,2,3}{1,3}{4,2}>
G=<{3}{1}{2,3}{1,2,3}{2,4}{3}>
H=<{1,3}{2,3}{5}{3}{1}{1,3}{1}>
I=<{3}{3,4,5}{1}{1}{2,3}{1,2,3}>
J=<{1}{4,2}{1,2,3}{3,4,5}{1}{1,3}>
K=<{3}{1}{2,3}{3,4,5}{1}{1,2,3}{2}>
L=<{1,2,3}{5}{3,4,5}{1}{3}{1}{1,3}{1}>
Clearly show all the steps. [1 + 5 = 6]
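The containment test behind the support count can be sketched as follows. This is a minimal sketch, not a model answer: it assumes the usual definition in which a pattern is a subsequence if each of its itemsets is a subset of a distinct itemset of the data sequence, in the same order. The names `is_subsequence`, `database`, `supporting` and `support_pct` are illustrative and do not come from the paper.

```python
def is_subsequence(pattern, sequence):
    """Return True if each itemset of `pattern` is contained, in order,
    in a distinct itemset of `sequence` (greedy left-to-right matching)."""
    pos = 0
    for wanted in pattern:
        # Advance to the first remaining itemset that contains `wanted`.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later itemset
    return True


# The 12 sequences from the question, each a list of itemsets.
database = {
    "A": [{1, 2}, {1, 2, 3}, {2}, {3}, {1}],
    "B": [{2}, {3, 4, 5}, {1}, {3, 1}],
    "C": [{1}, {2, 3}, {5}, {1, 3}, {3, 1}],
    "D": [{1, 3, 4}, {4}, {2, 1}],
    "E": [{3}, {1}, {2, 3}, {1, 2, 3}],
    "F": [{1}, {1, 2, 3}, {1, 3}, {4, 2}],
    "G": [{3}, {1}, {2, 3}, {1, 2, 3}, {2, 4}, {3}],
    "H": [{1, 3}, {2, 3}, {5}, {3}, {1}, {1, 3}, {1}],
    "I": [{3}, {3, 4, 5}, {1}, {1}, {2, 3}, {1, 2, 3}],
    "J": [{1}, {4, 2}, {1, 2, 3}, {3, 4, 5}, {1}, {1, 3}],
    "K": [{3}, {1}, {2, 3}, {3, 4, 5}, {1}, {1, 2, 3}, {2}],
    "L": [{1, 2, 3}, {5}, {3, 4, 5}, {1}, {3}, {1}, {1, 3}, {1}],
}

pattern = [{1, 3}, {1}]
supporting = [name for name, seq in database.items()
              if is_subsequence(pattern, seq)]
support_pct = 100.0 * len(supporting) / len(database)
print(supporting, f"{support_pct:.2f}%")
```

Greedy matching is sufficient here because taking the earliest possible match for each pattern itemset never rules out a later match.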
Q.5. Let us define a hotlink as the URL (uniform resource locator, or web address) of a frequently visited page. If a web developer knows the hotlink pages of his website, he may put those links on the index page itself to better serve visitors. Provide an algorithm (write pseudocode) to recognize hotlinks when a logbook containing the sequences of web pages visited by many users is available. [4]
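One possible reading of the task is frequency counting over user sessions. The sketch below is an illustration under stated assumptions, not the expected answer: the logbook is assumed to be a list of sessions, each a list of visited URLs; a page counts as "frequently visited" when it appears in at least a fraction `min_share` of the sessions; and the function name, parameters and sample URLs are all hypothetical.

```python
from collections import Counter


def find_hotlinks(logbook, min_share=0.5, exclude=()):
    """Return (url, session_count) pairs for pages that appear in at
    least `min_share` of the sessions, most frequent first."""
    sessions_with_page = Counter()
    for session in logbook:
        # Count each page once per session so that one user reloading
        # a page repeatedly does not make it look popular.
        sessions_with_page.update(set(session))
    threshold = min_share * len(logbook)
    hot = [(url, n) for url, n in sessions_with_page.items()
           if n >= threshold and url not in exclude]
    # Sort by descending count, then by URL for a stable order.
    return sorted(hot, key=lambda item: (-item[1], item[0]))


# Hypothetical logbook: one list of visited pages per user session.
logbook = [
    ["/index", "/courses", "/fees"],
    ["/index", "/fees", "/contact"],
    ["/index", "/courses", "/fees", "/courses"],
    ["/index", "/faculty"],
]
print(find_hotlinks(logbook, min_share=0.5, exclude={"/index"}))
```

The index page is excluded in the example because links to it would not need to be added to itself. A sequential-pattern approach that looks for frequent navigation paths would be an alternative reading of the question.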
Q.6. Consider four text files T1, T2, T3, T4 as given below.
T1
Mahendra Singh Dhoni is an Indian batsmen cricketer who captained the Indian team in limited-overs formats from 11th of September 2007 to 4th of January 2017 and in Test cricket from 2008 to 28th of December 2014. An attacking right-handed middle-order batsman and wicket-keeper, he is widely regarded as one of the greatest finishers in limited-overs cricket. He is also regarded to be one of the best wicket-keepers in world cricket and is known to have very fast hands. He made his One Day International (ODI) debut in December 2004 against Bangladesh, and played his first Test a year later against Sri Lanka.
T2
Virat Kohli is an Indian international cricketer who currently captains the India national team. A right-handed cricket batsman, often regarded as one of the best batsmen in the world, Kohli was ranked eighth in ESPN's list of world's most famous athletes in 2016. He plays for the Royal Challengers Bangalore in the Indian Premier League (IPL), and has been the team's captain since 2013.
T3
Cristiano Ronaldo is a Portuguese professional footballer who plays as a forward for Spanish club Real Madrid and the Portugal national team. Often considered the best player in the world and widely regarded as one of the greatest of all time, Ronaldo has four FIFA Ballon d'Or awards, the most for a European player, and is the first player in history to win four European Golden Shoes. He has won 24 trophies in his career, including five league titles, four UEFA Champions League titles and one UEFA European Championship. A prolific goalscorer, Ronaldo holds the records for most official goals scored in the top five European leagues, the UEFA Champions League and the UEFA European Championship, as well as the most goals scored in a UEFA Champions League season. He has scored more than 600 senior career goals for club and country.
T4
Lionel Andres Messi is an Argentine professional footballer who plays as a forward for Spanish club FC Barcelona and the Argentina national team. Often considered the best player in the world and regarded by many as the greatest of all time, Messi is the only player in history to win five FIFA awards, four of which he won consecutively, and a record-tying four European Golden Shoes. He has won 29 trophies with Barcelona, including eight La Liga titles, four UEFA Champions League titles, and five Copas del Rey. He has scored over 600 senior career goals for club and country.
Assuming the codebook contains the following words, in order, <awards, Barcelona, cricketer, footballer, history, including, professional, senior, Shoes, win, career, cricket, national, club, goals, Indian, team, player>, determine the following:
(a) Binary Term-Document Incidence Matrix (codeword) for all four texts T1, T2, T3, T4.
(b) Similarity between all pairs of texts, where similarity is defined as the number of on bits in the corresponding codewords. (Fill in the following table.) [8 + 6 = 14]
|    | T1 | T2 | T3 | T4 |
| T1 | -  | ?  | ?  | ?  |
| T2 | -  | -  | ?  | ?  |
| T3 | -  | -  | -  | ?  |
| T4 | -  | -  | -  | -  |
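The two parts of Q.6 can be checked mechanically with a short script. This is a sketch under assumptions the paper does not state: a codebook word counts as present only on a case-insensitive whole-word match (so "cricket" does not match "cricketer", and "national" does not match "International"), and the similarity of a pair is taken as the number of bit positions that are on in both codewords. The function names `codeword` and `similarity` are illustrative.

```python
import re
from itertools import combinations

CODEBOOK = ["awards", "Barcelona", "cricketer", "footballer", "history",
            "including", "professional", "senior", "Shoes", "win", "career",
            "cricket", "national", "club", "goals", "Indian", "team", "player"]

TEXTS = {
    "T1": """Mahendra Singh Dhoni is an Indian batsmen cricketer who captained
the Indian team in limited-overs formats from 11th of September 2007 to 4th of
January 2017 and in Test cricket from 2008 to 28th of December 2014. An
attacking right-handed middle-order batsman and wicket-keeper, he is widely
regarded as one of the greatest finishers in limited-overs cricket. He is also
regarded to be one of the best wicket-keepers in world cricket and is known to
have very fast hands. He made his One Day International (ODI) debut in December
2004 against Bangladesh, and played his first Test a year later against Sri
Lanka.""",
    "T2": """Virat Kohli is an Indian international cricketer who currently
captains the India national team. A right-handed cricket batsman, often
regarded as one of the best batsmen in the world, Kohli was ranked eighth in
ESPN's list of world's most famous athletes in 2016. He plays for the Royal
Challengers Bangalore in the Indian Premier League (IPL), and has been the
team's captain since 2013.""",
    "T3": """Cristiano Ronaldo is a Portuguese professional footballer who plays
as a forward for Spanish club Real Madrid and the Portugal national team. Often
considered the best player in the world and widely regarded as one of the
greatest of all time, Ronaldo has four FIFA Ballon d'Or awards, the most for a
European player, and is the first player in history to win four European Golden
Shoes. He has won 24 trophies in his career, including five league titles, four
UEFA Champions League titles and one UEFA European Championship. A prolific
goalscorer, Ronaldo holds the records for most official goals scored in the top
five European leagues, the UEFA Champions League and the UEFA European
Championship, as well as the most goals scored in a UEFA Champions League
season. He has scored more than 600 senior career goals for club and
country.""",
    "T4": """Lionel Andres Messi is an Argentine professional footballer who
plays as a forward for Spanish club FC Barcelona and the Argentina national
team. Often considered the best player in the world and regarded by many as the
greatest of all time, Messi is the only player in history to win five FIFA
awards, four of which he won consecutively, and a record-tying four European
Golden Shoes. He has won 29 trophies with Barcelona, including eight La Liga
titles, four UEFA Champions League titles, and five Copas del Rey. He has
scored over 600 senior career goals for club and country.""",
}


def codeword(text, codebook=CODEBOOK):
    """One bit per codebook word: 1 if the word occurs in the text as a
    whole word (case-insensitive), else 0."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if term.lower() in words else 0 for term in codebook]


def similarity(u, v):
    """Number of positions where both codewords have an on bit (assumed)."""
    return sum(a & b for a, b in zip(u, v))


codewords = {name: codeword(text) for name, text in TEXTS.items()}
similarities = {(p, q): similarity(codewords[p], codewords[q])
                for p, q in combinations(sorted(codewords), 2)}

for name, bits in codewords.items():
    print(name, "".join(map(str, bits)))
for pair, score in similarities.items():
    print(pair, score)
```

A substring-based match would give different codewords, so the matching rule should be stated as an assumption in any written answer.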
Q.7. Consider a mini-web having four pages A, B, C, D characterized by the adjacency matrix given below. Here, a 1 in a cell represents the existence of a hyperlink. For example, a 1 in the first row, fourth column means that there is a hyperlink in webpage A that points to webpage D. Draw the mini-web as a graph by showing pages as nodes and links as edges. Determine the authority and hub rankings for each of the pages (execute up to three iterations only; clearly mention the scores for each iteration). [1 + 3 + 3 = 7]
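The hub and authority updates can be sketched as below. The adjacency matrix of the question is not reproduced in this copy, so `example` is a purely hypothetical matrix that only respects the one link stated in the text (A points to D); it must not be read as the exam's data. The sketch also assumes all hub scores start at 1, that authorities are updated before hubs in each iteration, and that normalization is optional. Textbook conventions differ on these points.

```python
def _scale(scores):
    """Divide by the largest score so that the maximum becomes 1."""
    top = max(scores)
    return [s / top for s in scores] if top else scores


def hits(adjacency, iterations=3, normalize=True):
    """Return a list of (authority, hub) score lists, one per iteration.
    adjacency[i][j] == 1 means page i links to page j."""
    n = len(adjacency)
    hub = [1] * n
    history = []
    for _ in range(iterations):
        # Authority of page j: sum of hub scores of pages linking to j.
        auth = [sum(adjacency[i][j] * hub[i] for i in range(n))
                for j in range(n)]
        # Hub of page i: sum of authority scores of pages i links to.
        hub = [sum(adjacency[i][j] * auth[j] for j in range(n))
               for i in range(n)]
        if normalize:
            auth, hub = _scale(auth), _scale(hub)
        history.append((auth, hub))
    return history


# Hypothetical 4-page web (rows and columns in the order A, B, C, D).
example = [
    [0, 1, 1, 1],  # A -> B, C, D
    [0, 0, 1, 0],  # B -> C
    [1, 0, 0, 1],  # C -> A, D
    [0, 0, 1, 0],  # D -> C
]
for step, (auth, hub) in enumerate(hits(example, 3, normalize=False), 1):
    print(f"iteration {step}: authority={auth} hub={hub}")
```

Ranking the pages then amounts to sorting them by the final authority scores and, separately, by the final hub scores.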