BITS WILP Advanced Data Mining End-Sem Exam (Regular) 2017-H2




Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
First Semester 2017-2018

Comprehensive Examination
(EC-3 Regular)

Course No.                  : SS ZG548 
Course Title                 : ADVANCED DATA MINING  
Nature of Exam           : Open Book
Weightage                    : 50%
Duration                      : 3 Hours 
Date of Exam              : 05/11/2017    (FN)
No of Pages: 2
No of questions: 7
Note:
1.       Please follow all the Instructions to Candidates given on the cover page of the answer book.
2.       All parts of a question should be answered consecutively. Each answer should start from a fresh page. 
3.       Assumptions made if any, should be stated clearly at the beginning of your answer.

Q.1.        Consider clustering over an evolving data stream. Briefly describe the DenStream algorithm and compare its performance with that of incremental DBSCAN.          [3 + 3 = 6]

Q.2.        Describe the terms variety, veracity and viability from a Big Data perspective. Explain the major steps of a Big Data analytics process.      [3 + 4 = 7]

Q.3.        MR-DBSCAN is an efficient parallel density-based clustering algorithm built on the MapReduce framework. Explain how this algorithm divides the work among the map processes and, with an example, explain the significance of the epsilon-extended neighborhood.    [3 + 3 = 6]

Q.4.        Define a subsequence in a sequence database. Determine the percentage support for the sequence <{1,3}{1}> in the following database of 12 sequences.
A=<{1,2}{1,2,3}{2}{3}{1}>
B=<{2}{3,4,5}{1}{3,1}>
C=<{1}{2,3}{5}{1,3}{3,1}>
D=<{1,3,4}{4}{2,1}>
E=<{3}{1}{2,3}{1,2,3}>
F=<{1}{1,2,3}{1,3}{4,2}>
G=<{3}{1}{2,3}{1,2,3}{2,4}{3}>
H=<{1,3}{2,3}{5}{3}{1}{1,3}{1}>
I=<{3}{3,4,5}{1}{1}{2,3}{1,2,3}>
J=<{1}{4,2}{1,2,3}{3,4,5}{1}{1,3}>
K=<{3}{1}{2,3}{3,4,5}{1}{1,2,3}{2}>
L=<{1,2,3}{5}{3,4,5}{1}{3}{1}{1,3}{1}>
Clearly show all the steps.                                                                                            [1 + 5 = 6]
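
The containment test behind Q.4 can be sketched in Python. This is a minimal illustration, not a model answer: it assumes the usual definition that a pattern <s1 s2 ... sn> is a subsequence of <t1 t2 ... tm> when each si is a subset of a distinct tj and the matched positions are strictly increasing. The names `is_subsequence`, `DATABASE` and `PATTERN` are illustrative.

```python
def is_subsequence(pattern, sequence):
    """Return True if every element of `pattern` is a subset of a distinct
    element of `sequence`, with matches occurring in strictly increasing order."""
    pos = 0
    for element in pattern:
        # Greedily advance to the earliest element of `sequence` containing `element`.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True


# The 12 sequences from the question, each a list of itemsets.
DATABASE = {
    "A": [{1, 2}, {1, 2, 3}, {2}, {3}, {1}],
    "B": [{2}, {3, 4, 5}, {1}, {3, 1}],
    "C": [{1}, {2, 3}, {5}, {1, 3}, {3, 1}],
    "D": [{1, 3, 4}, {4}, {2, 1}],
    "E": [{3}, {1}, {2, 3}, {1, 2, 3}],
    "F": [{1}, {1, 2, 3}, {1, 3}, {4, 2}],
    "G": [{3}, {1}, {2, 3}, {1, 2, 3}, {2, 4}, {3}],
    "H": [{1, 3}, {2, 3}, {5}, {3}, {1}, {1, 3}, {1}],
    "I": [{3}, {3, 4, 5}, {1}, {1}, {2, 3}, {1, 2, 3}],
    "J": [{1}, {4, 2}, {1, 2, 3}, {3, 4, 5}, {1}, {1, 3}],
    "K": [{3}, {1}, {2, 3}, {3, 4, 5}, {1}, {1, 2, 3}, {2}],
    "L": [{1, 2, 3}, {5}, {3, 4, 5}, {1}, {3}, {1}, {1, 3}, {1}],
}

PATTERN = [{1, 3}, {1}]

supporting = [name for name, seq in DATABASE.items() if is_subsequence(PATTERN, seq)]
support_count = len(supporting)
support_percent = 100.0 * support_count / len(DATABASE)

print("Supporting sequences:", supporting)
print(f"Support: {support_count}/{len(DATABASE)} = {support_percent:.2f}%")
```

Greedy earliest matching is sufficient here: matching each pattern element as early as possible never rules out a later match that some other choice would have allowed.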

Q.5.        Let us define a hotlink as the URL (uniform resource locator, or web address) of a frequently visited page. If a web developer knows the hotlink pages of his website, he may put those links on the index page itself to better serve visitors. Provide an algorithm (write pseudocode) to recognize hotlinks when a logbook containing the sequences of web pages visited by many users is available.                               [4]
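
One possible shape for such an algorithm is sketched below in Python. It rests on assumptions the question does not fix: the logbook is taken to be a list of per-user page sequences, a page counts once per user no matter how often that user revisits it, and "frequently visited" means visited by at least a fraction `min_support` of users. The function name `find_hotlinks` and the threshold are hypothetical.

```python
from collections import Counter


def find_hotlinks(sessions, min_support=0.5):
    """Return URLs visited by at least `min_support` (a fraction) of users.

    `sessions` is a list with one sequence of visited URLs per user.
    """
    counts = Counter()
    for pages in sessions:
        # Count each URL at most once per user so that a single heavy
        # visitor cannot make a page look popular on their own.
        counts.update(set(pages))
    threshold = min_support * len(sessions)
    return sorted(url for url, c in counts.items() if c >= threshold)


# Hypothetical logbook with four users.
log = [
    ["/home", "/news", "/home"],
    ["/home", "/shop"],
    ["/news", "/home"],
    ["/shop"],
]
print(find_hotlinks(log, min_support=0.6))
```

Counting raw page hits instead of distinct users is an equally defensible reading of the question; only the `counts.update` line would change.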


Q.6.        Consider four text files T1, T2, T3, T4 as given below.

T1
Mahendra Singh Dhoni is an Indian batsmen cricketer who captained the Indian team in limited-overs formats from 11th of September 2007 to 4th of January 2017 and in Test cricket from 2008 to 28th of December 2014. An attacking right-handed middle-order batsman and wicket-keeper, he is widely regarded as one of the greatest finishers in limited-overs cricket. He is also regarded to be one of the best wicket-keepers in world cricket and is known to have very fast hands. He made his One Day International (ODI) debut in December 2004 against Bangladesh, and played his first Test a year later against Sri Lanka.
T2
Virat Kohli is an Indian international cricketer who currently captains the India national team. A right-handed cricket batsman, often regarded as one of the best batsmen in the world, Kohli was ranked eighth in ESPN's list of world's most famous athletes in 2016. He plays for the Royal Challengers Bangalore in the Indian Premier League (IPL), and has been the team's captain since 2013.
T3
Cristiano Ronaldo is a Portuguese professional footballer who plays as a forward for Spanish club Real Madrid and the Portugal national team. Often considered the best player in the world and widely regarded as one of the greatest of all time, Ronaldo has four FIFA Ballon d'Or awards, the most for a European player, and is the first player in history to win four European Golden Shoes. He has won 24 trophies in his career, including five league titles, four UEFA Champions League titles and one UEFA European Championship. A prolific goalscorer, Ronaldo holds the records for most official goals scored in the top five European leagues, the UEFA Champions League and the UEFA European Championship, as well as the most goals scored in a UEFA Champions League season. He has scored more than 600 senior career goals for club and country.
T4
Lionel Andres Messi is an Argentine professional footballer who plays as a forward for Spanish club FC Barcelona and the Argentina national team. Often considered the best player in the world and regarded by many as the greatest of all time, Messi is the only player in history to win five FIFA  awards, four of which he won consecutively, and a record-tying four European Golden Shoes. He has won 29 trophies with Barcelona, including eight La Liga titles, four UEFA Champions League titles, and five Copas del Rey. He has scored over 600 senior career goals for club and country.

Assuming the codebook contains the following words, in order, <awards, Barcelona, cricketer, footballer, history, including, professional, senior, Shoes, win, career, cricket, national, club, goals, Indian, team, player>, determine the following:
(a)             Binary Term-Document Incidence Matrix (codeword) for all four texts T1, T2, T3, T4.
(b)             Similarity between every pair of texts, where similarity is defined as the number of on bits in the corresponding codewords. (Fill in the following table.)          [8 + 6 = 14]


        T1      T2      T3      T4
T1      -       ?       ?       ?
T2      -       -       ?       ?
T3      -       -       -       ?
T4      -       -       -       -
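
The two computations in Q.6 can be sketched as follows. This is a hedged illustration rather than a model answer: it assumes whole-word, case-insensitive matching against the codebook, and it reads "similarity" as the number of positions where both codewords have an on bit (a bitwise AND followed by a count). The two short documents are made up for the example and are not T1 to T4.

```python
import re

CODEBOOK = ["awards", "Barcelona", "cricketer", "footballer", "history",
            "including", "professional", "senior", "Shoes", "win", "career",
            "cricket", "national", "club", "goals", "Indian", "team", "player"]


def codeword(text, codebook=CODEBOOK):
    """Binary incidence vector: 1 if the codebook word occurs in `text`."""
    # Assumption: split on non-letters and compare case-insensitively,
    # so "cricket" does not match inside "cricketer".
    tokens = {t.lower() for t in re.findall(r"[A-Za-z]+", text)}
    return [1 if word.lower() in tokens else 0 for word in codebook]


def similarity(u, v):
    """Number of positions where both codewords have an on bit."""
    return sum(a & b for a, b in zip(u, v))


# Hypothetical documents, used only to exercise the functions.
doc_a = "An Indian cricketer who plays cricket for the national team."
doc_b = "A professional footballer who plays for a club and the national team."

vec_a = codeword(doc_a)
vec_b = codeword(doc_b)
print(vec_a)
print(vec_b)
print("similarity:", similarity(vec_a, vec_b))
```

Applying `codeword` to each of T1 to T4 gives the rows of the incidence matrix, and `similarity` on each of the six pairs fills the upper triangle of the table.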

Q.7.         Consider a mini-web having four pages A, B, C, D characterized by the adjacency matrix given below.

Here, a 1 in a cell represents the existence of a hyperlink. For example, a 1 in the first row, fourth column means that there is a hyperlink in webpage A that points to webpage D. Draw the mini-web as a graph, showing pages as nodes and links as edges. Determine the authority and hub rankings for each of the pages (execute up to three iterations only, and clearly mention the scores for each iteration).                  [1 + 3 + 3 = 7]
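
The hub and authority iteration can be sketched in Python as below. The adjacency matrix `ADJ` is hypothetical, chosen only so that the A to D link mentioned in the question is present; it is not the matrix from the paper. The sketch also assumes one common variant of HITS: authorities are updated first, hubs are then updated from the new authority scores, and each vector is rescaled to sum to 1 after every iteration. Other update orders and normalizations give different numbers but the same kind of ranking procedure.

```python
PAGES = ["A", "B", "C", "D"]

# Hypothetical adjacency matrix: ADJ[i][j] == 1 means page i links to page j.
ADJ = [
    [0, 1, 1, 1],  # A -> B, C, D
    [0, 0, 1, 0],  # B -> C
    [1, 0, 0, 1],  # C -> A, D
    [0, 0, 1, 0],  # D -> C
]


def hits(adj, iterations=3):
    """Run HITS and return a list of (authority, hub) score lists, one per iteration."""
    n = len(adj)
    hub = [1.0] * n
    history = []
    for _ in range(iterations):
        # Authority of j: sum of hub scores of the pages that link to j.
        auth = [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        # Hub of i: sum of authority scores of the pages that i links to.
        hub = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        # Assumption: normalize each vector so its entries sum to 1.
        auth_total = sum(auth) or 1.0
        hub_total = sum(hub) or 1.0
        auth = [a / auth_total for a in auth]
        hub = [h / hub_total for h in hub]
        history.append((auth, hub))
    return history


history = hits(ADJ, iterations=3)
for step, (auth, hub) in enumerate(history, start=1):
    print(f"iteration {step}")
    for page, a, h in zip(PAGES, auth, hub):
        print(f"  {page}: authority={a:.4f} hub={h:.4f}")
```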
 
 




