Download problem and data files
Assignment
Data Mining (IS ZC415)
Due date: 20 November 2017
Upload the assignment on elearn as a single pdf file containing numbered
solutions to the problems. Your BITS ID should be used as the filename.
Make suitable assumptions where necessary.
Do NOT upload a zipped file.
All input files related to the assignment can be found on elearn.
1 Problem
You are given a file classify.csv that contains training samples with multiple features. The file has 70 training examples. The first column of each example is the class label (1 or 0). The last column in each line is just an ID; do not use it. Create a Random Forest classifier, perform 10-fold cross validation and report the F1-score of the classifier:
A. Without any pruning.
B. With suitable pruning. Mention the pruning you have performed.
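The evaluation asked for in A and B could be set up along the following lines. This is only a sketch under stated assumptions, not a model solution: it uses scikit-learn, takes the rows to be laid out as label, features, ID, and treats "pruning" as pre-pruning through max_depth and min_samples_leaf. The helper names split_label_features and mean_f1, the forest size of 100 trees and the fixed random seeds are all illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def split_label_features(rows):
    """Split rows laid out as [label, feature..., id] into (X, y)."""
    rows = np.asarray(rows, dtype=float)
    y = rows[:, 0].astype(int)   # first column: 1 or 0 label
    X = rows[:, 1:-1]            # drop the label and the trailing ID column
    return X, y


def mean_f1(X, y, **forest_params):
    """Mean F1-score over stratified 10-fold cross validation."""
    # Assumed defaults; any keyword passed by the caller overrides them.
    params = {"n_estimators": 100, "random_state": 0, **forest_params}
    clf = RandomForestClassifier(**params)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(clf, X, y, cv=folds, scoring="f1").mean()
```

Assuming classify.csv is purely numeric and has no header row, the rows could be read with np.loadtxt("classify.csv", delimiter=","). Part A would then be mean_f1(X, y) and part B something like mean_f1(X, y, max_depth=3, min_samples_leaf=3), with the chosen limits reported as the pruning performed.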
2 Problem
You are given a file cluster.txt containing multiple messages, separated by newlines, that discuss an event. Use Agglomerative Hierarchical Clustering with Single Link and Complete Link. Use cosine similarity between messages as the proximity measure.
1. Create a dendrogram for each of Single Link and Complete Link.
2. Now, cut each dendrogram at the best possible height and report the Silhouette coefficient of the
clustering for Single Link and Complete Link.
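One possible way to approach both parts is sketched below with SciPy and scikit-learn. Several things here are assumptions rather than requirements of the assignment: messages are represented as TF-IDF vectors, cosine distance (1 minus cosine similarity) is used as the dissimilarity, and the "best possible height" is taken to be the merge height whose flat clustering has the highest Silhouette. The function names are illustrative, and blank lines in cluster.txt are assumed to have been dropped beforehand, since an empty message has no defined cosine distance.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist, squareform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def cosine_distances_condensed(messages):
    """TF-IDF vectors -> condensed vector of pairwise cosine distances."""
    vectors = TfidfVectorizer().fit_transform(messages).toarray()
    # Clip to guard against tiny negative values from floating-point error.
    return np.clip(pdist(vectors, metric="cosine"), 0.0, 2.0)


def best_cut(condensed, method):
    """Build the tree, then keep the cut height with the highest Silhouette.

    method is "single" or "complete". Returns (tree, height, score, labels).
    """
    tree = linkage(condensed, method=method)
    square = squareform(condensed)
    n = square.shape[0]
    best_height, best_score, best_labels = None, -1.0, None
    for height in np.unique(tree[:, 2]):      # every merge height is a candidate
        labels = fcluster(tree, t=height, criterion="distance")
        if not 2 <= len(set(labels)) <= n - 1:
            continue                          # Silhouette is undefined here
        score = silhouette_score(square, labels, metric="precomputed")
        if score > best_score:
            best_height, best_score, best_labels = height, score, labels
    return tree, best_height, best_score, best_labels


def plot_dendrogram(tree, title):
    """Draw the dendrogram; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(10, 4))
    dendrogram(tree, ax=ax, no_labels=True)
    ax.set_title(title)
    ax.set_ylabel("cosine distance")
    return fig
```

Calling best_cut(condensed, "single") and best_cut(condensed, "complete") on the same condensed distances gives one tree and one Silhouette value per linkage, and plot_dendrogram can be applied to each returned tree.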
******** END *********