Birla Institute of Technology & Science, Pilani
Work-Integrated Learning Programmes Division
Second Semester 2016-2017
Comprehensive Examination (EC-3 Regular)

Course No.: IS ZC464
Course Title: MACHINE LEARNING
Nature of Exam: Open Book
Weightage: 50%
Duration: 3 Hours
Date of Exam: 09/04/2017 (FN)
No. of questions: 6
No. of pages: 2
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
Q.1 (a) When the training set is small, the contribution of variance to error may be more than that of bias, and in such a case we may prefer a simple model even though we know that it is too simple for the task. Give an example.
Answer 1A
With a small training set, a flexible model's variance will be high. If we instead take a simple model (i.e., under-fit), we get low variance at the cost of high bias. With little data, the variance we remove by choosing the simple model outweighs the bias we add, so the simple model can give lower total error even though it is too simple for the task.
Typical trend: under-fitting means high bias and low variance; over-fitting means low bias but high variance. E.g., think about k in k-nearest-neighbors regression: relatively speaking, how do the bias and variance behave for small k, and for large k?
...
Ans:
Figure (A1)
Figure (A2)
H0 is a more restrictive model, using only a constant line. Variance is a squared quantity, so it captures deviation in both directions. In the figures, one can see that increasing the degrees of freedom from 0 to 1 results in higher variance. As the degrees of freedom increase, the model becomes less biased: the function used to represent the true model gets closer to the true function, i.e., it can approximate the true function well.
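Picking up the k-nearest-neighbors question above, here is a small simulation sketch (not part of the original answer; the true function, noise level and sample sizes are illustrative assumptions) that estimates bias² and variance of kNN regression for a small and a large k over many small training sets:

```python
# Bias/variance of kNN regression vs. k, estimated by simulation (illustrative).
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)           # assumed true function
x_test = np.linspace(0.1, 0.9, 50)

def knn_predict(x_train, y_train, x_query, k):
    # Average the targets of the k nearest training points to each query.
    idx = np.argsort(np.abs(x_train[:, None] - x_query[None, :]), axis=0)[:k]
    return y_train[idx].mean(axis=0)

for k in (1, 15):                             # small k vs. large k
    preds = []
    for _ in range(200):                      # many small training sets
        x_tr = rng.uniform(0, 1, 30)
        y_tr = f(x_tr) + rng.normal(0, 0.3, 30)
        preds.append(knn_predict(x_tr, y_tr, x_test, k))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    var = preds.var(axis=0).mean()
    print(f"k={k:2d}  bias^2={bias2:.3f}  variance={var:.3f}")
```

Small k (flexible model) should show low bias but high variance; large k (simple model) the reverse, which is exactly the trade-off the answer describes.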
Q.1 (b) How can we make k-means robust to outliers? [4 + 4 = 8]
Answer 1B:
Using URL: https://arxiv.org/pdf/1201.6082.pdf
First randomly select K initial "centers" μ1, . . . , μK and iterate the following two steps until convergence:
(a) Given cluster centers μ1, . . . , μK, assign each point to the cluster with the closest center.
(b) Given a cluster assignment, update the cluster centers to be the sample mean of the observations in each cluster.
Although this algorithm decreases the objective function at each iteration, it may be trapped in different local minima. Hence, it is started several times and the best solution is returned. Cuesta-Albertos et al. (1997) proposed a modification of this algorithm in order to obtain outlier-robust clusters. The main idea is to replace step (b) above by:
(b') Given a cluster assignment, trim the α*100% of observations with the largest distance to their cluster centers, and update the cluster centers to the sample mean of the remaining observations in each cluster.
The tuning parameter α regulates the amount of trimming and is selected by the user.
…
In our experience the choice α = 0.10 suffices for most applications.
…
The main idea is to adapt the Sparse K-means algorithm of Witten and Tibshirani (2010) by trimming a fixed proportion of observations that are farthest away from their cluster centers (using the approach of the Trimmed K-means algorithm of Cuesta-Albertos et al. (1997)).
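As illustration, a minimal sketch of the trimmed K-means idea summarised above, assuming Euclidean distance and a fixed iteration count; this is our reading of step (b'), not the code of Cuesta-Albertos et al.:

```python
# Trimmed K-means sketch: like K-means, but the alpha*100% points farthest
# from their centers are excluded from the center update.
import numpy as np

def trimmed_kmeans(X, k, alpha=0.10, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        # step (a): assign each point to its closest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # step (b'): trim the alpha fraction with the largest distances
        dist = d[np.arange(len(X)), labels]
        keep = dist <= np.quantile(dist, 1 - alpha)
        # update each center as the mean of the kept points in its cluster
        for j in range(k):
            pts = X[keep & (labels == j)]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, labels, keep
```

Because outliers land in the trimmed fraction, they no longer drag the cluster means toward themselves, which is what makes the clustering robust.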
Q. 2(a) What is the significance of the learning rate in gradient descent, and what is the effect of increasing or decreasing the learning rate on the convergence of gradient descent?
Answer 2A:
...
…
The learning rate determines the size of the step by which each weight update Δwi changes the weights: a large learning rate gives bigger steps, and a small one gives smaller steps. One practical scheme: during the first step, choose a small learning rate and compute the new value of E[w]. If new_E[w] − E[w] is very large, reduce the learning rate further; if it is moderate, continue for a few more steps, then check new_E[w] − E[w] again. If the decrease is slowing down, reduce the learning rate so that we do not take big steps and overshoot the minimum. As gradient descent converges, we keep reducing the learning rate to take smaller and smaller steps. If the sign of (new_E[w] − E[w]) ever turns positive, the last step overshot the minimum, i.e., gradient descent had already converged near the previous point.
In short: too large a learning rate makes gradient descent overshoot, oscillate, or even diverge; too small a learning rate makes convergence very slow.
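To make the effect concrete, here is a toy sketch (not from the exam; the error function and the three rates are illustrative assumptions): gradient descent on E(w) = w², whose minimum is at w = 0.

```python
# Toy gradient descent on E(w) = w^2; dE/dw = 2w. Illustrative only.
def gradient_descent(lr, steps=20, w=5.0):
    for _ in range(steps):
        w -= lr * 2 * w            # step against the gradient
    return w

for lr in (0.01, 0.1, 1.1):        # too small, reasonable, too large
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
# lr=0.01 crawls toward 0, lr=0.1 converges quickly,
# lr=1.1 overshoots the minimum on every step and diverges.
```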
Q. 2(b) Solve the truth table problem below with the Perceptron Training Rule.

X1 | X2 | Target
0  | 0  | 1
0  | 1  | 1
1  | 0  | 0
1  | 1  | 1

Assume X0 = -1, weights are initialized as w0 = 0.2, w1 = -0.3, w2 = 0.6 and learning rate η = 0.1. Activation occurs when ∑Xi*wi >= 0. [3 + 5 = 8]
Answer 2B:
Perceptron output: O(x1, x2) = w0 * x0 + w1 * x1 + w2 * x2
Since x0 is the constant -1 (the product w0*x0 is the 'bias' term), the output for (0, 0) depends only on w0:
O(0, 0) = w0 * x0 = 0.2 * (-1) = -0.2 => 0
So with w0 unchanged, the best output vector we can get is (0, 1, 0, 1), and
E(best case of w) = 0.5 * ((1-0)^2 + (1-1)^2 + (0-0)^2 + (1-1)^2) = 0.5

PASS-1
O(x1, x2) = 0.2 * (-1) + (-0.3) * x1 + 0.6 * x2
If w.x >= 0, O = 1; else if w.x < 0, O = 0.
X1 | X2 | w.x or Sum(wi.xi)      | O | T
0  | 0  | -0.2                   | 0 | 1
0  | 1  | -0.2 + 0.6 = 0.4       | 1 | 1
1  | 0  | -0.2 - 0.3 = -0.5      | 0 | 0
1  | 1  | -0.2 - 0.3 + 0.6 = 0.1 | 1 | 1

E[w] = 0.5 * ((1-0)^2 + (1-1)^2 + (0-0)^2 + (1-1)^2) = 0.5
As argued above, if w0 is never updated, the sum-of-squared-errors E[w] cannot fall below 0.5.
For (0, 0): (t - o) = 1
del_w1 = η * (t - o) * x1 = 0.1 * (1 - 0) * 0 = 0
del_w2 = 0.1 * (1 - 0) * 0 = 0
del_w0 = 0.1 * (1 - 0) * -1 = -0.1
For (0, 1): (t - o) = 0
del_w1 = del_w1 + 0 = 0
del_w2 = del_w2 + 0.1 * (1 - 1) * 1 = 0
del_w0 = del_w0 + 0.1 * (1 - 1) * -1 = -0.1
For (1, 0): (t - o) = 0
del_w1 = del_w1 + 0.1 * (0 - 0) * 1 = 0
del_w2 = del_w2 + 0 = 0
del_w0 = del_w0 + 0.1 * (0 - 0) * -1 = -0.1
For (1, 1): (t - o) = 0
del_w1 = del_w1 + 0 = 0
del_w2 = del_w2 + 0 = 0
del_w0 = del_w0 + 0.1 * (1 - 1) * -1 = -0.1
Update w-vector:
w1 = w1 + del_w1 = -0.3 + 0 = -0.3
w2 = w2 + del_w2 = 0.6 + 0 = 0.6
w0 = w0 + del_w0 = 0.2 + (-0.1) = 0.1
PASS-2
O(x1, x2) = 0.1 * (-1) + (-0.3) * x1 + 0.6 * x2

X1 | X2 | w.x or Sum(wi.xi)      | O | T
0  | 0  | -0.1                   | 0 | 1
0  | 1  | -0.1 + 0.6 = 0.5       | 1 | 1
1  | 0  | -0.1 - 0.3 = -0.4      | 0 | 0
1  | 1  | -0.1 - 0.3 + 0.6 = 0.2 | 1 | 1

E[w] = 0.5 * ((1-0)^2 + (1-1)^2 + (0-0)^2 + (1-1)^2) = 0.5
For (0, 0): (t - o) = 1
del_w1 = 0.1 * (1 - 0) * 0 = 0
del_w2 = 0
del_w0 = 0.1 * (1 - 0) * -1 = -0.1
For (0, 1): (t - o) = 0
del_w1 = del_w1 + 0 = 0
del_w2 = del_w2 + 0.1 * (1 - 1) * 1 = 0
del_w0 = del_w0 + 0.1 * (1 - 1) * -1 = -0.1
For (1, 0): (t - o) = 0
del_w1 = del_w1 + 0.1 * (0 - 0) * 1 = 0
del_w2 = del_w2 + 0 = 0
del_w0 = del_w0 + 0.1 * (0 - 0) * -1 = -0.1
For (1, 1): (t - o) = 0
del_w1 = del_w1 + 0 = 0
del_w2 = del_w2 + 0 = 0
del_w0 = del_w0 + 0.1 * (1 - 1) * -1 = -0.1
Update w-vector:
w1 = w1 + del_w1 = -0.3 + 0 = -0.3
w2 = w2 + del_w2 = 0.6 + 0 = 0.6
w0 = w0 + del_w0 = 0.1 + (-0.1) = 0
PASS-3
O(x1, x2) = 0 * (-1) + (-0.3) * x1 + 0.6 * x2
Note that, as given, activation occurs when ∑Xi*wi >= 0, so a sum of exactly 0 produces output 1.

X1 | X2 | w.x or Sum(wi.xi) | O | T
0  | 0  | 0                 | 1 | 1
0  | 1  | 0.6               | 1 | 1
1  | 0  | -0.3              | 0 | 0
1  | 1  | -0.3 + 0.6 = 0.3  | 1 | 1

E[w] = 0.5 * ((1-1)^2 + (1-1)^2 + (0-0)^2 + (1-1)^2) = 0
The error is now zero, so training has converged.
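The three passes above can be checked mechanically with a short script (our sketch, mirroring the batch accumulation of the deltas in the worked answer):

```python
# Batch perceptron/delta updates for the truth table in Q.2(b).
data = [((0, 0), 1), ((0, 1), 1), ((1, 0), 0), ((1, 1), 1)]
w0, w1, w2 = 0.2, -0.3, 0.6        # initial weights
eta, x0 = 0.1, -1                  # learning rate and constant input

for p in range(1, 4):              # three passes, as in the worked answer
    outs = [(1 if w0 * x0 + w1 * x1 + w2 * x2 >= 0 else 0, t, x1, x2)
            for (x1, x2), t in data]
    err = 0.5 * sum((t - o) ** 2 for o, t, _, _ in outs)
    print(f"PASS-{p}: outputs={[o for o, *_ in outs]}  E[w]={err}")
    d0 = d1 = d2 = 0.0             # accumulate the deltas over all examples
    for o, t, x1, x2 in outs:
        d0 += eta * (t - o) * x0
        d1 += eta * (t - o) * x1
        d2 += eta * (t - o) * x2
    w0, w1, w2 = w0 + d0, w1 + d1, w2 + d2
```

Running this prints E[w] = 0.5 for passes 1 and 2 and E[w] = 0 for pass 3, matching the tables.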
Q.3 (a) Generate two off-springs from the given parent chromosomes and random template, based on the uniform order-based crossover operator.

Template | 1 | 1 | 0 | 0 | 1 | 1 | 0
Parent 1 | C | B | G | E | F | D | A
Parent 2 | A | B | C | D | E | F | G
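No worked answer is given in the original document. As a hedged illustration, here is one common formulation of uniform order-based crossover (definitions of this operator vary, so treat the offspring as one possible answer): each child keeps one parent's genes where the template is 1 and fills the remaining positions with the missing genes in the order they appear in the other parent.

```python
# Uniform order-based crossover sketch (one common formulation).
def uniform_order_crossover(p1, p2, template):
    def make_child(keep, other):
        # keep genes at template-1 positions, leave holes elsewhere
        kept = [g if t == 1 else None for g, t in zip(keep, template)]
        # fill holes with the missing genes, in their order in `other`
        fill = iter(g for g in other if g not in kept)
        return [g if g is not None else next(fill) for g in kept]
    return make_child(p1, p2), make_child(p2, p1)

template = [1, 1, 0, 0, 1, 1, 0]
p1 = list("CBGEFDA")
p2 = list("ABCDEFG")
c1, c2 = uniform_order_crossover(p1, p2, template)
print(c1, c2)   # under this formulation: CBAEFDG and ABCGEFD
```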
Q.3 (b) Generate two off-springs from the given parent chromosomes based on the cycle crossover operator.

Parent 1 | 7 | 4 | 5 | 10 | 6  | 3 | 9 | 8 | 2 | 1
Parent 2 | 6 | 7 | 8 | 9  | 10 | 1 | 2 | 3 | 4 | 5

[4 + 4 = 8]
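Again no worked answer is given; here is a hedged sketch of cycle crossover. Positions are grouped into cycles by repeatedly following "value in Parent 2 at this position → position of that value in Parent 1"; alternate cycles are copied from Parent 1 and Parent 2 into child 1 (and swapped for child 2).

```python
# Cycle crossover sketch for permutation chromosomes.
def cycle_crossover(p1, p2):
    n = len(p1)
    child1, child2 = [None] * n, [None] * n
    visited = [False] * n
    cycle = 0
    for start in range(n):
        if visited[start]:
            continue
        i = start
        while not visited[i]:
            visited[i] = True
            # even-numbered cycles come from p1 into child 1; odd ones swap
            if cycle % 2 == 0:
                child1[i], child2[i] = p1[i], p2[i]
            else:
                child1[i], child2[i] = p2[i], p1[i]
            i = p1.index(p2[i])   # follow the cycle to the next position
        cycle += 1
    return child1, child2

p1 = [7, 4, 5, 10, 6, 3, 9, 8, 2, 1]
p2 = [6, 7, 8, 9, 10, 1, 2, 3, 4, 5]
print(cycle_crossover(p1, p2))
# -> [7, 4, 8, 10, 6, 1, 9, 3, 2, 5] and [6, 7, 5, 9, 10, 3, 2, 8, 4, 1]
```

Both offspring remain valid permutations, since each position takes its value from exactly one parent within a closed cycle.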
Q.4. Consider the following training set in 2-dimensional Euclidean space.

Point | Coordinate | Class
X1    | (-1, 1)    | Negative
X2    | (0, 1)     | Positive
X3    | (0, 2)     | Negative
X4    | (1, -1)    | Negative
X5    | (1, 0)     | Positive
X6    | (1, 2)     | Positive
X7    | (2, 2)     | Negative
X8    | (2, 3)     | Positive

(The last point is labelled X6 in the original paper, evidently a typo for X8.)

(a) What is the class of the point (1, 1) if a 3NN classifier is considered?
(b) What is the class of the point (1, 1) if a 5NN classifier is considered?
(c) What is the class of the point (1, 1) if a 7NN classifier is considered? [3+3+3 = 9]
Answer 4:
A similar question appeared in quiz-2 (2017-H2), with query point (0, 0) and a slightly different training set; it is worked out below.
Answer:
Sol: (a, b, c, d) are correct.
(E.D. => Euclidean distance)
8NN:
Point | Coordinate | Class    | E.D.(X, (0, 0)) | Rank
X1    | (-1, 1)    | Negative | 1.414           | 3
X2    | (0, 1)     | Positive | 1.0             | 1
X3    | (0, 2)     | Negative | 2.0             | 5
X4    | (1, -1)    | Positive | 1.414           | 4
X5    | (1, 0)     | Positive | 1.0             | 2
X6    | (1, 2)     | Positive | 2.236           | 6
X7    | (2, 2)     | Negative | 2.828           | 7
X8    | (1, 3)     | Positive | 3.162           | 8

Here count(+ve) = 5 and count(-ve) = 3; as count(+ve) > count(-ve), the class is '+ve'.
5NN: Consider the five nearest neighbours.

Point | Coordinate | Class    | E.D.(X, (0, 0))
X1    | (-1, 1)    | Negative | 1.414
X2    | (0, 1)     | Positive | 1.0
X3    | (0, 2)     | Negative | 2.0
X4    | (1, -1)    | Positive | 1.414
X5    | (1, 0)     | Positive | 1.0

Here count(+ve) = 3 and count(-ve) = 2, so the 5NN class is again '+ve'.
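The ranking-and-voting procedure in these tables can be expressed as a short sketch (using the quiz variant of the data set from the answer above; ties at equal distance are broken by input order, which is an assumption):

```python
# kNN classification: rank by Euclidean distance, majority vote among top k.
from collections import Counter
import math

def knn_classify(points, query, k):
    ranked = sorted(points, key=lambda p: math.dist(p[0], query))
    votes = Counter(cls for _, cls in ranked[:k])
    return votes.most_common(1)[0][0]

data = [((-1, 1), "Negative"), ((0, 1), "Positive"), ((0, 2), "Negative"),
        ((1, -1), "Positive"), ((1, 0), "Positive"), ((1, 2), "Positive"),
        ((2, 2), "Negative"), ((1, 3), "Positive")]
for k in (3, 5, 8):
    print(f"{k}NN ->", knn_classify(data, (0, 0), k))
```

The same helper answers Q.4 (a)-(c) directly by swapping in the exam's table and the query point (1, 1).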
Q.5. Suppose we are going to construct a decision tree using the table given below.

Chills | Runny nose | Headache | Fever | Flu
Yes    | No         | Mild     | Yes   | No
Yes    | Yes        | No       | No    | Yes
Yes    | No         | Strong   | Yes   | Yes
No     | Yes        | Mild     | Yes   | Yes
No     | No         | No       | No    | No
No     | Yes        | Strong   | Yes   | Yes
No     | Yes        | Strong   | No    | No
Yes    | Yes        | Mild     | Yes   | Yes
The table provides a set of eight training examples (S) of the target concept "flu", where each instance is described by the attributes Chills, Runny nose, Headache and Fever.
Construct the decision tree for 'Flu' using information gain. Show each step and the final tree. [9]
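No worked answer is given in the original document for Q.5. As a starting point, here is a hedged sketch of the information-gain computation (standard ID3 entropy definitions) used to choose the root attribute:

```python
# Information gain over the flu table: gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target="Flu"):
    gain = entropy([r[target] for r in rows])
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

cols = ["Chills", "Runny nose", "Headache", "Fever", "Flu"]
rows = [dict(zip(cols, r)) for r in [
    ("Yes", "No", "Mild", "Yes", "No"), ("Yes", "Yes", "No", "No", "Yes"),
    ("Yes", "No", "Strong", "Yes", "Yes"), ("No", "Yes", "Mild", "Yes", "Yes"),
    ("No", "No", "No", "No", "No"), ("No", "Yes", "Strong", "Yes", "Yes"),
    ("No", "Yes", "Strong", "No", "No"), ("Yes", "Yes", "Mild", "Yes", "Yes")]]
for a in cols[:-1]:
    print(a, round(info_gain(rows, a), 4))
```

The attribute with the largest printed gain becomes the root; the tree is then grown by recursing the same computation on each branch's subset.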
Q.6. A new mobile phone service chain store would like to open 20 service centres in Bangalore. Each service centre should cover at least one shopping centre and 5,000 households with annual income over 75,000. Design a scalable algorithm that decides the locations of the service centres, taking all the aforementioned constraints into consideration. [8]
Answer 6:
1. Identify the households (their locations) with annual income over 75,000.
2. Identify the shopping centres (their locations) in the city.
3. Use k-means to form 20 (k = 20) clusters of shopping centres based on their distances to the other shopping centres.
4. Take the households as query points and insert them into the clusters formed above, as sketched below.
4a. A household is assigned by comparing its two closest shopping centres; the one with the smaller number of households attached to it is chosen. (We attach households to one of the two nearest shopping centres because a user is unlikely to pass over both and travel to a third, more distant, service centre.)
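A rough sketch of steps 1-4 above; all coordinates here are randomly generated stand-ins for real data, and the load-balancing rule is our reading of step 4a:

```python
# Service-centre placement sketch: k-means over shopping centres, then
# load-balanced assignment of households to their two nearest centres.
import numpy as np

rng = np.random.default_rng(1)
shops = rng.uniform(0, 50, size=(200, 2))          # shopping-centre coordinates (assumed)
households = rng.uniform(0, 50, size=(20000, 2))   # households already filtered to income > 75,000

# steps 1-3: k-means (k = 20) over the shopping centres
k = 20
centres = shops[rng.choice(len(shops), k, replace=False)].copy()
for _ in range(100):
    lab = np.linalg.norm(shops[:, None] - centres[None], axis=2).argmin(axis=1)
    for j in range(k):
        if (lab == j).any():
            centres[j] = shops[lab == j].mean(axis=0)

# steps 4 and 4a: attach each household to the less-loaded of its
# two nearest shopping centres
shop_load = np.zeros(len(shops), dtype=int)
for h in households:
    near = np.argsort(np.linalg.norm(shops - h, axis=1))[:2]
    shop_load[near[np.argmin(shop_load[near])]] += 1

# households covered by each candidate service-centre location
cluster_load = np.array([shop_load[lab == j].sum() for j in range(k)])
print(centres, cluster_load)
```

Cluster centres give the 20 candidate locations; checking cluster_load >= 5000 (and at least one shopping centre per cluster) verifies the stated constraints.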
**********