BITS WILP Machine Learning (ISZC464) Quiz-1 2017-H2
Q1.
Which of the following statements are true in the context of graphical models?
Select one or more:
a. None of the above.
b. Bayesian Belief networks describe conditional independence among subsets of variables.
c. Bayes network represents the joint probability distribution over a collection of random variables.
d. Each node denotes a random variable.
Answer: (B, C, D)
A Bayesian network, Bayes network, belief
network, Bayes(ian) model or probabilistic directed acyclic
graphical model is a probabilistic graphical
model (a type of statistical model) that represents a set of random
variables and their conditional dependencies via a directed acyclic graph (DAG).
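Concretely, a standard property of Bayes networks (stated here for reference, not quoted from the course slides) is that the graph factorizes the joint distribution as
P(X1, X2, ..., Xn) = P(X1 | Parents(X1)) * P(X2 | Parents(X2)) * ... * P(Xn | Parents(Xn)),
where Parents(Xi) are the parents of node Xi in the DAG; the missing edges are exactly what encode the conditional independence statements in option (b).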
Q2.
Assuming log base 2, the entropy of a binary feature with p(x=1) = 0.5 is
Select one:
a. 0.75
b. 0
c. 0.25
d. 1
e. 0.5
Answer: (D) It is '1'.
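As a quick check with the standard entropy formula (log base 2 as stated):
H(X) = -p(x=1)*log2 p(x=1) - p(x=0)*log2 p(x=0) = -(0.5 * -1) - (0.5 * -1) = 1 bit,
which is the maximum possible entropy for a binary feature.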
Q3.
Which of the following statements are true for the given graphical model?
Select one or more:
a. A is conditionally independent of B given C.
b. B is conditionally independent of A given C.
c. B is not conditionally independent of A given C.
d. A is not conditionally independent of B given C.
Answer: C, D
(For this graph, B is not conditionally independent of A given C, and A is not conditionally independent of B given C.)
Q4.
Let X be a random variable and let Y = aX + b, where a and b are given scalars. Which of the following statements are true? (E[Z] denotes the expected value of Z.)
Select one or more:
a. E[Y]=(a/b)*E[X]
b. E[Y] = E[X]
c. E[Y]=a*E[X]+b
d. E[Y]=a*b*E[X]
Answer: (C)
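This follows from linearity of expectation (a standard property of E[.]):
E[Y] = E[aX + b] = a*E[X] + E[b] = a*E[X] + b,
since the expectation of the constant b is just b.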
Q5.
When can we use the Expectation Maximization (EM) algorithm?
Select one or more:
a. None of these.
b. Unsupervised clustering (target value unobservable).
c. Data is only partially observable.
d. Supervised Learning (some instance attributes unobservable).
Answer: B, C, D
Q6.
Which of the following statements are true?
Select one or more:
a. Maximum a Posteriori estimation seeks the estimate of θ that is most probable, given the observed data, plus background assumptions about its value.
b. Maximum Likelihood estimation seeks the estimate of θ that is most probable, given the observed data, plus background assumptions about its value.
c. Maximum Likelihood estimation seeks an estimate of θ that maximizes the probability of the observed data.
d. Maximum a Posteriori estimation seeks an estimate of θ that maximizes the probability of the observed data.
Answer: A, C
In MLE we have no prior knowledge; for example, in a coin-toss experiment we assume nothing in advance about whether the coin is biased or unbiased, and we arrive at θ purely from the observed data.
In MAP, we additionally incorporate our prior knowledge about θ:
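In symbols (standard definitions, where D is the observed data):
theta_MLE = argmax over theta of P(D | theta)
theta_MAP = argmax over theta of P(D | theta) * P(theta)
Here P(theta) is the prior that encodes the background assumptions about theta; MLE is the special case of MAP with a uniform (uninformative) prior.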
Q7.
If X is a vector of n boolean attributes and Y is a boolean-valued label, how many different functions are possible? (2^n means 2 raised to the power n.)
Answer: 2^(2^n)
If X has two attributes x1 and x2, then the observations one has to take are (x1=0, x2=0, y), (x1=0, x2=1, y), (x1=1, x2=0, y), (x1=1, x2=1, y). With n attributes there are 2^n possible input states, i.e. 2^n rows in the truth table. A function assigns 0 or 1 to Y in each of those rows, so Y can be viewed as a vector of length 2^n, and the number of such vectors (and hence functions) is 2^(2^n).
X1, X2, Y
Function: 1
0, 0, 0
0, 1, 0
1, 0, 0
1, 1, 0
Function: 2
0, 0, 0
0, 1, 0
1, 0, 0
1, 1, 1
Run 1:
Input: X1, X2
0, 0
0, 1
1, 0
1, 1
Output: 0,0,0,0
Run 2:
Input: X1, X2
0, 0
0, 1
1, 0
1, 1
Output: 0,0,0,1
Run 3:
Input: X1, X2
0, 0
0, 1
1, 0
1, 1
Output: 0,0,1,0
Run 4:
Input: X1, X2
0, 0
0, 1
1, 0
1, 1
Output: 0,0,1,1
Run 5: Output: 0,1,0,0. Run 6: Output: 0,1,0,1. Run 7:
Output: 0,1,1,0. Run 8: Output: 0,1,1,1
Run 9: Output: 1,0,0,0. Run 10: Output: 1,0,0,1. Run 11:
Output: 1,0,1,0. Run 12: Output: 1,0,1,1
Run 13: Output: 1,1,0,0. Run 14: Output: 1,1,0,1. Run 15:
Output: 1,1,1,0. Run 16: Output: 1,1,1,1
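A minimal Python sketch (not part of the quiz) that reproduces this count for n = 2:

from itertools import product

n = 2
states = list(product([0, 1], repeat=n))                # the 2^n input states
functions = list(product([0, 1], repeat=len(states)))   # one output bit per state

print(len(states))     # 4 input states for n = 2
print(len(functions))  # 16 = 2^(2^2) distinct boolean functions, matching the 16 runs above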
Q8.
Which of the following statements are true?
Select one or more:
a. To infer posterior probability, Bayesian linear regression uses Naïve Bayes principle.
b. None of these
c. Bayesian linear regression cannot be used for classification.
d. In Bayesian linear regression Prior can be used for regularization.
Answer: A, D
(In Bayesian linear regression the prior can be used for regularization; to infer the posterior probability, Bayesian linear regression uses the Naïve Bayes principle.)
...
...
The course slides show the derivation of the posterior probability using Bayes' theorem, where all the probabilities involved are represented by multivariate Gaussian distributions.
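As a rough sketch of why the prior acts as a regularizer (a standard Bayesian linear regression result, stated from memory rather than taken from the slides): with a Gaussian likelihood and a zero-mean Gaussian prior on the weights w, the log-posterior is
log P(w | D) = -(1/(2*sigma^2)) * ||y - Xw||^2 - (alpha/2) * ||w||^2 + constant,
so maximizing the posterior (the MAP estimate) is equivalent to minimizing squared error plus an L2 penalty on w, i.e. ridge regression.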
Q9.
Which of the following statements are true for the given graphical model?
Select one or more:
a. B is not conditionally independent of A given C.
b. A is conditionally independent of B given C.
c. B is conditionally independent of A given C.
d. A is not conditionally independent of B given C.
Answer: B, C
Proof (assuming the chain structure A -> C -> B implied by this factorization):
P(A,B | C) = P(A,B,C)/P(C) = P(A)*P(C|A)*P(B|C)/P(C) = P(A|C) * P(B|C)
Hence P(A,B | C) = P(A|C) * P(B|C), i.e. A and B are conditionally independent given C.
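A small numeric check in Python, assuming the chain structure A -> C -> B used in the proof above; the probability values are made up purely for illustration:

# joint P(A,B,C) for the chain A -> C -> B is P(A) * P(C|A) * P(B|C)
pA = {0: 0.6, 1: 0.4}
pC_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # pC_given_A[a][c]
pB_given_C = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # pB_given_C[c][b]

def joint(a, b, c):
    return pA[a] * pC_given_A[a][c] * pB_given_C[c][b]

for c in (0, 1):
    pC = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1))
    for a in (0, 1):
        for b in (0, 1):
            lhs = joint(a, b, c) / pC                              # P(A,B | C)
            p_a_c = sum(joint(a, bb, c) for bb in (0, 1)) / pC     # P(A | C)
            p_b_c = sum(joint(aa, b, c) for aa in (0, 1)) / pC     # P(B | C)
            assert abs(lhs - p_a_c * p_b_c) < 1e-12
print("P(A,B|C) = P(A|C) * P(B|C) holds for every combination")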
Q10.
Smoothing can be used in which of the following cases:
Select one or more:
a. When likelihood estimates zero probability
b. When test error and training error are very different
c. None of the above
d. When learning algorithm result in very rough function
Answer: A
(When likelihood estimates zero probability.)
Could someone explain Laplacian smoothing (also called add-one smoothing)?
Ans: Suppose you are looking at outcomes of a die. Let us say
you get the following outcomes of each number, in 10 throws:
One : 1
Two : 3
Three : 1
Four : 0
Five : 3
Six : 2
Now, the probabilities without the smoothing are
One : 1/10
Two : 3/10
Three : 1/10
Four : 0/10
Five : 3/10
Six : 2/10
The sums of probabilities is (of course) 1.
To smooth these out, we add 1 to each numerator. Now we need to add something to the denominator so that the probabilities still sum to 1.
So,
(1+1+3+1+1+1+0+1+3+1+2+1) / (10+K) = 1
This gives K=6. Now note that if you had zero throws, the probabilities are all 1/6. These are called the "prior probabilities" - our prior assumption of the outcomes. We initially believe all of them are equally likely.
And K=6 is essentially the number of classes! The smoothed probabilities are then 2/16, 4/16, 2/16, 1/16, 4/16 and 3/16, so no outcome is left with zero probability.
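The same add-one smoothing as a short Python sketch (illustrative only):

counts = {"One": 1, "Two": 3, "Three": 1, "Four": 0, "Five": 3, "Six": 2}
total = sum(counts.values())   # 10 throws
k = len(counts)                # 6 classes

raw = {face: c / total for face, c in counts.items()}
smoothed = {face: (c + 1) / (total + k) for face, c in counts.items()}

print(raw)                      # "Four" gets probability 0, which breaks likelihood products
print(smoothed)                 # 2/16, 4/16, 2/16, 1/16, 4/16, 3/16 -- no zeros left
print(sum(smoothed.values()))   # still 1.0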
Q11.
Which of the following statements are true in the context of decision trees?
Select one or more:
a. Capable of classifying non-linearly separable data.
b. None of these.
c. Capable of classifying linearly separable data.
d. It is always possible to get zero training error.
Answer: A, C, D
“Zero training error” means the decision tree can produce a model that gives the correct output for every training example (provided no two identical examples carry conflicting labels).
An attribute is a discrete-valued variable. While traversing a decision tree downwards based on attribute values, it is always possible to arrive at a leaf with a label (a decision such as yes or no).
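A small sketch, assuming scikit-learn is available, showing a decision tree reaching zero training error on XOR data, which no linear separator can classify:

from sklearn.tree import DecisionTreeClassifier

# XOR: not linearly separable
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

clf = DecisionTreeClassifier()
clf.fit(X, y)
print(clf.score(X, y))   # 1.0, i.e. zero training error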
Q12.
Let the probability of a disease be 1 in 10,000 and the test accuracy for the disease be 99%. Let event A be the event that you have this disease, and event B be the event that you test positive. Given that the test is positive, what is the probability that the disease is actually present? Precisely, you need to calculate the probability P(A|B).
Select one:
a. 0.0990
b. 0.0988
c. 0.9902
d. 0.0098
Answer: (D = 0.0098)
...
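The working as a short Python check, reading "99% accuracy" as both P(B|A) = 0.99 and P(B|not A) = 0.01, which is what the answer key assumes:

p_disease = 1 / 10000           # P(A)
p_pos_given_disease = 0.99      # P(B|A)
p_pos_given_healthy = 0.01      # P(B|not A), the false positive rate

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))               # P(B), total probability

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos   # Bayes theorem
print(round(p_disease_given_pos, 4))   # 0.0098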
Q13.
In the context of bias-variance decomposition, which of the following statements are true?
Select one or more:
a. High bias implies high variance in the out-of-sample error.
b. High variance implies less bias in the out-of-sample error.
c. Less bias implies less variance in the out-of-sample error.
d. Bias-variance analysis helps us to quantify the out-of-sample error.
Answer: B, D
(From L6-Part2 last slide)
(URL: http://www.stat.cmu.edu/~ryantibs/advmethods/notes/errval.pdf)
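For reference, the standard textbook decomposition of the expected out-of-sample squared error (not copied from the slides):
E[(y - f_hat(x))^2] = Bias^2 + Variance + irreducible noise,
where Bias = E[f_hat(x)] - f(x) and Variance = E[(f_hat(x) - E[f_hat(x)])^2]. Complex models tend to trade lower bias for higher variance and vice versa, which is why options (b) and (d) are marked true.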
Q14.
In the context of linear regression, which of the following statements are true?
Select one or more:
a. You can use linear regression for classification.
b. It is not possible to get zero training error, if there are few samples used in training.
c. It is not possible to get zero test error, if there are few samples used in training.
d. You cannot use linear regression for classification.
Answer: A, C
Training error is the error you get when you run the trained model back on the training data. Remember that although this data has already been used to train the model, that does not necessarily mean the trained model will perform accurately when applied back to the training data itself.
Test error is the error when you get when you run the trained model on a set of data that it has previously never been exposed to. This data is often used to measure the accuracy of the model before it is shipped to production.
………………………………………………
URL: https://stats.stackexchange.com/questions/22381/why-not-approach-classification-through-regression
QUESTION: Some material I've seen on machine learning said
that it's a bad idea to approach a classification problem through regression.
But I think it's always possible to do a continuous regression to fit the data
and truncate the continuous prediction to yield discrete classifications. So
why is it a bad idea?
ANSWER:
"..approach
classification problem through regression.." by
"regression" I will assume you mean linear regression, and I will
compare this approach to the "classification" approach of fitting a
logistic regression model.
Before we do this, it is important to clarify the
distinction between regression and classification models. Regression models
predict a continuous variable, such as rainfall amount or sunlight intensity.
They can also predict probabilities, such as the probability that an image
contains a cat. A probability-predicting regression model can be used as part
of a classifier by imposing a decision rule - for example, if the probability
is 50% or more, decide it's a cat.
Logistic regression predicts probabilities, and is
therefore a regression algorithm. However, it is commonly described as a
classification method in the machine learning literature, because it can be
(and is often) used to make classifiers. There are also "true"
classification algorithms, such as SVM, which only predict an outcome and do
not provide a probability. We won't discuss this kind of algorithm here.
Linear vs. Logistic Regression on Classification Problems
As Andrew Ng explains it, with linear regression you fit a polynomial through the data; in his example, a straight line is fitted through a {tumor size, tumor type} sample set:
In that example, malignant tumors get the label 1 and non-malignant ones get 0, and the fitted line is our hypothesis h(x). To make predictions we may say that for any given tumor size x, if h(x) is greater than 0.5 we predict a malignant tumor, otherwise we predict benign.
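A minimal Python sketch of that idea with made-up tumor sizes (illustrative only, not Andrew Ng's data):

import numpy as np

sizes = np.array([1.0, 1.5, 2.0, 3.0, 3.5, 4.0])   # tumor sizes (made up)
labels = np.array([0, 0, 0, 1, 1, 1])              # 0 = benign, 1 = malignant

# fit a straight line h(x) = w1*x + w0 by least squares
w1, w0 = np.polyfit(sizes, labels, 1)

def predict(x):
    return 1 if w1 * x + w0 >= 0.5 else 0   # threshold the regression output at 0.5

print([predict(x) for x in sizes])   # [0, 0, 0, 1, 1, 1] on this toy data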
Q15.
Which of the following statements are true?
Select one or more:
a. None of these.
b. In multi-class classification, the one-versus-the-rest method results in ambiguous regions.
c. The goal in classification is to take an input vector x and assign it to one of K discrete classes.
d. A discriminant function maps each input x directly onto a class label.
Answer: B, C, D
...
...
...