Concepts of Probability
Independent Events
Example: flipping a coin twice; the outcome of the first flip does not affect the second.

Dependent Events
Example: drawing two cards one by one from a deck without replacement.
First draw: 52 cards, so P(Jack of Hearts) = 1/52.
At the time of the second draw, the deck is left with 51 cards. So the deck has changed for the second draw because we are drawing without replacement.

Addition Rule
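To make the without-replacement arithmetic concrete, here is a short sketch (Python, using the standard fractions module; the second card chosen is just an illustrative assumption):

```python
from fractions import Fraction

# First draw: 52 cards in the deck.
p_jack_of_hearts_first = Fraction(1, 52)

# The second draw happens from the remaining 51 cards, so the
# probability of any one specific remaining card has changed.
p_specific_card_second = Fraction(1, 51)

# Multiplication rule for dependent events: P(A and B) = P(A) * P(B | A)
p_both = p_jack_of_hearts_first * p_specific_card_second
print(p_both)  # 1/2652
```

Using exact fractions avoids floating-point rounding and keeps the 1/52 and 1/51 factors visible in the result.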
Multiplication Rule
Bayes Theorem
What is the probability of getting "class Ck and all the evidences 1 to n"?
x1 to xn are our evidence events, and the Naïve Bayes algorithm (or classifier) assumes they are all conditionally independent given the class. By the chain rule:

P(x1, x2, x3, C) = P(x1 | x2, x3, C) * P(x2, x3, C)
                 = P(x1 | x2, x3, C) * P(x2 | x3, C) * P(x3, C)
                 = P(x1 | x2, x3, C) * P(x2 | x3, C) * P(x3 | C) * P(C)

And if x1, x2 and x3 are conditionally independent of each other given C:

P(x1, x2, x3, C) = P(x1 | C) * P(x2 | C) * P(x3 | C) * P(C)

FRUIT PROBLEM
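The chain-rule identity above can be verified numerically on a small hand-made joint distribution (all the probability values below are made up for illustration; the identity holds for any joint distribution, not just this one):

```python
# A made-up joint distribution over (x1, x2, C), each variable binary.
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05,
    (0, 1, 0): 0.20, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.15, (1, 1, 1): 0.20,
}

def marginal(**fixed):
    """Sum the joint over every variable not pinned down in `fixed`."""
    total = 0.0
    for x1, x2, c in joint:
        vals = {"x1": x1, "x2": x2, "C": c}
        if all(vals[k] == v for k, v in fixed.items()):
            total += joint[(x1, x2, c)]
    return total

# Chain rule: P(x1, x2, C) = P(x1 | x2, C) * P(x2 | C) * P(C)
x1, x2, c = 1, 0, 1
lhs = joint[(x1, x2, c)]
p_x1_given = marginal(x1=x1, x2=x2, C=c) / marginal(x2=x2, C=c)
p_x2_given = marginal(x2=x2, C=c) / marginal(C=c)
rhs = p_x1_given * p_x2_given * marginal(C=c)
print(abs(lhs - rhs) < 1e-12)  # True
```

The conditionals are computed from the marginals, so the check exercises the chain rule itself rather than assuming the factorization.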
A fruit is long, sweet and yellow. Is it a banana? Is it an orange? Or is it some different fruit?

P(Banana | Long, Sweet, Yellow) = P(Long, Sweet, Yellow | Banana) * P(Banana) / P(Long, Sweet, Yellow)

P(L, S, Y | B) = P(L, S, Y, B) / P(B)

Naïve Bayes assumes all the evidence events (such as L, S, Y) are conditionally independent given the class. Using the chain rule alongside the independence assumption:

P(L, S, Y, B) = P(L | B) * P(S | B) * P(Y | B) * P(B)

We compute P(Orange | Long, Sweet, Yellow) and P(Other Fruit | Long, Sweet, Yellow) the same way. Answer: whichever posterior P() is higher.

From the data (100 fruits in total):
P(Banana) = 50 / 100
P(Orange) = 30 / 100
P(Other) = 20 / 100
P(Long | Banana) = 40 / 50 = 0.8
P(Sweet | Banana) = 35 / 50 = 0.7
P(Yellow | Banana) = 45 / 50 = 0.9

P(Banana | Long, Sweet, Yellow)
= P(Long | Banana) * P(Sweet | Banana) * P(Yellow | Banana) * P(Banana) / (P(Long) * P(Sweet) * P(Yellow))
= 0.8 * 0.7 * 0.9 * 0.5 / P(evidence)
= 0.252 / denom

P(Orange | Long, Sweet, Yellow) = 0

P(Other Fruit | Long, Sweet, Yellow)
= P(Long | Other Fruit) * P(Sweet | Other Fruit) * P(Yellow | Other Fruit) * P(Other Fruit) / (P(Long) * P(Sweet) * P(Yellow))
= 0.018 / denom

Since 0.252 > 0.018 > 0, the fruit is classified as a banana.

SPAM EXAMPLE

We want P(ham | d6) and P(spam | d6), where document d6 is: "good? Bad! very bad!"

P(ham | good, bad, very, bad) = P(good, bad, very, bad, ham) / P(good, bad, very, bad)

P(good, bad, very, bad, ham) = P(good | ham) * P(bad | ham) * P(very | ham) * P(bad | ham) * P(ham)

Comparing the two scores, d6 is classified as spam!

Practice Question
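The fruit comparison above can be sketched in a few lines (the Orange and Other scores are taken directly from the worked example; the shared evidence denominator is dropped because it is the same for every class and does not change the comparison):

```python
# Priors and Banana likelihoods from the worked example.
p_banana = 50 / 100
p_long_b, p_sweet_b, p_yellow_b = 40 / 50, 35 / 50, 45 / 50

# Unnormalised posterior score for Banana (evidence denominator dropped).
banana_score = p_long_b * p_sweet_b * p_yellow_b * p_banana
print(round(banana_score, 3))  # 0.252

# Scores for the other classes, as given in the worked example.
orange_score, other_score = 0.0, 0.018

scores = {"Banana": banana_score, "Orange": orange_score, "Other": other_score}
print(max(scores, key=scores.get))  # Banana
```

Picking the class with the largest unnormalised score is exactly the "whichever P() is higher" rule from the text.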
Ques 1: What is the assumption about the dataset on which we can apply the Naive Bayes classification algorithm?
Ans 1: That the evidence events are conditionally independent of each other given the class.
Ques 2: What is the 'recall' metric in a classification report?
Ans 2: Recall: how many of the actual instances of a class have been predicted correctly (or, we say, "have been recalled").
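A minimal sketch of how recall is computed for one class (pure Python; the ham/spam labels below are made-up sample data):

```python
def recall(y_true, y_pred, positive):
    """Recall = true positives / (true positives + false negatives):
    the fraction of actual `positive` instances that were recalled."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return tp / actual if actual else 0.0

y_true = ["spam", "ham", "spam", "spam", "ham"]
y_pred = ["spam", "ham", "ham",  "spam", "spam"]
print(recall(y_true, y_pred, "spam"))  # 2 of the 3 actual spams were recalled
```

Note the denominator counts the *actual* instances of the class, not the predicted ones; dividing by the predicted count would give precision instead.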
Wednesday, July 28, 2021
Naïve Bayes Classifier for Spam Filtering