Types of Machine Learning Algorithms
Decision Tree Induction
“Decision tree induction” is the learning of decision trees from class-labeled training tuples. A “decision tree” is a flowchart-like tree structure, where:

- Each “internal node” (non-leaf node) denotes a test on an attribute,
- Each “branch” represents an outcome of the test,
- Each “leaf node” (or terminal node) holds a class label, and
- The topmost node in the tree is the “root” node.
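Concretely, such a tree can be written down as nested nodes. Here is a minimal sketch in Python; the attribute values and class labels are illustrative assumptions, not the post's exact tree:

```python
# A minimal decision-tree sketch: internal nodes test an attribute,
# branches are the test outcomes, leaves hold a class label.
# Attribute values and labels below are illustrative only.
tree = {
    "attribute": "age",                       # root node: test on "age"
    "branches": {
        "youth": {"attribute": "student",     # internal node: test on "student"
                  "branches": {"yes": "buys", "no": "does_not_buy"}},
        "middle_aged": "buys",                # leaf: class label
        "senior": "buys",
    },
}

def classify(node, record):
    """Walk from the root to a leaf, following the branch for each test outcome."""
    while isinstance(node, dict):             # internal node: keep testing
        node = node["branches"][record[node["attribute"]]]
    return node                               # leaf: the class label

print(classify(tree, {"age": "youth", "student": "yes"}))  # buys
```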
Our Data Set

The running example is a 14-tuple training set with attributes including age, income, and student, and a two-valued class label with 9 Yes and 5 No tuples (these counts follow from the calculations below).

Is This What We Want?
Attribute Selection Measures (or Splitting Rules)
1. Information gain (entropy)
2. Gain ratio
3. Gini Index

Attribute Selection Measures: Information Gain
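For reference, the standard definitions: Info(D) = -Σ pi * log2(pi) is the expected information (entropy) needed to classify a tuple in D, Info_A(D) = Σ (|Dj|/|D|) * Info(Dj) is the weighted entropy after splitting on attribute A, and Gain(A) = Info(D) - Info_A(D). A minimal sketch of these two quantities in Python, using the class counts that appear in this post (9 Yes / 5 No overall); the function names are mine, not the post's:

```python
import math

def entropy(counts):
    """Info(D): expected information, in bits, for a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_after_split(partitions):
    """Info_A(D): size-weighted entropy of the partitions induced by attribute A."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * entropy(p) for p in partitions)

print(entropy([9, 5]))  # Info(D) ~ 0.940 bits for 9 Yes / 5 No
```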
Information Gain When We Split on Age
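Assuming the age partition from the classic AllElectronics example in Han, Kamber & Pei, which the post's other numbers match (youth: 2 Yes / 3 No, middle_aged: 4 Yes / 0 No, senior: 3 Yes / 2 No), the gain can be checked with the helpers above:

```python
# Assumed age partition (classic Han, Kamber & Pei example):
# youth: 2 Yes / 3 No, middle_aged: 4 Yes / 0 No, senior: 3 Yes / 2 No
info_age = info_after_split([[2, 3], [4, 0], [3, 2]])  # ~0.694
gain_age = entropy([9, 5]) - info_age                  # ~0.246
print(gain_age)
```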
Information Gain When We Split on Other Columns

Computed the same way, the splits on the remaining columns yield smaller information gains than the split on age.

We Select Age as the First Split

But Let Us Also Check the Student Attribute:
‘Student’ takes two values, Yes and No, and each covers 7 of the 14 tuples, so both partitions get weight 7/14 = 0.5.

Component for Student -> Yes

Student -> Yes covers 7 tuples: 6 Yes and 1 No.

(7/14) * (-(6/7) * log2(6/7) - (1/7) * log2(1/7))
= 0.5 * (-(0.857 * log2(0.857)) - (0.1428 * log2(0.1428)))
= 0.5 * (-(0.857 * (-0.2226)) - (0.1428 * (-2.808)))
= 0.5 * (0.1908 + 0.4010)
= 0.2958

Component for Student -> No

Student -> No covers 7 tuples: 3 Yes and 4 No.

(7/14) * (-(3/7) * log2(3/7) - (4/7) * log2(4/7))
= 0.5 * (-(0.4285 * log2(0.4285)) - (0.5714 * log2(0.5714)))
= 0.5 * (-(0.4285 * (-1.2226)) - (0.5714 * (-0.8074)))
= 0.5 * (0.5239 + 0.4614)
= 0.4926

Info(Student) = 0.2958 + 0.4926 = 0.7884, so Gain(Student) = 0.940 - 0.7884 ≈ 0.152, smaller than the gain for the split on age.
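The same figure falls out of the helpers sketched above, using the partition counts from this calculation:

```python
# Student = Yes: 6 Yes / 1 No; Student = No: 3 Yes / 4 No
info_student = info_after_split([[6, 1], [3, 4]])   # ~0.788
gain_student = entropy([9, 5]) - info_student       # ~0.152
print(info_student, gain_student)
```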
Attribute Selection Measures: Gini Index

Consider each attribute and all of its possible binary splits. For example, consider the attribute income and find the Gini index of the split into the subsets {low, medium} and {high}.

Gini Index Calculation For Split on Income
The subset {low, medium} covers 10 of the 14 tuples (7 Yes, 3 No); {high} covers the remaining 4 (2 Yes, 2 No):

(10/14) * (1 - (7/10)^2 - (3/10)^2) + (4/14) * (1 - (2/4)^2 - (2/4)^2)
= 0.714 * (1 - 0.49 - 0.09) + 0.286 * (1 - 0.25 - 0.25)
= 0.714 * 0.42 + 0.286 * 0.5
= 0.443
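As a cross-check, here is a small Gini helper in the same style as the entropy sketch above; the Gini index of a partition is Gini(D) = 1 - Σ pi^2, and a split's index is the size-weighted sum over its partitions:

```python
def gini(counts):
    """Gini(D) = 1 - sum(p_i^2) for a list of class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_after_split(partitions):
    """Size-weighted Gini index of a split into partitions."""
    total = sum(sum(p) for p in partitions)
    return sum((sum(p) / total) * gini(p) for p in partitions)

# income in {low, medium}: 7 Yes / 3 No; income = high: 2 Yes / 2 No
print(gini_after_split([[7, 3], [2, 2]]))  # ~0.443
```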
Measuring Classification