Monday, February 21, 2022

Decision Tree Learning

Types of Machine Learning Algorithms

Decision Tree Induction

“Decision tree induction” is the learning of decision trees from class-labeled training tuples. A “decision tree” is a flowchart-like tree structure, where:

1. Each “internal node” (non-leaf node) denotes a test on an attribute,
2. Each “branch” represents an outcome of the test, and
3. Each “leaf node” (or terminal node) holds a class label.

The topmost node in a tree is the “root” node.
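As a rough illustration (not from the original slides), this structure can be sketched in Python. The attribute names 'age' and 'student' mirror the running example used later in this post; the subtrees are simplified, so treat it as a sketch rather than the full tree:

import typing

class Node:
    """An internal node tests an attribute; a leaf holds a class label."""
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute       # attribute tested at an internal node
        self.branches = branches or {}   # test outcome -> child Node
        self.label = label               # class label (leaf/terminal nodes only)

def classify(node: Node, record: dict) -> str:
    """Walk from the root, following the branch that matches each
    test outcome, until a leaf is reached."""
    while node.label is None:
        node = node.branches[record[node.attribute]]
    return node.label

# Hypothetical tree: the root tests 'age' (the attribute this post
# later selects for the first split); subtrees are simplified.
root = Node(attribute="age", branches={
    "youth": Node(attribute="student", branches={
        "yes": Node(label="yes"),
        "no": Node(label="no"),
    }),
    "middle_aged": Node(label="yes"),
    "senior": Node(label="yes"),   # simplified; a real tree may test further
})

print(classify(root, {"age": "youth", "student": "yes"}))   # -> yes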

Our Data Set

Is This What We Want?

Attribute Selection Measures (or Splitting Rules)

1. Information gain (entropy)
2. Gain ratio
3. Gini Index
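For reference, here are minimal Python sketches of all three measures, following their standard textbook definitions (the function names and signatures are my own, not from the original post):

import math
from collections import Counter

def entropy(labels):
    # Info(D) = -sum_i p_i * log2(p_i), over the class proportions p_i
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # Gain(A) = Info(D) - sum_j (|D_j| / |D|) * Info(D_j),
    # where D_j holds the tuples with the j-th value of attribute A
    n = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    info_a = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - info_a

def gain_ratio(values, labels):
    # GainRatio(A) = Gain(A) / SplitInfo(A), where
    # SplitInfo(A) = -sum_j (|D_j| / |D|) * log2(|D_j| / |D|)
    n = len(labels)
    split_info = -sum((c / n) * math.log2(c / n)
                      for c in Counter(values).values())
    return info_gain(values, labels) / split_info

def gini(labels):
    # Gini(D) = 1 - sum_i p_i^2
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

Each function takes plain Python lists: info_gain and gain_ratio pair each tuple's attribute value with its class label.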

Attribute Selection Measures: Information Gain

Information Gain When We Split on Age

Information gain when split on other columns

We select age as first split

But Let Us Also Check It For Student Attribute:

‘Student’ has two values: Yes and No. The weight of each branch is:
Yes => 7 / 14
No => 7 / 14

Component for Student -> Yes

Student -> Yes covers 7 tuples: 6 Yes and 1 No.

(7/14) * ( -(6/7) * log2(6/7) - (1/7) * log2(1/7) )
= 0.5 * ( (-0.857) * log2(0.857) - (0.1428) * log2(0.1428) )
= 0.5 * ( (-0.857) * (-0.22263) - (0.1428) * (-2.808) )
= 0.2958

Component for Student -> No

Student -> No covers 7 tuples: 3 Yes and 4 No.

(7/14) * ( -(3/7) * log2(3/7) - (4/7) * log2(4/7) )
= 0.5 * ( (-0.4285) * log2(0.4285) - (0.5714) * log2(0.5714) )
= 0.5 * ( (-0.4285) * (-1.2226) - (0.5714) * (-0.8074) )
= 0.4926

Information (Student) = 0.2958 + 0.4926 = 0.7884
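As a quick sanity check, here is the same arithmetic in Python. The final gain line assumes the overall class split implied by the per-branch counts above (6 + 3 = 9 Yes, 1 + 4 = 5 No):

import math

def component(weight, probs):
    # One term of Info_A(D): weight * ( -sum_i p_i * log2(p_i) )
    return weight * -sum(p * math.log2(p) for p in probs)

yes_side = component(7 / 14, [6 / 7, 1 / 7])   # Student -> Yes: 6 Yes, 1 No
no_side  = component(7 / 14, [3 / 7, 4 / 7])   # Student -> No:  3 Yes, 4 No
print(round(yes_side, 4))   # 0.2958
print(round(no_side, 4))    # 0.4926

info_student = yes_side + no_side   # ~0.7885; 0.7884 above adds rounded terms

# Overall class split inferred from the branch counts: 9 Yes, 5 No
info_d = component(1.0, [9 / 14, 5 / 14])
print(round(info_d, 4))                 # 0.9403
print(round(info_d - info_student, 4))  # 0.1518 = Gain(Student)

A gain of about 0.152 for Student is lower than the gain for age, which is consistent with choosing age for the first split.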

Attribute Selection Measures: Gini Index

Consider each attribute and every possible split. For example, take the attribute income and find the Gini index of the split into the subsets {low, medium} and {high}.

Gini Index Calculation For Split on Income

Gini(income ∈ {low, medium})
= (10/14) * (1 - (7/10)^2 - (3/10)^2) + (4/14) * (1 - (1/4)^2 - (3/4)^2)
= 0.714 * (1 - 0.49 - 0.09) + 0.285 * (1 - 0.0625 - 0.5625)
= 0.406755
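The weights 0.714 and 0.285 are 10/14 and 4/14 truncated, and the squared terms correspond to the class proportions inferred from those decimals ({low, medium}: 7 Yes, 3 No of 10 tuples; {high}: 1 Yes, 3 No of 4 tuples). A short Python check:

def gini(probs):
    # Gini(D) = 1 - sum_i p_i^2 for one subset
    return 1 - sum(p * p for p in probs)

# Weighted Gini index of the binary split {low, medium} vs. {high}
split_gini = (10 / 14) * gini([7 / 10, 3 / 10]) + (4 / 14) * gini([1 / 4, 3 / 4])
print(round(split_gini, 4))   # 0.4071

The post's 0.406755 comes from truncating the weights to 0.714 and 0.285; with exact fractions the value is about 0.4071.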

Measuring Classification

Tags: Technology, Machine Learning, Classification
