Sunday, March 13, 2022

Interpretation of Decision Tree J48 output in Weka


Data Set Glimpse

@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

The data of the ARFF file looks like the following:

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
...
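To inspect the same header and rows outside Weka, something like the following minimal sketch may help. It only handles the simple ARFF subset shown above (unquoted names, NUMERIC and nominal attributes, dense comma-separated rows), and `parse_arff` is a hypothetical helper name, not part of Weka.

```python
def parse_arff(text):
    """Parse the simple ARFF subset shown above (hypothetical helper)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            # Each data row lists one value per declared attribute, in order.
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                row[name] = float(value) if kind == "NUMERIC" else value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # Nominal attribute: keep the list of allowed labels.
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.upper()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Only rows that appear in the post are used here.
SAMPLE = """\
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), rows[0])
```

The last attribute, `class`, is the nominal target that J48 learns to predict; the four numeric attributes are the candidate split variables.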

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     iris
Instances:    150
Attributes:   5
              sepallength
              sepalwidth
              petallength
              petalwidth
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
|   petalwidth <= 1.7
|   |   petallength <= 4.9: Iris-versicolor (48.0/1.0)
|   |   petallength > 4.9
|   |   |   petalwidth <= 1.5: Iris-virginica (3.0)
|   |   |   petalwidth > 1.5: Iris-versicolor (3.0/1.0)
|   petalwidth > 1.7: Iris-virginica (46.0/1.0)

Number of Leaves  :     5

Size of the tree :      9

Time taken to build model: 0.36 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         144               96      %
Incorrectly Classified Instances         6                4      %
Kappa statistic                          0.94
Mean absolute error                      0.035
Root mean squared error                  0.1586
Relative absolute error                  7.8705 %
Root relative squared error             33.6353 %
Total Number of Instances              150

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.980    0.000    1.000      0.980    0.990      0.985    0.990     0.987     Iris-setosa
                 0.940    0.030    0.940      0.940    0.940      0.910    0.952     0.880     Iris-versicolor
                 0.960    0.030    0.941      0.960    0.950      0.925    0.961     0.905     Iris-virginica
Weighted Avg.    0.960    0.020    0.960      0.960    0.960      0.940    0.968     0.924

=== Confusion Matrix ===

  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica
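The pruned tree in the output above can be read as a set of nested if/else rules. The sketch below is a hand translation of that printed tree into Python, not code generated by Weka; the function name `j48_predict` is hypothetical. The leaf annotations are read here in the usual J48 way: the first number is how many training instances reach the leaf and the second, when present, is how many of those the leaf misclassifies.

```python
def j48_predict(petallength, petalwidth):
    """Hand translation of the printed J48 tree (hypothetical helper).

    The tree never tests sepallength or sepalwidth, so they are not needed.
    """
    if petalwidth <= 0.6:
        return "Iris-setosa"              # leaf (50.0)
    if petalwidth <= 1.7:
        if petallength <= 4.9:
            return "Iris-versicolor"      # leaf (48.0/1.0)
        if petalwidth <= 1.5:
            return "Iris-virginica"       # leaf (3.0)
        return "Iris-versicolor"          # leaf (3.0/1.0)
    return "Iris-virginica"               # leaf (46.0/1.0)


# (instances reaching the leaf, instances misclassified), copied from the tree.
LEAVES = [(50, 0), (48, 1), (3, 0), (3, 1), (46, 1)]
covered = sum(n for n, _ in LEAVES)
training_errors = sum(e for _, e in LEAVES)

# First data row from the ARFF glimpse: petallength 1.4, petalwidth 0.2.
print(j48_predict(petallength=1.4, petalwidth=0.2))
print(covered, training_errors)
```

The five `return` statements correspond to "Number of Leaves : 5", and together with the four tests they make up the nine nodes reported as "Size of the tree : 9". The leaf counts add up to all 150 instances with 3 misclassified on the training set, whereas the 6 errors in the Summary come from 10-fold cross-validation, so the two figures are not expected to match.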

Interpretation of Model Output From Weka

=== Confusion Matrix ===

  a  b  c   <-- classified as
 49  1  0 |  a = Iris-setosa
  0 47  3 |  b = Iris-versicolor
  0  2 48 |  c = Iris-virginica

TRUE LABEL and CLASSIFIER LABEL: each row is the true label of the data points and each column is the label assigned by the classifier. When TRUE LABEL == CLASSIFIER LABEL (the diagonal of the matrix), the data point is a TRUE POSITIVE for that class.

For Setosa:
True Positives (classified as Setosa and actually Setosa): 49
False Positives (predicted Setosa but are not Setosa): 0
False Negatives (actually Setosa but predicted as another class): 1
True Negatives: 47 + 3 + 2 + 48 = 100

For Versicolor:
True Positives: 47
False Positives: 1 + 2 = 3
False Negatives: 3
True Negatives: 49 + 48 = 97

For Virginica:
True Positives: 48
False Positives: 3
False Negatives: 2 (predicted as "not virginica" but were actually "virginica")
True Negatives: 49 + 1 + 47 = 97

Recall: How many of the data points belonging to class X were also predicted as X? For example, how many of the setosa class were predicted as setosa?

Recall = TP / (TP + FN)
For Setosa: 49 / 50 = 0.98
For Versicolor: 47 / 50 = 0.94
For Virginica: 48 / 50 = 0.96

Precision: How many of the total predictions of X were actually X?

Precision = TP / (TP + FP)
For Setosa: 49 / (49 + 0) = 1
For Versicolor: 47 / (47 + 3) = 0.940
For Virginica: 48 / (48 + 3) = 0.941
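The counts above can be reproduced directly from the confusion matrix. The following is a minimal sketch in plain Python; the helper names `per_class_counts` and `cohen_kappa` are hypothetical, and the kappa calculation goes slightly beyond the walkthrough above, using the standard Cohen's kappa formula as a cross-check against the "Kappa statistic" line in the Weka summary.

```python
LABELS = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Rows are true labels, columns are classifier labels (same layout as Weka).
MATRIX = [
    [49, 1, 0],
    [0, 47, 3],
    [0, 2, 48],
]


def per_class_counts(matrix, k):
    """Return (TP, FP, FN, TN) for class index k (hypothetical helper)."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # rest of the row
    fp = sum(row[k] for row in matrix) - tp   # rest of the column
    tn = total - tp - fn - fp                 # everything else
    return tp, fp, fn, tn


def cohen_kappa(matrix):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


for k, label in enumerate(LABELS):
    tp, fp, fn, tn = per_class_counts(MATRIX, k)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    print(f"{label}: TP={tp} FP={fp} FN={fn} TN={tn} "
          f"recall={recall:.3f} precision={precision:.3f}")

print(f"kappa={cohen_kappa(MATRIX):.2f}")
```

Reading false negatives off the row and false positives off the column is the same bookkeeping done by hand above, which makes it easy to apply to a confusion matrix with more classes.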
Tags: Machine Learning, Weka, Technology
