Wednesday, September 16, 2020

Snorkel's Analysis Package Overview (v0.9.6, Sep 2020)


The current version of Snorkel is v0.9.6 (as of 16-Sep-2020); see the Snorkel repository on GitHub. Snorkel has 8 packages.

Package Reference:
1. Snorkel Analysis Package
2. Snorkel Augmentation Package
3. Snorkel Classification Package
4. Snorkel Labeling Package
5. Snorkel Map Package
6. Snorkel Preprocess Package
7. Snorkel Slicing Package
8. Snorkel Utils Package

What is Snorkel's Analysis Package for? This package covers how to interpret classification results. It provides generic model analysis utilities shared across Snorkel.

1: Scorer

Calculates one or more scores from user-specified and/or user-defined metrics. It defines a class 'Scorer' with two methods: 'score()' and 'score_slices()'. You have to specify input arguments such as the metrics to compute (related to 'metric_score()' discussed below), the true labels, the predicted labels and the predicted probabilities. It is through this class that we make use of the code in 'metrics.py'.

Code Snippet:
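Below is a minimal sketch of calling 'Scorer' with two of the built-in metrics. The gold labels and predictions are made-up values for illustration, so treat it as an example of the API shape rather than output copied from the library's documentation.

~~~
import numpy as np
from snorkel.analysis import Scorer

# Made-up gold (true) labels and hard predictions for a binary task.
golds = np.array([1, 0, 1, 1, 0])
preds = np.array([1, 0, 0, 1, 1])

# Ask the Scorer for two of the metrics registered in metrics.py.
scorer = Scorer(metrics=["accuracy", "f1"])

# score() returns a dict mapping each metric name to its value.
scores = scorer.score(golds=golds, preds=preds)
print(scores)
# roughly: {'accuracy': 0.6, 'f1': 0.667}
~~~

The companion method score_slices() works the same way but additionally takes slice membership information and reports the metrics per slice.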
2: get_label_buckets

Returns data point indices bucketed by label combinations. This is a function written in the error_analysis.py file.

Code:

~~~
import snorkel
import numpy as np
from snorkel.analysis import get_label_buckets

print("Snorkel version:", snorkel.__version__)
# Snorkel version: 0.9.3
~~~

A common use case is calling ``buckets = get_label_buckets(Y_gold, Y_pred)`` where ``Y_gold`` is a set of gold (i.e. ground truth) labels and ``Y_pred`` is a corresponding set of predicted labels.

~~~
Y_gold = np.array([1, 1, 1, 0, 0, 0, 1])
Y_pred = np.array([1, 1, -1, -1, 1, 0, 1])
buckets = get_label_buckets(Y_gold, Y_pred)
# If Y_gold and Y_pred have a different number of elements, this raises:
# ValueError: Arrays must all have the same number of elements
~~~

The returned ``buckets[(i, j)]`` is a NumPy array of data point indices with true label i and predicted label j. More generally, the returned indices within each bucket refer to the order of the labels that were passed in as function arguments.

~~~
print(buckets[(1, 1)])  # true positives (gold 1, predicted 1)
# Out: array([0, 1, 6])

buckets[(0, 0)]  # true negatives (gold 0, predicted 0)
# Out: array([5])

# false negatives, false positives and true negatives
print((1, 0) in buckets, '/', (0, 1) in buckets, '/', (0, 0) in buckets)
# Out: False / True / True

buckets[(1, -1)]  # abstained positives
# Out: array([2])

buckets[(0, -1)]  # abstained negatives
# Out: array([3])
~~~

3: metric_score()

Evaluates a standard metric on a set of predictions/probabilities. The code for metric_score() is in metrics.py. Using it you can evaluate a standard metric on a set of predictions (true labels and predicted labels) or probabilities. The additional scores it defines are:
1. _coverage_score
2. _roc_auc_score
3. _f1_score
4. _f1_micro_score
5. _f1_macro_score

It is a wrapper around "sklearn.metrics" and adds to it by providing the five metrics above.

~~~
METRICS = {
    "accuracy": Metric(sklearn.metrics.accuracy_score),
    "coverage": Metric(_coverage_score, ["preds"]),
    "precision": Metric(sklearn.metrics.precision_score),
    "recall": Metric(sklearn.metrics.recall_score),
    "f1": Metric(_f1_score, ["golds", "preds"]),
    "f1_micro": Metric(_f1_micro_score, ["golds", "preds"]),
    "f1_macro": Metric(_f1_macro_score, ["golds", "preds"]),
    "fbeta": Metric(sklearn.metrics.fbeta_score),
    "matthews_corrcoef": Metric(sklearn.metrics.matthews_corrcoef),
    "roc_auc": Metric(_roc_auc_score, ["golds", "probs"]),
}
~~~
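To tie the METRICS dictionary back to metric_score() itself, here is a short usage sketch. The label arrays are made up for illustration; only the metric names come from the dictionary above.

~~~
import numpy as np
from snorkel.analysis import metric_score

# Made-up gold labels and predictions for a binary task.
golds = np.array([1, 1, 0, 0, 1])
preds = np.array([1, 0, 0, 1, 1])

# Evaluate metrics registered in the METRICS dict by name.
acc = metric_score(golds=golds, preds=preds, metric="accuracy")
f1 = metric_score(golds=golds, preds=preds, metric="f1")
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
# roughly: accuracy=0.60, f1=0.67
~~~

Metrics whose entries list "probs" (such as "roc_auc") expect predicted probabilities via the probs argument rather than hard predictions.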
