Wednesday, September 16, 2020

Snorkel's Analysis Package Overview (v0.9.6, Sep 2020)


The current version of Snorkel is v0.9.6 (as of 16-Sep-2020); see the Snorkel repository on GitHub. Snorkel has 8 packages.

Package Reference:
1. Snorkel Analysis Package
2. Snorkel Augmentation Package
3. Snorkel Classification Package
4. Snorkel Labeling Package
5. Snorkel Map Package
6. Snorkel Preprocess Package
7. Snorkel Slicing Package
8. Snorkel Utils Package

What is Snorkel's Analysis Package for? This package covers how to interpret classification results. It provides generic model analysis utilities shared across Snorkel.

1: Scorer

Calculates one or more scores from user-specified and/or user-defined metrics. It defines a class 'Scorer' with two methods: 'score()' and 'score_slices()'. You have to specify input arguments such as the metrics to compute (related to 'metric_score()' discussed below), the true labels, the predicted labels and the predicted probabilities. It is through this class that we make use of the code in 'metrics.py'.

Code Snippet:
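Below is a minimal sketch of calling 'Scorer' with two of the built-in metrics. The gold labels and predictions are made-up values for illustration, so treat it as an example of the API shape rather than output copied from the library's documentation.

~~~
import numpy as np
from snorkel.analysis import Scorer

# Made-up gold (true) labels and hard predictions for a binary task.
golds = np.array([1, 0, 1, 1, 0])
preds = np.array([1, 0, 0, 1, 1])

# Ask the Scorer for two of the metrics registered in metrics.py.
scorer = Scorer(metrics=["accuracy", "f1"])

# score() returns a dict mapping each metric name to its value.
scores = scorer.score(golds=golds, preds=preds)
print(scores)
# roughly: {'accuracy': 0.6, 'f1': 0.667}
~~~

The companion method score_slices() works the same way but additionally takes slice membership information and reports the metrics per slice.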
2: get_label_buckets

Returns data point indices bucketed by label combinations. This is a function written in the error_analysis.py file.

Code:

~~~
import snorkel
import numpy as np
from snorkel.analysis import get_label_buckets

print("Snorkel version:", snorkel.__version__)
# Snorkel version: 0.9.3
~~~

A common use case is calling ``buckets = get_label_buckets(Y_gold, Y_pred)`` where ``Y_gold`` is a set of gold (i.e. ground truth) labels and ``Y_pred`` is a corresponding set of predicted labels.

~~~
Y_gold = np.array([1, 1, 1, 0, 0, 0, 1])
Y_pred = np.array([1, 1, -1, -1, 1, 0, 1])
buckets = get_label_buckets(Y_gold, Y_pred)
# If Y_gold and Y_pred have a different number of elements, this raises:
# ValueError: Arrays must all have the same number of elements
~~~

The returned ``buckets[(i, j)]`` is a NumPy array of data point indices with true label i and predicted label j. More generally, the returned indices within each bucket refer to the order of the labels that were passed in as function arguments.

~~~
print(buckets[(1, 1)])  # true positives (gold 1, predicted 1)
# Out: array([0, 1, 6])

buckets[(0, 0)]  # true negatives (gold 0, predicted 0)
# Out: array([5])

# false negatives, false positives and true negatives
print((1, 0) in buckets, '/', (0, 1) in buckets, '/', (0, 0) in buckets)
# Out: False / True / True

buckets[(1, -1)]  # abstained positives
# Out: array([2])

buckets[(0, -1)]  # abstained negatives
# Out: array([3])
~~~

3: metric_score()

Evaluates a standard metric on a set of predictions/probabilities. The code for metric_score() is in metrics.py. Using it you can evaluate a standard metric on a set of predictions (true labels and predicted labels) or probabilities. The additional scores it defines are:
1. _coverage_score
2. _roc_auc_score
3. _f1_score
4. _f1_micro_score
5. _f1_macro_score

It is a wrapper around "sklearn.metrics" and adds to it by providing the five metrics above.

~~~
METRICS = {
    "accuracy": Metric(sklearn.metrics.accuracy_score),
    "coverage": Metric(_coverage_score, ["preds"]),
    "precision": Metric(sklearn.metrics.precision_score),
    "recall": Metric(sklearn.metrics.recall_score),
    "f1": Metric(_f1_score, ["golds", "preds"]),
    "f1_micro": Metric(_f1_micro_score, ["golds", "preds"]),
    "f1_macro": Metric(_f1_macro_score, ["golds", "preds"]),
    "fbeta": Metric(sklearn.metrics.fbeta_score),
    "matthews_corrcoef": Metric(sklearn.metrics.matthews_corrcoef),
    "roc_auc": Metric(_roc_auc_score, ["golds", "probs"]),
}
~~~
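To tie the METRICS dictionary back to metric_score() itself, here is a short usage sketch. The label arrays are made up for illustration; only the metric names come from the dictionary above.

~~~
import numpy as np
from snorkel.analysis import metric_score

# Made-up gold labels and predictions for a binary task.
golds = np.array([1, 1, 0, 0, 1])
preds = np.array([1, 0, 0, 1, 1])

# Evaluate metrics registered in the METRICS dict by name.
acc = metric_score(golds=golds, preds=preds, metric="accuracy")
f1 = metric_score(golds=golds, preds=preds, metric="f1")
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
# roughly: accuracy=0.60, f1=0.67
~~~

Metrics whose entries list "probs" (such as "roc_auc") expect predicted probabilities via the probs argument rather than hard predictions.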
