survival8

Monday, September 21, 2020

12 'ECMAScript 6' Tips (Sep 2020)


1. 
  clear()
  This function clears the 'output pane' in 'JS console'.

2. 
  You can define an array as follows:

  var arr = [
    'elem a',
    'elem b',
    'elem c'
  ] 

  'Shift + Enter' lets you add a line-feed in the code in Firefox JS Console.

3. 
  "arr.length" 
  Returns 3 for the array defined above.

4.
  Get unique elements from an Array:
  With ES6:
  var a = ['a', 'b', 'c', 'c']
  console.log([...new Set(a)]) 
  
  Out:
   ["a", "b", "c"]
   
  OR
   Array.from(new Set(a)); 
  
  OR
   a.filter((x, i, self) => self.indexOf(x) === i) 
   
5.
  To concatenate two arrays:
  a = ['a', 'b', 'c', 'c']
  b = ['d', 'e']
  console.log(a.concat(b)) 
  
  Out:
   ["a", "b", "c", "c", "d", "e"]
  
6. 
  A simple counter that logs the next element of 'arr' on each call:
    var cntr = 0;
    function myFunction() {
      console.log(arr[cntr]);
      cntr++;
    } 

7.
  Add an element to "DOM body" before other child elements.
    b = document.querySelector("body")
    theKid = document.createElement("p");
    theKid.innerHTML = '<button onclick="myFunction(chatroom_names_kenya)" style="position: sticky !important;top: 0;z-index: 999;">Submit</button>';
    b.insertBefore(theKid, b.firstChild) 

8.
  Ways to access an element of HTML DOM using JavaScript:
  
 Gets                Selector Syntax   Method
 ID                  #demo             getElementById()
 Class               .demo             getElementsByClassName()
 Tag                 demo              getElementsByTagName()
 Selector (single)                     querySelector()
 Selector (all)                        querySelectorAll()

 About Query Selectors: querySelector() and querySelectorAll() let JavaScript access DOM elements using CSS selectors.

9.
  Finding HTML Elements by HTML Object Collections:
    var x = document.forms["frm1"];
    var text = "";
    var i;
    for (i = 0; i < x.length; i++) {
      text += x.elements[i].value + "<br>";
    }
    document.getElementById("demo").innerHTML = text;  

  The following HTML objects (and object collections) are also accessible:
  1. document.anchors
  2. document.body
  3. document.documentElement
  4. document.embeds
  5. document.forms
  6. document.head
  7. document.images
  8. document.links
  9. document.scripts
  10. document.title

10.
  Identifying type of a variable:
  The typeof operator returns the type of a variable, object, function or expression:
  
    typeof "John"                 // Returns string
    typeof 3.14                   // Returns number
    typeof NaN                    // Returns number
    typeof false                  // Returns boolean
    typeof [1, 2, 3, 4]           // Returns object
    typeof {name:'John', age:34}  // Returns object
    typeof new Date()             // Returns object
    typeof function () {}         // Returns function
    typeof myCar                  // Returns undefined (if myCar is not declared)
    typeof null                   // Returns object  

  Please observe:
    The data type of NaN is number
    The data type of an array is object
    The data type of a date is object
    The data type of null is object
    The data type of an undefined variable is undefined

11.
  The instanceof Operator
  The instanceof operator returns true if the specified object is an instance of the specified type (constructor):

    var cars = ["Saab", "Volvo", "BMW"];
    
    cars instanceof Array;          // Returns true
    cars instanceof Object;         // Returns true
    cars instanceof String;         // Returns false
    cars instanceof Number;         // Returns false 

12.
  Spread Operator
  
  The spread operator (...) expands an iterable in places where zero or more arguments or elements are expected. It is mostly used with arrays, where more than one value is expected, and it lets us pass the elements of an array as a list of parameters. Its syntax is the same as that of the "rest parameter", but it works in the opposite direction: spread expands an array into individual elements, while rest collects individual arguments into an array. 

  12.1    
    // spread operator doing the concat job 
    let arr = [1,2,3]; 
    let arr2 = [4,5]; 
      
    arr = [...arr,...arr2]; 
    console.log(arr); // [ 1, 2, 3, 4, 5 ]  

  12.2 
    // spread operator for copying  
    let arr = ['a','b','c']; 
    let arr2 = [...arr]; 
      
    console.log(arr); // [ 'a', 'b', 'c' ] 
      
    arr2.push('d'); //inserting an element at the end of arr2 
      
    console.log(arr2); // [ 'a', 'b', 'c', 'd' ] 
    console.log(arr); // [ 'a', 'b', 'c' ]

Sunday, September 20, 2020

Failing to launch Android Virtual Device on Ubuntu in VirtualBox on Windows 10

'Android Virtual Device' fails to run on Ubuntu in VirtualBox on Windows 10:
Your CPU does not support required features (VT-x or SVM).



Message you get on clicking 'Troubleshoot':



If your computer does not support hardware-accelerated virtualization, Android Studio suggests the following:

1. Use a physical device for testing.

2. Develop on a Windows/OSX computer with an Intel processor that supports VT-x and NX.

3. Develop on a Linux computer that supports VT-x or SVM.

4. Use an Android Virtual Device based on an ARM system image. (This is 10X slower than hardware accelerated virtualization.)

Our VirtualBox GuestOS setting:



How to enable "nested VT-x/AMD-V" in Oracle VirtualBox on Windows? 

In Windows, go to VirtualBox installation folders 
-> type 'cmd' in the 'address' bar (it will pop up 'cmd' in that folder) 
-> type VBoxManage modifyvm YourVirtualBoxName --nested-hw-virt on 
-> enter.



Now it should be ticked.

Warning we still get:



Performance Issues

1.


2. AVD never starts beyond this:

Web Security - Prevent Website from opening in an IFrame

For web security, a website can prevent itself from being opened inside an IFrame, as 'WhatsApp Web' does.

<div>
    <p>WhatsApp Error: Prevention from opening WhatsApp Web in an IFrame.</p>
</div>
<iframe src="https://web.whatsapp.com/" title="My WhatsApp" width=900 height=400></iframe> 

View in Mozilla Firefox:
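A site usually opts out of framing by sending response headers. The snippet below is a minimal sketch, assuming the two standard headers `X-Frame-Options` and `Content-Security-Policy` with `frame-ancestors`. The helper name `add_anti_framing_headers` is hypothetical, and this post does not establish which header WhatsApp Web actually sends.

```python
def add_anti_framing_headers(headers):
    """Return a copy of the response headers that forbids framing.

    Sketch only: a real site with an existing Content-Security-Policy
    would merge the directive instead of overwriting the header.
    """
    protected = dict(headers)
    # Older header, widely supported by browsers.
    protected["X-Frame-Options"] = "DENY"
    # Newer CSP directive with the same intent.
    protected["Content-Security-Policy"] = "frame-ancestors 'none'"
    return protected


response_headers = add_anti_framing_headers({"Content-Type": "text/html"})
print(response_headers)
```

Any web framework that lets you set response headers could apply the same two values.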

Setting up Ubuntu 20.04 for Flutter based Android app development

1.
Install Git.

$ sudo apt install git

2.
Create a directory where we download the 'flutter':

(base) ashish@ashish-VirtualBox:~/Desktop/ws/programfiles/flutter$ pwd
/home/ashish/Desktop/ws/programfiles/flutter_box

3.
Download 'flutter':

$ pwd
/home/ashish/Desktop/ws/programfiles/flutter_box

$ git clone https://github.com/flutter/flutter.git

4.
Add the flutter tool to your path:

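$ export PATH="$PATH:`pwd`/flutter/bin"

To make the change permanent, add the export line to "~/.bashrc", using the absolute path (here /home/ashish/Desktop/ws/programfiles/flutter_box/flutter/bin) in place of `pwd`.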

$ nano ~/.bashrc
$ source ~/.bashrc

5.
Optionally, pre-download development binaries:

The flutter tool downloads platform-specific development binaries as needed. For scenarios where pre-downloading these artifacts is preferable (for example, in hermetic build environments, or with intermittent network availability), iOS and Android binaries can be downloaded ahead of time by running:

$ flutter precache

6.
Install "Android SDK" from 'Terminal'.

$ sudo apt update && sudo apt install android-sdk

7.
Install "Android Studio" from "Ubuntu Software".

8.
When you launch 'Android Studio' for the first time, it gives the prompt for 'Import Android Studio Settings':

Set it to "Do not import 'Settings'."

9.
It will next launch the 'Android Studio Setup Wizard'.

10.
Default JDK location:

11.
Next, it downloads SDK components:

12.
Prompt for 'Emulator Settings for Hardware Acceleration'

13.
Update Android license status.
Run `flutter doctor --android-licenses` to accept the SDK licenses.
See https://flutter.dev/docs/get-started/install/linux#android-setup for more details.

$ flutter doctor --android-licenses

14.
Launch "Settings" as shown below. Then go to "Plugins".

If we launch installation of 'Flutter' plugin, it automatically prompts for the installation for 'Dart'.

Then, give 'Android Studio' a restart.

15.
Installing 'Flutter Extension' in Visual Studio Code.

Go to 'Extensions' as shown below and search for 'flutter'.

...

16.
Test installation:

(base) ashish@ashish-VirtualBox:~/.../flutter_box$ flutter doctor
Doctor summary (to see all details, run flutter doctor -v):
[✓] Flutter (Channel master, 1.22.0-10.0.pre.264, on Linux, locale en_IN)

[✓] Android toolchain - develop for Android devices (Android SDK version 30.0.2)
[✓] Android Studio (version 4.0)
[✓] VS Code (version 1.49.1)
[!] Connected device
! No devices available

! Doctor found issues in 1 category.

17.
Common Issues that we notice from 'flutter doctor':

As of Flutter’s 1.19.0 dev release, the Flutter SDK contains the dart command alongside the flutter command so that you can more easily run Dart command-line programs. Downloading the Flutter SDK also downloads the compatible version of Dart, but if you’ve downloaded the Dart SDK separately, make sure that the Flutter version of dart is first in your path, as the two versions might not be compatible.

$ flutter doctor
Doctor summary (to see all details, run flutter doctor -v):

17.1.
[!] Android toolchain - develop for Android devices (Android SDK version 27.0.1)
✗ Flutter requires Android SDK 29 and the Android BuildTools 28.0.3
To update the Android SDK visit Flutter.dev: Android Setup on Linux for detailed instructions.

17.2.
✗ Android license status unknown.
Run `flutter doctor --android-licenses` to accept the SDK licenses.
See Flutter.dev: Android Setup on Linux for more details.

17.3.
✗ Android licenses not accepted. To resolve this, run: flutter doctor --android-licenses

17.4.
[!] Android Studio (not installed)

17.5.
[!] Android Studio (version 4.0)
✗ Flutter plugin not installed; this adds Flutter specific functionality.

17.6.
[!] Android Studio (version 4.0)
✗ Dart plugin not installed; this adds Dart specific functionality.

17.7.
[!] VS Code (version 1.49.1)
✗ Flutter extension not installed; install from
https://marketplace.visualstudio.com/items?itemName=Dart-Code.flutter

17.8.
[!] Connected device
! No devices available

! Doctor found issues in 4 categories.

Dated: Sep 2020
Ref: https://flutter.dev/docs/get-started/install/linux

Thursday, September 17, 2020

Binomial Probability Distribution (visualization using Seaborn)

Binomial Probability Distribution 



"pmf" is "Probability Mass Function" or "Probability Distribution".

"rv" is "Random Variable".



Note: Binomial Distribution is a Discrete Distribution.
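To make the "pmf" concrete, here is a small sketch that evaluates the binomial probability mass function using only the standard library. The function name `binomial_pmf` and the values n = 10, p = 0.5 are illustrative choices, not taken from the post.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a random variable X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(binomial_pmf(5, n, p))  # 252 / 1024 = 0.24609375
print(sum(probabilities))     # the pmf sums to 1 over k = 0..n
```

Because the distribution is discrete, the pmf is only defined for the integer outcomes 0 to n.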

Visualization of Binomial Distribution 



Difference Between Normal and Binomial Distribution 

The main difference is that the normal distribution is continuous whereas the binomial distribution is discrete. With enough trials and data points, however, the binomial distribution looks quite similar to a normal distribution with a suitable loc and scale.

We have code that produces overlapped "Normal" and "Binomial" distributions. We will show some of the best and some of the worst overlaps.

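# Imports needed by the snippet below
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns

number_of_trials = 150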

for s in range(1000, 100000000, 1000000): 
    print("size:", s)
    sns.distplot(random.binomial(n = number_of_trials, p=0.5, size=s), hist=False, label='binomial')

    sns.distplot(random.normal(loc = number_of_trials / 2, 
                               scale=5, size=s), hist=False, label='normal')

    plt.show() 
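The loop above fixes scale=5 for the normal curve. As a hedged aside, the textbook normal approximation matches the binomial's mean n*p and standard deviation sqrt(n*p*(1-p)). The sketch below computes both; the helper name `normal_approximation` is hypothetical.

```python
from math import sqrt


def normal_approximation(n, p):
    """Return (loc, scale) of the normal curve matching Binomial(n, p)."""
    mean = n * p
    std = sqrt(n * p * (1 - p))
    return mean, std


loc, scale = normal_approximation(150, 0.5)
print(loc, round(scale, 2))  # 75.0 6.12
```

For 150 trials this gives a scale of about 6.12 rather than 5, which may account for part of the mismatch in the weaker overlaps.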
	
Best Overlaps



Worst Overlaps



References 
% numpy.org

Improving a Classifier (ML) Using Snorkel's Slicing Technique

The dataset we are using is the '150 datapoints strong' Iris flower species dataset (Download from here).

We have a dependency here to draw the confusion matrix. The code file name is: DrawConfusionMatrix.py

Content:

# Ref: Scikit-Learn 

import itertools
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize = False,
                          title = 'Confusion matrix',
                          cmap = plt.cm.Blues,
                          use_seaborn = False):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)
    
    if use_seaborn == False:
        plt.imshow(cm, interpolation='nearest', cmap=cmap)
        plt.colorbar()
        
        fmt = '.2f' if normalize else 'd'
        thresh = cm.max() / 2.
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            plt.text(j, i, format(cm[i, j], fmt),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
            
        tick_marks = np.arange(len(classes) + 0)
    
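    else:
        
        # Integer counts use 'd'; normalized (float) values need a float format
        ax = sns.heatmap(cm, annot=True, fmt='.2f' if normalize else 'd') #notation: "annot" not "annote"
        
        bottom, top = ax.get_ylim()
        ax.set_ylim(bottom + 0.5, top - 0.5)
        # Heatmap cells are centered at 0.5, 1.5, ...; use one tick per class
        tick_marks = np.arange(len(classes)) + 0.5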

    plt.title(title)
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    plt.ylabel('True label')
    plt.xlabel('Predicted label') 
    
Now, the main problem: 

# Import libraries.

import DrawConfusionMatrix as dcm
import importlib # The imp module was deprecated in Python 3.4 in favor of the importlib module.
importlib.reload(dcm)

import pandas as pd
import numpy as np
from collections import Counter

from snorkel.augmentation import transformation_function
from snorkel.augmentation import RandomPolicy
from snorkel.augmentation import PandasTFApplier

from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix


df = pd.read_csv('datasets_19_420_Iris.csv') 

for i in set(df.Species):
    # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
    print(i)
    print(df[df.Species == i].describe().loc[['mean', 'std'], :], '\n') 
	
Iris-versicolor
            Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
mean  75.50000       5.936000      2.770000       4.260000      1.326000
std   14.57738       0.516171      0.313798       0.469911      0.197753 

Iris-virginica
             Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
mean  125.50000        6.58800      2.974000       5.552000       2.02600
std    14.57738        0.63588      0.322497       0.551895       0.27465 

Iris-setosa
            Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
mean  25.50000        5.00600      3.418000       1.464000       0.24400
std   14.57738        0.35249      0.381024       0.173511       0.10721  

 
features = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

classes = ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor']
desc_dict = {}
for i in classes:
    desc_dict[i] = df[df.Species == i].describe()
	
df['Train'] = 'Train'

# np.random.normal(loc, scale) draws one sample from a normal distribution
# with mean 'loc' and standard deviation 'scale'

@transformation_function(pre = [])
def get_new_instance_for_this_class(x):
    x.SepalLengthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['SepalLengthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['SepalLengthCm']].iloc[0,0], 2) * 100) / 100
    
    x.SepalWidthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['SepalWidthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['SepalWidthCm']].iloc[0,0], 2) * 100) / 100
    
    x.PetalLengthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['PetalLengthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['PetalLengthCm']].iloc[0,0], 2) * 100) / 100
    
    x.PetalWidthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['PetalWidthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['PetalWidthCm']].iloc[0,0], 2) * 100) / 100
    
    x.Train = 'Test'
    return x

tfs = [ get_new_instance_for_this_class ]

random_policy = RandomPolicy(
    len(tfs), sequence_length=2, n_per_original=5, keep_original=True
    # n_per_original (int) – Number of transformed data points per original
)

tf_applier = PandasTFApplier(tfs, random_policy)
df_train_augmented = tf_applier.apply(df)

print(f"Original training set size: {len(df)}")
print(f"Augmented training set size: {len(df_train_augmented)}") 

Original training set size: 150
Augmented training set size: 900 
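The size of 900 follows from the policy settings: with keep_original=True every original row is kept, and n_per_original=5 adds five transformed rows for each. The sketch below is only that arithmetic, not Snorkel code; the function name `augmented_size` is hypothetical.

```python
def augmented_size(n_original, n_per_original, keep_original):
    """Expected row count after applying the augmentation policy."""
    kept = n_original if keep_original else 0
    return kept + n_original * n_per_original


print(augmented_size(150, n_per_original=5, keep_original=True))  # 900
```

The 750 generated rows are the ones marked 'Test' by the transformation function.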

df_test = df_train_augmented[df_train_augmented.Train == 'Test']

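# Assumption: the post never shows how 'clf' was created. The plot caption and the
# predict_proba note below suggest an SVM with probability=True fitted on the original rows.
clf = svm.SVC(gamma = 'auto', probability = True)
clf.fit(df[features], df['Species'])

pred = clf.predict(df_test[features])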

pred_probs = clf.predict_proba(df_test[features])
# Make Note Of >> AttributeError: predict_proba is not available when 'probability=False'

print(Counter(pred))
print("Accuracy: {:.3f}".format(accuracy_score(df_test['Species'], pred)))

cm = confusion_matrix(df_test['Species'], pred)
print("Confusion matrix:\n{}".format(cm))

Counter({'Iris-versicolor': 252, 'Iris-setosa': 250, 'Iris-virginica': 248})
Accuracy: 0.968
Confusion matrix:
[[250   0   0]
 [  0 239  11]
 [  0  13 237]] 
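The reported accuracy can be checked directly against the confusion matrix: correct predictions sit on the diagonal. This is a small plain-Python sketch; the function name `accuracy_from_confusion_matrix` is hypothetical.

```python
def accuracy_from_confusion_matrix(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


cm_all_classes = [[250, 0, 0], [0, 239, 11], [0, 13, 237]]
print("{:.3f}".format(accuracy_from_confusion_matrix(cm_all_classes)))  # 0.968
```

All 24 errors fall in the last two rows, while the first row is classified without error.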

classes = ['setosa', 'versicolor', 'virginica']

dcm.plot_confusion_matrix(cm, classes = classes, use_seaborn = True) 

# This plot is for 'Support Vector Machine' based classifier.



# This plot is for 'Random Forest' based classifier.



Here we see that there are some misclassified data points for classes 'Versicolor' and 'Virginica'.
'Setosa' has not been misclassified by either SVM or RandomForest.

Next, we slice the dataframe into 'setosa' and 'not setosa' dataframes. Because we have no issues with the 'setosa' data points, we re-train a classifier on the other two classes, viz. 'versicolor' and 'virginica'.

import re
from snorkel.slicing import slicing_function

@slicing_function()
def not_setosa(x):
    return x.Species != 'Iris-setosa'

sfs = [not_setosa]

# ~ ~ ~

#Store slice metadata in S
from snorkel.slicing import PandasSFApplier

applier = PandasSFApplier(sfs)
S_test = applier.apply(df_test)

# ~ ~ ~

from snorkel.analysis import Scorer

scorer = Scorer(metrics=["f1_micro", "f1_macro"])
# Make Note Of >> ValueError: f1 not supported for multiclass. 
# Try f1_micro or f1_macro instead.

# ~ ~ ~

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df_test['Species'])

scorer.score_slices(
    S=S_test, 
    golds=le.transform(df_test['Species']), 
    preds=le.transform(pred), 
    probs=pred_probs, 
    as_dataframe=True
) 
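Conceptually, score_slices evaluates each metric once overall and once per slice. The sketch below illustrates that idea in plain Python and is not Snorkel's implementation. It computes accuracy, which coincides with micro-averaged F1 for single-label multiclass predictions. The name `score_slice` and the toy labels are hypothetical.

```python
def score_slice(golds, preds, in_slice):
    """Accuracy restricted to the rows where in_slice is True."""
    pairs = [(g, p) for g, p, keep in zip(golds, preds, in_slice) if keep]
    return sum(g == p for g, p in pairs) / len(pairs)


# Hypothetical toy labels, not the post's data.
golds = ["setosa", "versicolor", "virginica", "virginica", "versicolor"]
preds = ["setosa", "versicolor", "versicolor", "virginica", "versicolor"]
not_setosa_mask = [g != "setosa" for g in golds]

print(score_slice(golds, preds, not_setosa_mask))      # 0.75
print(score_slice(golds, preds, [True] * len(golds)))  # 0.8
```

The slice score is lower than the overall score here because the easy 'setosa' row is excluded.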



from snorkel.slicing import slice_dataframe

df_not_setosa = slice_dataframe(df_train_augmented, not_setosa)

from sklearn.ensemble import RandomForestClassifier

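rfc = RandomForestClassifier(max_depth=4, random_state=0, n_estimators = 100) 
# The fit call is missing from the post; it is assumed to mirror the svc.fit call used later.
rfc.fit(df_not_setosa[features], df_not_setosa['Species']) 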

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
	max_depth=4, max_features='auto', max_leaf_nodes=None,
	min_impurity_decrease=0.0, min_impurity_split=None,
	min_samples_leaf=1, min_samples_split=2,
	min_weight_fraction_leaf=0.0, n_estimators=100,
	n_jobs=None, oob_score=False, random_state=0, verbose=0,
	warm_start=False) 
	
df_test_rfc = df_not_setosa[df_not_setosa.Train == 'Test']
pred_rfc = rfc.predict(df_test_rfc[features])
print(Counter(pred_rfc))
print("Accuracy: {:.3f}".format(accuracy_score(df_test_rfc['Species'], pred_rfc)))

cm = confusion_matrix(df_test_rfc['Species'], pred_rfc)
print("Confusion matrix:\n{}".format(cm))

Counter({'Iris-versicolor': 251, 'Iris-virginica': 249})
Accuracy: 0.990 
Confusion matrix:
[[248   2]
 [  3 247]] 
 
dcm.plot_confusion_matrix(cm, 
    classes = ['versicolor', 'virginica'], 
    use_seaborn = True) 

Using RandomForestClassifier on sliced dataset:


We also have the score for SVC. It is not as good as that of RandomForestClassifier:

svc = svm.SVC(gamma = 'auto', probability=True)
svc.fit(df_not_setosa[features], df_not_setosa['Species']) 

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=True, random_state=None, shrinking=True, tol=0.001,
    verbose=False) 
	
pred_svc = svc.predict(df_test_rfc[features])
print(Counter(pred_svc))
print("Accuracy: {:.3f}".format(accuracy_score(df_test_rfc['Species'], pred_svc)))

cm = confusion_matrix(df_test_rfc['Species'], pred_svc)
print("Confusion matrix:\n{}".format(cm)) 

Counter({'Iris-versicolor': 251, 'Iris-virginica': 249})
Accuracy: 0.986 
Confusion matrix:
[[247   3]
 [  4 246]] 
 
Reference 
% Slice-based Learning: a Programming Model for Residual Learning in Critical Data Slices

Wednesday, September 16, 2020

Snorkel's Analysis Package Overview (v0.9.6, Sep 2020)



The current version of Snorkel is v0.9.6 (as of 16-Sep-2020). Link to GitHub

Snorkel has 8 packages.

Package Reference:

1. Snorkel Analysis Package
2. Snorkel Augmentation Package
3. Snorkel Classification Package
4. Snorkel Labeling Package
5. Snorkel Map Package
6. Snorkel Preprocess Package
7. Snorkel Slicing Package
8. Snorkel Utils Package

What is Snorkel's Analysis Package for? 
This package provides utilities for interpreting classification results. 

Generic model analysis utilities shared across Snorkel.

1: Scorer
Calculate one or more scores from user-specified and/or user-defined metrics.

This defines a class 'Scorer' with two methods: 'score()' and 'score_slices()'. You have to specify input arguments such as the metrics (related to 'metric_score()', discussed below), true labels, predicted labels and predicted probabilities.

It is through this that we make use of code in 'metrics.py'

Code Snippet:


~~~   ~~~   ~~~

2: get_label_buckets
Return data point indices bucketed by label combinations.

This is a function written in the error_analysis.py file.
  
Code:
import snorkel
import numpy as np
from snorkel.analysis import get_label_buckets
print("Snorkel version:", snorkel.__version__) 

Snorkel version: 0.9.3

A common use case is calling ``buckets = get_label_buckets(Y_gold, Y_pred)`` where ``Y_gold`` is a set of gold (i.e. ground truth) labels and ``Y_pred`` is a corresponding set of predicted labels.
 
Y_gold = np.array([1, 1, 1, 0, 0, 0, 1])
Y_pred = np.array([1, 1, -1, -1, 1, 0, 1])

buckets = get_label_buckets(Y_gold, Y_pred) 
# If gold and pred have different number of elements >> ValueError: Arrays must all have the same number of elements

The returned ``buckets[(i, j)]`` is a NumPy array of data point indices with true label i and predicted label j. More generally, the returned indices within each bucket refer to the order of the labels that were passed in as function arguments.

print(buckets[(1, 1)])  # true positives where both are 1

Out: array([0, 1, 6])

buckets[(0, 0)]  # true negatives where both are 0

Out: array([5])

# false negatives, false positives and true negatives
print((1, 0) in buckets, '/', (0, 1) in buckets, '/', (0, 0) in buckets)  

Out: False / True / True

buckets[(1, -1)]  # abstained positives

Out: array([2])

buckets[(0, -1)]  # abstained negatives

Out: array([3])
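The bucketing logic can be pictured with a few lines of plain Python. This is a rough sketch of the idea, not Snorkel's code: it returns lists instead of NumPy arrays, and the name `label_buckets_sketch` is hypothetical.

```python
from collections import defaultdict


def label_buckets_sketch(*label_lists):
    """Group data point indices by their tuple of labels."""
    buckets = defaultdict(list)
    for index, labels in enumerate(zip(*label_lists)):
        buckets[labels].append(index)
    return dict(buckets)


Y_gold = [1, 1, 1, 0, 0, 0, 1]
Y_pred = [1, 1, -1, -1, 1, 0, 1]
buckets = label_buckets_sketch(Y_gold, Y_pred)

print(buckets[(1, 1)])    # [0, 1, 6]
print(buckets[(0, 1)])    # [4]
print((1, 0) in buckets)  # False
```

Only label combinations that actually occur get a key, which is why the membership test for (1, 0) is False.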

~~~   ~~~   ~~~

3: metric_score()
Evaluate a standard metric on a set of predictions/probabilities.

Code for metric_score() is in: metrics.py

Using this you can evaluate a standard metric on a set of predictions (True Labels and Predicted Labels) / probabilities.

Scores available are:
1. _coverage_score
2. _roc_auc_score
3. _f1_score
4. _f1_micro_score
5. _f1_macro_score

It is a wrapper around "sklearn.metrics" and adds to it by giving the above five metrics.

METRICS = {
    "accuracy":          Metric(sklearn.metrics.accuracy_score),
    "coverage":          Metric(_coverage_score, ["preds"]),
    "precision":         Metric(sklearn.metrics.precision_score),
    "recall":            Metric(sklearn.metrics.recall_score),
    "f1":                Metric(_f1_score, ["golds", "preds"]),
    "f1_micro":          Metric(_f1_micro_score, ["golds", "preds"]),
    "f1_macro":          Metric(_f1_macro_score, ["golds", "preds"]),
    "fbeta":             Metric(sklearn.metrics.fbeta_score),
    "matthews_corrcoef": Metric(sklearn.metrics.matthews_corrcoef),
    "roc_auc":           Metric(_roc_auc_score, ["golds", "probs"]),
}
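Of the metrics above, "coverage" is the only one that needs just the predictions. A plausible reading is that it measures the fraction of data points with a non-abstain prediction. The sketch below assumes that -1 marks an abstain, as in the get_label_buckets example; the function is an illustration, not the library's `_coverage_score`.

```python
def coverage_score(preds, abstain=-1):
    """Fraction of data points that received a non-abstain prediction."""
    return sum(p != abstain for p in preds) / len(preds)


Y_pred = [1, 1, -1, -1, 1, 0, 1]
print(round(coverage_score(Y_pred), 3))  # 0.714
```

Five of the seven predictions are not abstains, hence 5/7.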

Monday, September 14, 2020

Starting With Selenium's Python Package (Installation)


  
We have a YAML file to set up our conda environment. The file 'selenium.yml' has the contents:

name: selenium
channels:
  - conda-forge
  - defaults
dependencies:
  - selenium
  - jupyterlab
  - ipykernel 

To set up the environment, we run the command:

(base) CMD> conda env create -f selenium.yml 

(base) CMD> conda activate selenium 

After that, if we want to see which packages got installed, we run the command:

(selenium) CMD> conda env export 

Next, we set up a kernel from this environment:

(selenium) CMD> python -m ipykernel install --user --name selenium 
Installed kernelspec selenium in C:\Users\Ashish Jain\AppData\Roaming\jupyter\kernels\selenium 

To view the list of kernels:

(selenium) CMD> jupyter kernelspec list 
Available kernels:
  selenium              C:\Users\Ashish Jain\AppData\Roaming\jupyter\kernels\selenium
  python3               E:\programfiles\Anaconda3\envs\selenium\share\jupyter\kernels\python3 
  ... 
  
A basic piece of code would start the browser. We have tried and tested it for Chrome and Firefox. To do this, we need the web driver file or we get the following exception:

CODE:

from selenium import webdriver  
import time  
from selenium.webdriver.common.keys import Keys  

driver = webdriver.Chrome()  

ERROR:

----------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
E:\programfiles\Anaconda3\envs\selenium\lib\site-packages\selenium\webdriver\common\service.py in start(self)
     71             cmd.extend(self.command_line_args())
---> 72             self.process = subprocess.Popen(cmd, env=self.env,
     73                                             close_fds=platform.system() != 'Windows',

E:\programfiles\Anaconda3\envs\selenium\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    853 
--> 854             self._execute_child(args, executable, preexec_fn, close_fds,
    855                                 pass_fds, cwd, env,

E:\programfiles\Anaconda3\envs\selenium\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1306             try:
-> 1307                 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
   1308                                          # no special security

FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

WebDriverException                        Traceback (most recent call last)
...
WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home 

We got the file from here: chromedriver.storage.googleapis.com For v86 

chromedriver_win32.zip ---> chromedriver.exe
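Before calling webdriver.Chrome(), it can help to check whether the driver executable is visible on PATH. The sketch below uses only the standard library; the helper name `find_driver` is hypothetical.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable, or None if not on PATH."""
    return shutil.which(name)


driver_path = find_driver()
if driver_path is None:
    print("chromedriver is not on PATH; webdriver.Chrome() would fail")
else:
    print("Found driver at", driver_path)
```

The same check works for Firefox by passing "geckodriver" as the name.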

Error for WebDriver and Browser version mismatch:

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 86
Current browser version is 85.0.4183.102 with binary path C:\Program Files (x86)\Google\Chrome\Application\chrome.exe 

Download from here for Chrome v85: chromedriver.storage.googleapis.com For v85 
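The mismatch above comes down to comparing major version numbers. As a rough sketch, with hypothetical helper names, the check could look like this:

```python
def browser_major(version):
    """Major version number, e.g. 85 for '85.0.4183.102'."""
    return int(version.split(".")[0])


def driver_supports(browser_version, supported_major):
    """True if the driver's supported major version matches the browser."""
    return browser_major(browser_version) == supported_major


print(driver_supports("85.0.4183.102", 86))  # False
print(driver_supports("85.0.4183.102", 85))  # True
```

The first call reproduces the failing combination from the error message, the second the combination that works after downloading the v85 driver.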

One point to note about ChromeDriver as of September 2020:

ChromeDriver only supports characters in the BMP (Basic Multilingual Plane). This is a known issue with the Chromium team: ChromeDriver still does not support characters with a Unicode code point above FFFF, so it is impossible to send any such character via ChromeDriver. As a result, any attempt to send SMP (Supplementary Multilingual Plane) characters (e.g. emojis and other symbols) raises an error. 

Firefox, in contrast, supports emojis sent via the 'send_keys()' method. 

As of Unicode 13.0, the SMP comprises the following 134 blocks: Archaic Greek and Other Left-to-right scripts: Linear B Syllabary (10000–1007F) Linear B Ideograms (10080–100FF). 
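One way to avoid the error is to look for characters outside the BMP before sending text to ChromeDriver. The sketch below does that in plain Python; the function name `non_bmp_characters` is hypothetical.

```python
def non_bmp_characters(text):
    """Return the characters whose code point lies above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


plain = non_bmp_characters("hello")
with_emoji = non_bmp_characters("ok \U0001F600")

print(plain)                              # []
print([hex(ord(ch)) for ch in with_emoji])  # ['0x1f600']
```

An empty list means the text should be safe to pass to send_keys() with ChromeDriver.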

~ ~ ~ ~ ~

If you are working with the Firefox browser, you need the Gecko WebDriver (geckodriver) to be available on the Windows 'PATH'.

Without WebDriver file:
	FileNotFoundError: [WinError 2] The system cannot find the file specified
	WebDriverException: Message: 'geckodriver' executable needs to be in PATH. 

Download Gecko driver from here: GitHub Repo of Mozilla 

The statement to launch the web browser will be: 

driver = webdriver.Firefox()  

By default, browsers open in a partial size window. To maximize the window: 

driver.maximize_window() 

Now, we open a link: driver.get("http://survival8.blogspot.com/")

Wednesday, September 9, 2020

Sentiment Analysis using BERT, DistilBERT and ALBERT

We will do Sentiment Analysis using the code from this repo: GitHub

Check out the code from above repository to get started.

For creating Conda environment, we have a file "sentiment_analysis.yml" with content:

name: e20200909
channels:
  - defaults
  - conda-forge
  - pytorch
  
dependencies:
  - pytorch
  - pandas
  - numpy
  - pip:
    - transformers==3.0.1
  - flask
  - flask_cors
  - scikit-learn
  - ipykernel 

(base) C:\>conda env create -f sentiment_analysis.yml

It will install the above mentioned dependencies and the nested dependencies.

(base) C:\Users\Ashish Jain>conda env list 
# conda environments:
#
base                  *  E:\programfiles\Anaconda3
e20200909                E:\programfiles\Anaconda3\envs\e20200909
env_py_36                E:\programfiles\Anaconda3\envs\env_py_36
temp                     E:\programfiles\Anaconda3\envs\temp
temp202009               E:\programfiles\Anaconda3\envs\temp202009
tf                       E:\programfiles\Anaconda3\envs\tf 

(base) C:\Users\Ashish Jain>conda activate e20200909 

(e20200909) C:\Users\Ashish Jain>conda env export
name: e20200909
channels:
  - conda-forge
  - defaults
dependencies:
  - _pytorch_select=0.1=cpu_0
  - backcall=0.2.0=py_0
  - blas=1.0=mkl
  - ca-certificates=2020.7.22=0
  - certifi=2020.6.20=py38_0
  - cffi=1.14.2=py38h7a1dbc1_0
  - click=7.1.2=py_0
  - colorama=0.4.3=py_0
  - decorator=4.4.2=py_0
  - flask=1.1.2=py_0
  - flask_cors=3.0.9=pyh9f0ad1d_0
  - icc_rt=2019.0.0=h0cc432a_1
  - intel-openmp=2019.4=245
  - ipykernel=5.3.4=py38h5ca1d4c_0
  - ipython=7.18.1=py38h5ca1d4c_0
  - ipython_genutils=0.2.0=py38_0
  - itsdangerous=1.1.0=py_0
  - jedi=0.17.2=py38_0
  - jinja2=2.11.2=py_0
  - joblib=0.16.0=py_0
  - jupyter_client=6.1.6=py_0
  - jupyter_core=4.6.3=py38_0
  - libmklml=2019.0.5=0
  - libsodium=1.0.18=h62dcd97_0
  - markupsafe=1.1.1=py38he774522_0
  - mkl=2019.4=245
  - mkl-service=2.3.0=py38hb782905_0
  - mkl_fft=1.1.0=py38h45dec08_0
  - mkl_random=1.1.0=py38hf9181ef_0
  - ninja=1.10.1=py38h7ef1ec2_0
  - numpy=1.19.1=py38h5510c5b_0
  - numpy-base=1.19.1=py38ha3acd2a_0
  - openssl=1.1.1g=he774522_1
  - pandas=1.1.1=py38ha925a31_0
  - parso=0.7.0=py_0
  - pickleshare=0.7.5=py38_1000
  - pip=20.2.2=py38_0
  - prompt-toolkit=3.0.7=py_0
  - pycparser=2.20=py_2
  - pygments=2.6.1=py_0
  - python=3.8.5=h5fd99cc_1
  - python-dateutil=2.8.1=py_0
  - pytorch=1.6.0=cpu_py38h538a6d7_0
  - pytz=2020.1=py_0
  - pywin32=227=py38he774522_1
  - pyzmq=19.0.1=py38ha925a31_1
  - scikit-learn=0.23.2=py38h47e9c7a_0
  - scipy=1.5.0=py38h9439919_0
  - setuptools=49.6.0=py38_0
  - six=1.15.0=py_0
  - sqlite=3.33.0=h2a8f88b_0
  - threadpoolctl=2.1.0=pyh5ca1d4c_0
  - tornado=6.0.4=py38he774522_1
  - traitlets=4.3.3=py38_0
  - vc=14.1=h0510ff6_4
  - vs2015_runtime=14.16.27012=hf0eaf9b_3
  - wcwidth=0.2.5=py_0
  - werkzeug=1.0.1=py_0
  - wheel=0.35.1=py_0
  - wincertstore=0.2=py38_0
  - zeromq=4.3.2=ha925a31_2
  - zlib=1.2.11=h62dcd97_4
  - pip:
    - chardet==3.0.4
    - filelock==3.0.12
    - idna==2.10
    - packaging==20.4
    - pyparsing==2.4.7
    - regex==2020.7.14
    - requests==2.24.0
    - sacremoses==0.0.43
    - sentencepiece==0.1.91
    - tokenizers==0.8.0rc4
    - tqdm==4.48.2
    - transformers==3.0.1
    - urllib3==1.25.10
prefix: E:\programfiles\Anaconda3\envs\e20200909

(e20200909) C:\Users\Ashish Jain> 

Next, we run the 'analyser' code:

(e20200909) C:\SentimentAnalysis-master>python analyze.py 
Please wait while the analyser is being prepared.
Input sentiment to analyze: I am feeling good.
Positive with probability 99%.
Input sentiment to analyze: I am feeling bad.
Negative with probability 99%.
Input sentiment to analyze: I am Ashish.
Positive with probability 81%.
Input sentiment to analyze: 

Next, we run it in the browser:

We pass the same sentences as above.

Here are the server logs:

(e20200909) C:\SentimentAnalysis-master>python server.py 
 * Serving Flask app "server" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [09/Sep/2020 21:35:48] "GET / HTTP/1.1" 400 -
127.0.0.1 - - [09/Sep/2020 21:35:48] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [09/Sep/2020 21:36:02] "GET /?text=hello HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:36:38] "GET /?text=shut%20up HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:36:50] "GET /?text=i%20am%20feeling%20good HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:36:54] "GET /?text=i%20am%20feeling%20bad HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:37:00] "GET /?text=i%20am%20ashish HTTP/1.1" 200 - 
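The log shows each sentence arriving in the "text" query parameter with spaces encoded as %20. As a minimal sketch of how such a URL can be built and decoded, using only Python's standard library (the helper name build_url is hypothetical; the address and parameter name are taken from the log above):

```python
from urllib.parse import quote, unquote, urlencode

# Address printed by the Flask development server in the log above.
BASE_URL = "http://127.0.0.1:5000/"


def build_url(sentence):
    """Return the request URL for one sentence (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"text": sentence}, quote_via=quote)
    return BASE_URL + "?" + query


print(build_url("i am feeling good"))
# Decoding a value as it appears in the log gives back the sentence.
print(unquote("i%20am%20ashish"))
```

The first line printed should match the path seen in the log for "i am feeling good".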

The browser screens:

Tuesday, September 8, 2020

2 X 2 Idempotent matrix

I had to provide an example of an idempotent matrix. That's the kind of matrix that yields itself when multiplied by itself, much like 0 and 1 in scalar multiplication (1 x 1 = 1).
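The definition can be checked mechanically. Below is a minimal sketch in plain Python; the helper names and the example matrices are hypothetical and do not come from the original working.

```python
def matmul2(m, n):
    """Multiply two 2 x 2 matrices given as nested lists."""
    return [
        [m[0][0] * n[0][0] + m[0][1] * n[1][0],
         m[0][0] * n[0][1] + m[0][1] * n[1][1]],
        [m[1][0] * n[0][0] + m[1][1] * n[1][0],
         m[1][0] * n[0][1] + m[1][1] * n[1][1]],
    ]


def is_idempotent(m):
    """A matrix is idempotent when m multiplied by m gives m again."""
    return matmul2(m, m) == m


identity = [[1, 0], [0, 1]]      # the matrix analogue of the scalar 1
example = [[2, 1], [-2, -1]]     # hypothetical example, chosen for illustration
counter = [[1, 1], [0, 2]]       # hypothetical matrix that fails the check

print(is_idempotent(identity))
print(is_idempotent(example))
print(is_idempotent(counter))
```

Integer entries are used so that the equality test is exact; with floating-point entries a tolerance would be needed.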

It is not so easy to predict the result of a matrix multiplication, especially for large matrices. So, instead of settling for the naïve method of guessing by trial and error, I explored the properties of a square matrix of order 2.

On this page I state the question and begin to attempt it. I realised that for a matrix to be idempotent, it would have to retain its dimensions (order), and hence be a square matrix.

I have intentionally used distinct variable names a, b, c, and d. This keeps open the possibility of a different number at each index. I derived 'bc' from the first equation and substituted it into its instance in the last equation to obtain a solution for 'a'.

Since 0 cannot be divided by 0, I could not divide through by either term unless it was non-zero. This left two possibilities, which I called case A and case B.

I solved the four equations in case A by making substitutions into the 4 main equations. I later tested the solution with b=1.
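A sketch of the algebra behind this step, assuming the matrix is written row by row as a, b, c, d. The two factors in the last line are one natural reading of the two possibilities; they are not necessarily labelled the same way as in the original handwritten working.

```latex
\begin{aligned}
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{2}
  &= \begin{pmatrix} a^{2} + bc & ab + bd \\ ca + dc & cb + d^{2} \end{pmatrix}
   = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \\
a^{2} + bc = a \;&\Rightarrow\; bc = a - a^{2} \\
bc + d^{2} = d \;&\Rightarrow\; a - a^{2} + d^{2} - d = 0 \\
&\Rightarrow\; (a - d)(1 - a - d) = 0
\end{aligned}
```

So either a = d or a + d = 1. Cancelling one factor is only valid when that factor is non-zero, which is why the two possibilities have to be treated separately.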

As you can see, I could not use the elimination method in an advantageous manner for this case.

I couldn't get a unique solution in either case. That is because there are many possible square matrices that are idempotent. However, I am not comfortable concluding by intuition alone that every 2 X 2 idempotent matrix has one of only two possible numbers as its first and last elements.

Others’ take on it

My classmate Sabari Sreekumar did manage to use elimination for the ‘bc’ term for the general case.

I took it a step further and defined the last element in terms of the other elements.

So given any 2 X 2 idempotent matrix and its first three elements, you can find the last element unequivocally with this formula.

Conclusion

I wonder if multiples of matrices that satisfy either case are also idempotent. Perhaps I will see if I can prove that in another post.
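A quick numerical check hints at the answer. For an idempotent M, (kM)(kM) = k²(MM) = k²M, so for a non-zero M the multiple kM can only be idempotent when k² = k, that is, when k is 0 or 1. The sketch below uses a hypothetical example matrix that is not taken from the working above.

```python
def matmul2(m, n):
    """Multiply two 2 x 2 matrices given as nested lists."""
    return [
        [m[0][0] * n[0][0] + m[0][1] * n[1][0],
         m[0][0] * n[0][1] + m[0][1] * n[1][1]],
        [m[1][0] * n[0][0] + m[1][1] * n[1][0],
         m[1][0] * n[0][1] + m[1][1] * n[1][1]],
    ]


def scale(k, m):
    """Multiply every element of m by the scalar k."""
    return [[k * x for x in row] for row in m]


base = [[2, 1], [-2, -1]]                   # hypothetical idempotent matrix
doubled = scale(2, base)                    # the multiple 2M
doubled_squared = matmul2(doubled, doubled)

print(matmul2(base, base) == base)          # base is idempotent
print(doubled_squared == doubled)           # 2M is not
print(doubled_squared == scale(4, base))    # (2M)(2M) equals 4M instead
```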

In the next lecture, Professor Venkata Ratnam suggested using the sure-shot approach of a zero matrix. And I was like “Why didn’t I think of that?”

Sunday, September 6, 2020

Setting up Conda Environment for Swagger and Scrapy based project



We have a file named "my_yml.yml" with the following contents:

name: swagger2
channels:
  - conda-forge
  - defaults
dependencies:
  - beautifulsoup4
  - connexion
  - flask
  - flask_cors
  - scrapy 

Creating an environment from this file does three things:

1. It creates an environment named "swagger2".

2. For downloading packages, it uses the channels "conda-forge" and "defaults".

3. It installs the packages listed under "dependencies".
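To make the three parts of the file concrete, here is a minimal sketch that reads them back out in Python. The reader is hand-rolled for this flat layout only and is not a general YAML parser (a real project would more likely use a YAML library); the function name read_simple_env is hypothetical.

```python
# Contents of my_yml.yml as shown above.
YML_TEXT = """\
name: swagger2
channels:
  - conda-forge
  - defaults
dependencies:
  - beautifulsoup4
  - connexion
  - flask
  - flask_cors
  - scrapy
"""


def read_simple_env(text):
    """Read 'key: value' pairs and '- item' lists from a flat env file."""
    env = {}
    current_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- "):
            # A list item belongs to the most recent key without a value.
            env[current_key].append(line[2:].strip())
        else:
            key, _, value = line.partition(":")
            value = value.strip()
            if value:
                env[key] = value
                current_key = None
            else:
                env[key] = []
                current_key = key
    return env


env = read_simple_env(YML_TEXT)
print("1. environment name:", env["name"])
print("2. channels:", env["channels"])
print("3. packages:", env["dependencies"])
```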

Checking our current environments:
(base) C:\Users\Ashish Jain>conda env list 
# conda environments:
base      *  E:\programfiles\Anaconda3
env_py_36    E:\programfiles\Anaconda3\envs\env_py_36
tf           E:\programfiles\Anaconda3\envs\tf 

(base) C:\experiment_with_conda>conda env create -f my_yml.yml 

Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages
pysocks-1.7.1        | 27 KB     | ### | 100%
flask_cors-3.0.9     | 15 KB     | ### | 100%
chardet-3.0.4        | 189 KB    | ### | 100%
clickclick-1.2.2     | 9 KB      | ### | 100%
cssselect-1.1.0      | 18 KB     | ### | 100%
importlib-metadata-1 | 45 KB     | ### | 100%
attrs-20.2.0         | 41 KB     | ### | 100%
protego-0.1.16       | 2.6 MB    | ### | 100%
twisted-20.3.0       | 5.1 MB    | ### | 100%
pywin32-227          | 6.9 MB    | ### | 100%
pyrsistent-0.16.0    | 91 KB     | ### | 100%
beautifulsoup4-4.9.1 | 86 KB     | ### | 100%
connexion-2.7.0      | 51 KB     | ### | 100%
pyhamcrest-2.0.2     | 29 KB     | ### | 100%
libxslt-1.1.33       | 499 KB    | ### | 100%
libxml2-2.9.10       | 3.5 MB    | ### | 100%
incremental-17.5.0   | 14 KB     | ### | 100%
flask-1.1.2          | 70 KB     | ### | 100%
scrapy-2.3.0         | 640 KB    | ### | 100%
automat-20.2.0       | 30 KB     | ### | 100%
python-3.8.5         | 18.9 MB   | ### | 100%
bcrypt-3.2.0         | 41 KB     | ### | 100%
service_identity-18. | 12 KB     | ### | 100%
win_inet_pton-1.1.0  | 7 KB      | ### | 100%
cryptography-3.1     | 587 KB    | ### | 100%
libiconv-1.16        | 680 KB    | ### | 100%
jmespath-0.10.0      | 21 KB     | ### | 100%
markupsafe-1.1.1     | 29 KB     | ### | 100%
parsel-1.6.0         | 15 KB     | ### | 100%
constantly-15.1.0    | 9 KB      | ### | 100%
pydispatcher-2.0.5   | 12 KB     | ### | 100%
zope.interface-5.1.0 | 299 KB    | ### | 100%
pyasn1-modules-0.2.7 | 60 KB     | ### | 100%
hyperlink-20.0.1     | 42 KB     | ### | 100%
inflection-0.5.1     | 9 KB      | ### | 100%
pyasn1-0.4.8         | 53 KB     | ### | 100%
w3lib-1.22.0         | 21 KB     | ### | 100%
pathlib2-2.3.5       | 34 KB     | ### | 100%
jinja2-2.11.2        | 93 KB     | ### | 100%
setuptools-49.6.0    | 968 KB    | ### | 100%
queuelib-1.5.0       | 13 KB     | ### | 100%
itemloaders-1.0.2    | 14 KB     | ### | 100%
pyyaml-5.3.1         | 158 KB    | ### | 100%
soupsieve-2.0.1      | 30 KB     | ### | 100%
brotlipy-0.7.0       | 368 KB    | ### | 100%
wincertstore-0.2     | 13 KB     | ### | 100%
lxml-4.5.2           | 1.1 MB    | ### | 100%
cffi-1.14.1          | 227 KB    | ### | 100%
itsdangerous-1.1.0   | 16 KB     | ### | 100%
click-7.1.2          | 64 KB     | ### | 100%
certifi-2020.6.20    | 151 KB    | ### | 100%
python_abi-3.8       | 4 KB      | ### | 100%
zlib-1.2.11          | 126 KB    | ### | 100%
openapi-spec-validat | 23 KB     | ### | 100%
jsonschema-3.2.0     | 108 KB    | ### | 100%
itemadapter-0.1.0    | 10 KB     | ### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate swagger2
#
# To deactivate an active environment, use
#
#     $ conda deactivate 

(base) C:\experiment_with_conda>conda activate swagger2 

(swagger2) C:\experiment_with_conda>conda env export 

name: swagger2
channels:
  - conda-forge
  - defaults
dependencies:
  - attrs=20.2.0=pyh9f0ad1d_0
  - automat=20.2.0=py_0
  - bcrypt=3.2.0=py38h1e8a9f7_0
  - beautifulsoup4=4.9.1=py_1
  - brotlipy=0.7.0=py38h1e8a9f7_1000
  - ca-certificates=2020.6.20=hecda079_0
  - certifi=2020.6.20=py38h32f6830_0
  - cffi=1.14.1=py38hba49e27_0
  - chardet=3.0.4=py38h32f6830_1006
  - click=7.1.2=pyh9f0ad1d_0
  - clickclick=1.2.2=py_1
  - connexion=2.7.0=py_0
  - constantly=15.1.0=py_0
  - cryptography=3.1=py38hba49e27_0
  - cssselect=1.1.0=py_0
  - flask=1.1.2=pyh9f0ad1d_0
  - flask_cors=3.0.9=pyh9f0ad1d_0
  - hyperlink=20.0.1=pyh9f0ad1d_0
  - idna=2.10=pyh9f0ad1d_0
  - importlib-metadata=1.7.0=py38h32f6830_0
  - importlib_metadata=1.7.0=0
  - incremental=17.5.0=py_0
  - inflection=0.5.1=pyh9f0ad1d_0
  - itemadapter=0.1.0=py_0
  - itemloaders=1.0.2=py_0
  - itsdangerous=1.1.0=py_0
  - jinja2=2.11.2=pyh9f0ad1d_0
  - jmespath=0.10.0=pyh9f0ad1d_0
  - jsonschema=3.2.0=py38h32f6830_1
  - libiconv=1.16=he774522_0
  - libxml2=2.9.10=h1006b36_2
  - libxslt=1.1.33=h579f668_1
  - lxml=4.5.2=py38he3d0fc9_0
  - markupsafe=1.1.1=py38h9de7a3e_1
  - openapi-spec-validator=0.2.9=pyh9f0ad1d_0
  - openssl=1.1.1g=he774522_1
  - parsel=1.6.0=py_0
  - pathlib2=2.3.5=py38h32f6830_1
  - pip=20.2.2=py_0
  - protego=0.1.16=py_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.20=pyh9f0ad1d_2
  - pydispatcher=2.0.5=py_1
  - pyhamcrest=2.0.2=py_0
  - pyopenssl=19.1.0=py_1
  - pyrsistent=0.16.0=py38h9de7a3e_0
  - pysocks=1.7.1=py38h32f6830_1
  - python=3.8.5=h60c2a47_7_cpython
  - python_abi=3.8=1_cp38
  - pywin32=227=py38hfa6e2cd_0
  - pyyaml=5.3.1=py38h9de7a3e_0
  - queuelib=1.5.0=pyh9f0ad1d_0
  - requests=2.24.0=pyh9f0ad1d_0
  - scrapy=2.3.0=py38h32f6830_0
  - service_identity=18.1.0=py_0
  - setuptools=49.6.0=py38h32f6830_0
  - six=1.15.0=pyh9f0ad1d_0
  - soupsieve=2.0.1=py_1
  - sqlite=3.33.0=he774522_0
  - twisted=20.3.0=py38h9de7a3e_0
  - urllib3=1.25.10=py_0
  - vc=14.1=h869be7e_1
  - vs2015_runtime=14.16.27012=h30e32a0_2
  - w3lib=1.22.0=pyh9f0ad1d_0
  - werkzeug=1.0.1=pyh9f0ad1d_0
  - wheel=0.35.1=pyh9f0ad1d_0
  - win_inet_pton=1.1.0=py38_0
  - wincertstore=0.2=py38_1003
  - yaml=0.2.5=he774522_0
  - zipp=3.1.0=py_0
  - zlib=1.2.11=h62dcd97_1009
  - zope.interface=5.1.0=py38h9de7a3e_0
prefix: E:\programfiles\Anaconda3\envs\swagger2 
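Each dependency in the export has the form name=version=build. The build string is often specific to the platform, so it can be useful to drop it when sharing the list. A minimal sketch of splitting these entries in Python follows; the function names are hypothetical, and the sample entries are copied from the export above.

```python
def parse_conda_spec(spec):
    """Split 'name=version=build' into its three parts (missing parts are None)."""
    parts = spec.split("=")
    name = parts[0]
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return name, version, build


def strip_build(spec):
    """Keep only 'name=version', dropping the build string."""
    name, version, _build = parse_conda_spec(spec)
    return name if version is None else f"{name}={version}"


exported = [
    "flask=1.1.2=pyh9f0ad1d_0",
    "scrapy=2.3.0=py38h32f6830_0",
    "python=3.8.5=h60c2a47_7_cpython",
]

for spec in exported:
    print(strip_build(spec))
```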

(swagger2) C:\experiment_with_conda>conda deactivate 

(base) C:\experiment_with_conda>conda env remove --name swagger2 

Remove all packages in environment E:\programfiles\Anaconda3\envs\swagger2: 

Alternatively, an environment can be removed with: conda remove --name myenv --all (replace "myenv" with the environment's name).

(base) C:\experiment_with_conda>conda info --envs 

# conda environments:
#
base      *  E:\programfiles\Anaconda3
env_py_36    E:\programfiles\Anaconda3\envs\env_py_36
tf           E:\programfiles\Anaconda3\envs\tf 

Ref: conda.io
