Sunday, September 20, 2020

Web Security - Prevent Website from opening in an IFrame


For web security, you can prevent your website from being opened inside an IFrame, the way 'WhatsApp Web' does.

<div>
    <p>WhatsApp Error: Prevention from opening WhatsApp Web in an IFrame.</p>
</div>
<iframe src="https://web.whatsapp.com/" title="My WhatsApp" width=900 height=400></iframe> 

View in Mozilla Firefox:
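
The frame stays blank because the site sends anti-framing response headers (typically X-Frame-Options, and on newer browsers a Content-Security-Policy with a frame-ancestors directive). To give your own site the same behaviour, set those headers on every response. Below is a minimal sketch using Flask; it illustrates the idea and is not WhatsApp's actual server configuration:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'This site refuses to be rendered inside an iframe.'

@app.after_request
def deny_framing(response):
    # Older browsers honour X-Frame-Options; newer ones also understand
    # the CSP frame-ancestors directive.
    response.headers['X-Frame-Options'] = 'DENY'
    response.headers['Content-Security-Policy'] = "frame-ancestors 'none'"
    return response

if __name__ == '__main__':
    app.run()

With these headers in place, any page that embeds the site in an iframe gets an empty frame and a console error instead of the content.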

Setting up Ubuntu 20.04 for Flutter based Android app development


1.
Install Git.

  $ sudo apt install git


2.
Create a directory into which we will download Flutter:

(base) ashish@ashish-VirtualBox:~/Desktop/ws/programfiles/flutter_box$ pwd
/home/ashish/Desktop/ws/programfiles/flutter_box


3.
Download 'flutter':

$ pwd
/home/ashish/Desktop/ws/programfiles/flutter_box 

$ git clone https://github.com/flutter/flutter.git 


4.
Add the flutter tool to your path:

 $ export PATH="$PATH:`pwd`/flutter/bin" 
 
OR

To make the change permanent, add the same export line to your "~/.bashrc" file and reload it:

$ nano ~/.bashrc
$ source ~/.bashrc 


5.
Optionally, pre-download development binaries:

The flutter tool downloads platform-specific development binaries as needed. For scenarios where pre-downloading these artifacts is preferable (for example, in hermetic build environments, or with intermittent network availability), iOS and Android binaries can be downloaded ahead of time by running:

$ flutter precache 


6.
Install "Android SDK" from 'Terminal'.

$ sudo apt update && sudo apt install android-sdk 


7. 
Install "Android Studio" from "Ubuntu Software".

8. When you launch 'Android Studio' for the first time, it prompts for 'Import Android Studio Settings'. Set it to "Do not import settings".

9. It will next launch the 'Android Studio Setup Wizard'.

10. Default JDK location:

11. Next, it downloads SDK components:

12. Prompt for 'Emulator Settings for Hardware Acceleration':

13. Update the Android license status. Run `flutter doctor --android-licenses` to accept the SDK licenses. See https://flutter.dev/docs/get-started/install/linux#android-setup for more details.

$ flutter doctor --android-licenses

14. Launch "Settings" as shown below, then go to "Plugins".
If we launch the installation of the 'Flutter' plugin, it automatically prompts for the installation of 'Dart'.
Then give 'Android Studio' a restart.

15. Install the 'Flutter' extension in Visual Studio Code. Go to 'Extensions' as shown below and search for 'flutter'.
...

16. Test the installation:

(base) ashish@ashish-VirtualBox:~/.../flutter_box$ flutter doctor
Doctor summary (to see all details, run flutter doctor -v):
[✓] Flutter (Channel master, 1.22.0-10.0.pre.264, on Linux, locale en_IN)
[✓] Android toolchain - develop for Android devices (Android SDK version 30.0.2)
[✓] Android Studio (version 4.0)
[✓] VS Code (version 1.49.1)
[!] Connected device
    ! No devices available

! Doctor found issues in 1 category.

17. Common issues that we notice from 'flutter doctor':

As of Flutter's 1.19.0 dev release, the Flutter SDK contains the dart command alongside the flutter command so that you can more easily run Dart command-line programs. Downloading the Flutter SDK also downloads the compatible version of Dart, but if you've downloaded the Dart SDK separately, make sure that the Flutter version of dart is first in your path, as the two versions might not be compatible.

$ flutter doctor
Doctor summary (to see all details, run flutter doctor -v):

17.1. [!] Android toolchain - develop for Android devices (Android SDK version 27.0.1)
      ✗ Flutter requires Android SDK 29 and the Android BuildTools 28.0.3
      To update the Android SDK visit Flutter.dev: Android Setup on Linux for detailed instructions.
17.2. ✗ Android license status unknown.
      Run `flutter doctor --android-licenses` to accept the SDK licenses.
      See Flutter.dev: Android Setup on Linux for more details.
17.3. ✗ Android licenses not accepted. To resolve this, run: flutter doctor --android-licenses
17.4. [!] Android Studio (not installed)
17.5. [!] Android Studio (version 4.0)
      ✗ Flutter plugin not installed; this adds Flutter specific functionality.
17.6. [!] Android Studio (version 4.0)
      ✗ Dart plugin not installed; this adds Dart specific functionality.
17.7. [!] VS Code (version 1.49.1)
      ✗ Flutter extension not installed; install from https://marketplace.visualstudio.com/items?itemName=Dart-Code.flutter
17.8. [!] Connected device
      ! No devices available

! Doctor found issues in 4 categories.

Dated: Sep 2020
Ref: https://flutter.dev/docs/get-started/install/linux

Thursday, September 17, 2020

Binomial Probability Distribution (visualization using Seaborn)


Binomial Probability Distribution 

"pmf" is "Probability Mass Function" or "Probability Distribution". "rv" is "Random Variable".
Note: the Binomial Distribution is a discrete distribution.
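
As a quick numeric illustration of the pmf, scipy.stats provides a binomial random variable object (scipy is assumed to be available here; it is not used elsewhere in this post):

from scipy.stats import binom

n, p = 10, 0.5        # number of trials, probability of success
rv = binom(n, p)      # "frozen" binomial random variable
print(rv.pmf(5))      # P(X = 5) for X ~ Binomial(10, 0.5) -> 0.24609375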

Visualization of the Binomial Distribution

Difference Between Normal and Binomial Distribution

The main difference is that the normal distribution is continuous whereas the binomial is discrete, but given enough data points the binomial becomes quite similar to a normal distribution with a certain loc and scale.

We have code that produces overlapped "Normal" and "Binomial" distributions. We will show some of the best and some of the worst overlaps.

import seaborn as sns
import matplotlib.pyplot as plt
from numpy import random

number_of_trials = 150

for s in range(1000, 100000000, 1000000):
    print("size:", s)
    sns.distplot(random.binomial(n=number_of_trials, p=0.5, size=s), hist=False, label='binomial')
    sns.distplot(random.normal(loc=number_of_trials / 2, scale=5, size=s), hist=False, label='normal')
    plt.show()

Best Overlaps

Worst Overlaps
References % numpy.org

Improving a Classifier (ML) Using Snorkel's Slicing Technique


The dataset we are using is the 150-datapoint Iris flower species dataset (Download from here).

We have a dependency here to draw the confusion matrix. The code file name is: DrawConfusionMatrix.py

Content:

# Ref: Scikit-Learn 

import itertools
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize = False,
                          title = 'Confusion matrix',
                          cmap = plt.cm.Blues,
                          use_seaborn = False):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)
    
    if use_seaborn == False:
        plt.imshow(cm, interpolation='nearest', cmap=cmap)
        plt.colorbar()
        
        fmt = '.2f' if normalize else 'd'
        thresh = cm.max() / 2.
        for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
            plt.text(j, i, format(cm[i, j], fmt),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
            
        tick_marks = np.arange(len(classes) + 0)
    
    else:
        
        ax = sns.heatmap(cm, annot=True, fmt='d') #notation: "annot" not "annote"
        # fmt='d': print values as decimals
        
        bottom, top = ax.get_ylim()
        ax.set_ylim(bottom + 0.5, top - 0.5)
        tick_marks = np.arange(len(classes) + 1)

    plt.title(title)
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    plt.ylabel('True label')
    plt.xlabel('Predicted label') 
    
Now, the main problem: 

# Import libraries.

import DrawConfusionMatrix as dcm
import importlib # The imp module was deprecated in Python 3.4 in favor of the importlib module.
importlib.reload(dcm)

import pandas as pd
import numpy as np
from collections import Counter

from snorkel.augmentation import transformation_function
from snorkel.augmentation import RandomPolicy
from snorkel.augmentation import PandasTFApplier

from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix


df = pd.read_csv('datasets_19_420_Iris.csv') 

for i in set(df.Species):
    # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
    print(i)
    print(df[df.Species == i].describe().loc[['mean', 'std'], :], '\n') 
	
Iris-versicolor
            Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
mean  75.50000       5.936000      2.770000       4.260000      1.326000
std   14.57738       0.516171      0.313798       0.469911      0.197753 

Iris-virginica
             Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
mean  125.50000        6.58800      2.974000       5.552000       2.02600
std    14.57738        0.63588      0.322497       0.551895       0.27465 

Iris-setosa
            Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
mean  25.50000        5.00600      3.418000       1.464000       0.24400
std   14.57738        0.35249      0.381024       0.173511       0.10721  

 
features = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

classes = ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor']
desc_dict = {}
for i in classes:
    desc_dict[i] = df[df.Species == i].describe()
	
df['Train'] = 'Train'

# np.random.normal(loc, scale) draws a sample from a normal distribution with the
# given mean and standard deviation (here, the per-class mean and std from describe()).

@transformation_function(pre = [])
def get_new_instance_for_this_class(x):
    x.SepalLengthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['SepalLengthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['SepalLengthCm']].iloc[0,0], 2) * 100) / 100
    
    x.SepalWidthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['SepalWidthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['SepalWidthCm']].iloc[0,0], 2) * 100) / 100
    
    x.PetalLengthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['PetalLengthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['PetalLengthCm']].iloc[0,0], 2) * 100) / 100
    
    x.PetalWidthCm = np.random.normal(round(desc_dict[x.Species].loc[['mean'], ['PetalWidthCm']].iloc[0,0], 2) * 100, 
                  round(desc_dict[x.Species].loc[['std'], ['PetalWidthCm']].iloc[0,0], 2) * 100) / 100
    
    x.Train = 'Test'
    return x

tfs = [ get_new_instance_for_this_class ]

random_policy = RandomPolicy(
    len(tfs), sequence_length=2, n_per_original=5, keep_original=True
    # n_per_original (int) – Number of transformed data points per original
)

tf_applier = PandasTFApplier(tfs, random_policy)
df_train_augmented = tf_applier.apply(df)

print(f"Original training set size: {len(df)}")
print(f"Augmented training set size: {len(df_train_augmented)}") 

Original training set size: 150
Augmented training set size: 900 
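
The cell that trains `clf` is not shown in this post; a plausible sketch, consistent with the SVC used later in the post (note probability=True, which the predict_proba note below refers to), is:

# Hypothetical training cell (not shown in the original post): an SVC fitted on the
# original 150 Iris rows; probability=True is required for predict_proba below.
clf = svm.SVC(gamma='auto', probability=True)
clf.fit(df[features], df['Species'])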

df_test = df_train_augmented[df_train_augmented.Train == 'Test']

pred = clf.predict(df_test[features])

pred_probs = clf.predict_proba(df_test[features])
# Make Note Of >> AttributeError: predict_proba is not available when 'probability=False'

print(Counter(pred))
print("Accuracy: {:.3f}".format(accuracy_score(df_test['Species'], pred)))

cm = confusion_matrix(df_test['Species'], pred)
print("Confusion matrix:\n{}".format(cm))

Counter({'Iris-versicolor': 252, 'Iris-setosa': 250, 'Iris-virginica': 248})
Accuracy: 0.968
Confusion matrix:
[[250   0   0]
 [  0 239  11]
 [  0  13 237]] 

classes = ['setosa', 'versicolor', 'virginica']

dcm.plot_confusion_matrix(cm, classes = classes, use_seaborn = True) 

# This plot is for 'Support Vector Machine' based classifier.

# This plot is for 'Random Forest' based classifier.
Here we see that there are some misclassified data points for the classes 'Versicolor' and 'Virginica'. 'Setosa' has not been misclassified by either SVM or RandomForest.

Next, we slice the dataframe into 'setosa' and 'not setosa' dataframes. Since we do not have issues with the 'setosa' data points, we re-train a classifier on the other two classes, viz. 'versicolor' and 'virginica'.

import re
from snorkel.slicing import slicing_function

@slicing_function()
def not_setosa(x):
    return x.Species != 'Iris-setosa'

sfs = [not_setosa]

# ~ ~ ~

# Store slice metadata in S
from snorkel.slicing import PandasSFApplier

applier = PandasSFApplier(sfs)
S_test = applier.apply(df_test)

# ~ ~ ~

from snorkel.analysis import Scorer

scorer = Scorer(metrics=["f1_micro", "f1_macro"])
# Make Note Of >> ValueError: f1 not supported for multiclass.
# Try f1_micro or f1_macro instead.

# ~ ~ ~

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(df_test['Species'])

scorer.score_slices(
    S=S_test,
    golds=le.transform(df_test['Species']),
    preds=le.transform(pred),
    probs=pred_probs,
    as_dataframe=True
)
from snorkel.slicing import slice_dataframe

df_not_setosa = slice_dataframe(df_train_augmented, not_setosa)

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(max_depth=4, random_state=0, n_estimators=100)

# The classifier must be fitted on the 'not setosa' slice before predicting;
# the estimator repr below is what the fit call returns.
rfc.fit(df_not_setosa[features], df_not_setosa['Species'])

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=4, max_features='auto', max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=0,
                       verbose=0, warm_start=False)

df_test_rfc = df_not_setosa[df_not_setosa.Train == 'Test']

pred_rfc = rfc.predict(df_test_rfc[features])

print(Counter(pred_rfc))
print("Accuracy: {:.3f}".format(accuracy_score(df_test_rfc['Species'], pred_rfc)))

cm = confusion_matrix(df_test_rfc['Species'], pred_rfc)
print("Confusion matrix:\n{}".format(cm))

Counter({'Iris-versicolor': 251, 'Iris-virginica': 249})
Accuracy: 0.990
Confusion matrix:
[[248   2]
 [  3 247]]

dcm.plot_confusion_matrix(cm, classes = ['versicolor', 'virginica'], use_seaborn = True)

Using RandomForestClassifier on the sliced dataset:
We also have the score for SVC; it is not as good as the RandomForestClassifier:

svc = svm.SVC(gamma = 'auto', probability=True)
svc.fit(df_not_setosa[features], df_not_setosa['Species'])

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=True, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

pred_svc = svc.predict(df_test_rfc[features])

print(Counter(pred_svc))
print("Accuracy: {:.3f}".format(accuracy_score(df_test_rfc['Species'], pred_svc)))

cm = confusion_matrix(df_test_rfc['Species'], pred_svc)
print("Confusion matrix:\n{}".format(cm))

Counter({'Iris-versicolor': 251, 'Iris-virginica': 249})
Accuracy: 0.986
Confusion matrix:
[[247   3]
 [  4 246]]

Reference
% Slice-based Learning: a Programming Model for Residual Learning in Critical Data Slices

Wednesday, September 16, 2020

Snorkel's Analysis Package Overview (v0.9.6, Sep 2020)


The current version of Snorkel is v0.9.6 (as on 16-Sep-2020). Link to GitHub

Snorkel has 8 packages. Package Reference:
1. Snorkel Analysis Package
2. Snorkel Augmentation Package
3. Snorkel Classification Package
4. Snorkel Labeling Package
5. Snorkel Map Package
6. Snorkel Preprocess Package
7. Snorkel Slicing Package
8. Snorkel Utils Package

What is Snorkel's Analysis Package for?

This package covers how to interpret classification results. It provides generic model analysis utilities shared across Snorkel.

1: Scorer

Calculates one or more scores from user-specified and/or user-defined metrics. This defines a class 'Scorer' with two methods: 'score()' and 'score_slices()'. You have to specify input arguments such as metrics (this is related to the 'metric_score()' discussed below), true labels, predicted labels and predicted probabilities. It is through this that we make use of the code in 'metrics.py'.

Code Snippet:
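
The snippet in the original post is an image and is not reproduced here; a minimal sketch of how Scorer is typically used (the toy golds/preds below are made up for illustration) looks like this:

import numpy as np
from snorkel.analysis import Scorer

golds = np.array([1, 0, 1, 1, 0])   # ground-truth labels (toy values)
preds = np.array([1, 0, 0, 1, 1])   # predicted labels (toy values)

scorer = Scorer(metrics=["accuracy", "f1"])
print(scorer.score(golds=golds, preds=preds))
# e.g. {'accuracy': 0.6, 'f1': 0.67}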

~~~ ~~~ ~~~

2: get_label_buckets

Returns data point indices bucketed by label combinations. This is a function written in the error_analysis.py file.

Code:

import snorkel
import numpy as np
from snorkel.analysis import get_label_buckets

print("Snorkel version:", snorkel.__version__)

Snorkel version: 0.9.3

A common use case is calling ``buckets = label_buckets(Y_gold, Y_pred)`` where ``Y_gold`` is a set of gold (i.e. ground truth) labels and ``Y_pred`` is a corresponding set of predicted labels.

Y_gold = np.array([1, 1, 1, 0, 0, 0, 1])
Y_pred = np.array([1, 1, -1, -1, 1, 0, 1])

buckets = get_label_buckets(Y_gold, Y_pred)
# If gold and pred have different numbers of elements >> ValueError: Arrays must all have the same number of elements

The returned ``buckets[(i, j)]`` is a NumPy array of data point indices with true label i and predicted label j. More generally, the returned indices within each bucket refer to the order of the labels that were passed in as function arguments.

print(buckets[(1, 1)]) # true positives where both are 1
Out: array([0, 1, 6])

buckets[(0, 0)] # true negatives where both are 0
Out: array([5])

# false positives, false negatives and true negatives
print((1, 0) in buckets, '/', (0, 1) in buckets, '/', (0, 0) in buckets)
Out: False / True / True

buckets[(1, -1)] # abstained positives
Out: array([2])

buckets[(0, -1)] # abstained negatives
Out: array([3])

~~~ ~~~ ~~~

3: metric_score()

Evaluate a standard metric on a set of predictions/probabilities. The code for metric_score() is in metrics.py. Using this you can evaluate a standard metric on a set of predictions (true labels and predicted labels) / probabilities.

Scores available are:
1. _coverage_score
2. _roc_auc_score
3. _f1_score
4. _f1_micro_score
5. _f1_macro_score

It is a wrapper around "sklearn.metrics" and adds to it by giving the above five metrics.

METRICS = {
    "accuracy": Metric(sklearn.metrics.accuracy_score),
    "coverage": Metric(_coverage_score, ["preds"]),
    "precision": Metric(sklearn.metrics.precision_score),
    "recall": Metric(sklearn.metrics.recall_score),
    "f1": Metric(_f1_score, ["golds", "preds"]),
    "f1_micro": Metric(_f1_micro_score, ["golds", "preds"]),
    "f1_macro": Metric(_f1_macro_score, ["golds", "preds"]),
    "fbeta": Metric(sklearn.metrics.fbeta_score),
    "matthews_corrcoef": Metric(sklearn.metrics.matthews_corrcoef),
    "roc_auc": Metric(_roc_auc_score, ["golds", "probs"]),
}
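
A quick usage illustration of metric_score() (the arrays below are toy values, not taken from the Snorkel docs):

import numpy as np
from snorkel.analysis import metric_score

golds = np.array([1, 0, 1, 1, 0])
preds = np.array([1, 0, 0, 1, 0])

print(metric_score(golds=golds, preds=preds, metric="accuracy"))  # 0.8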

Monday, September 14, 2020

Starting With Selenium's Python Package (Installation)



We have a YAML file to set up our conda environment. The file 'selenium.yml' has the contents:

name: selenium
channels:
  - conda-forge
  - defaults
dependencies:
  - selenium
  - jupyterlab
  - ipykernel

To set up the environment, we run the command:

(base) CMD> conda env create -f selenium.yml
(selenium) CMD> conda activate selenium

After that, if we want to see which packages got installed, we run the command:

(selenium) CMD> conda env export

Next, we set up a kernel from this environment:

(selenium) CMD> python -m ipykernel install --user --name selenium
Installed kernelspec selenium in C:\Users\Ashish Jain\AppData\Roaming\jupyter\kernels\selenium

To view the list of kernels:

(selenium) CMD> jupyter kernelspec list
Available kernels:
  selenium    C:\Users\Ashish Jain\AppData\Roaming\jupyter\kernels\selenium
  python3     E:\programfiles\Anaconda3\envs\selenium\share\jupyter\kernels\python3
...

A basic piece of code would start the browser. We have tried and tested it for Chrome and Firefox. To do this, we need the web driver file, or we get the following exception:

CODE:

from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()

ERROR:

----------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
E:\programfiles\Anaconda3\envs\selenium\lib\site-packages\selenium\webdriver\common\service.py in start(self)
     71                 cmd.extend(self.command_line_args())
---> 72             self.process = subprocess.Popen(cmd, env=self.env,
     73                 close_fds=platform.system() != 'Windows',

E:\programfiles\Anaconda3\envs\selenium\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    853
--> 854             self._execute_child(args, executable, preexec_fn, close_fds,
    855                 pass_fds, cwd, env,

E:\programfiles\Anaconda3\envs\selenium\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
   1306         try:
-> 1307             hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
   1308                 # no special security

FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

WebDriverException                        Traceback (most recent call last)
...
WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

We got the file from here: chromedriver.storage.googleapis.com (For v86)
chromedriver_win32.zip ---> chromedriver.exe

Error for a WebDriver and browser version mismatch:

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 86
Current browser version is 85.0.4183.102 with binary path C:\Program Files (x86)\Google\Chrome\Application\chrome.exe

Download from here for Chrome v85: chromedriver.storage.googleapis.com (For v85)

One point to note about ChromeDriver as of September 2020: ChromeDriver only supports characters in the BMP (Basic Multilingual Plane). This is a known issue with the Chromium team, as ChromeDriver still doesn't support characters with a Unicode code point beyond FFFF. Hence it is impossible to send any character beyond FFFF via ChromeDriver.
As a result, any attempt to send SMP (Supplementary Multilingual Plane) characters (e.g. CJK, emojis, symbols, etc.) raises an error, while Firefox supports emojis sent via the 'send_keys()' method. As of Unicode 13.0, the SMP comprises 134 blocks, beginning with the archaic Greek and other left-to-right scripts: Linear B Syllabary (10000–1007F) and Linear B Ideograms (10080–100FF).

~ ~ ~ ~ ~

If you are working with the Firefox browser, you need the Gecko WebDriver available on the Windows 'PATH' variable. Without the WebDriver file:

FileNotFoundError: [WinError 2] The system cannot find the file specified
WebDriverException: Message: 'geckodriver' executable needs to be in PATH.

Download the Gecko driver from here: GitHub Repo of Mozilla

The statement to launch the web browser will be:

driver = webdriver.Firefox()

By default, browsers open in a partial-size window. To maximize the window:

driver.maximize_window()

Now, we open a link:

driver.get("http://survival8.blogspot.com/")
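
To see the BMP limitation in practice, here is a small sketch (the page and the input-field locator are illustrative assumptions, not part of the original post):

from selenium import webdriver

driver = webdriver.Firefox()                     # geckodriver must be on PATH
driver.get("https://www.google.com/")
search_box = driver.find_element_by_name("q")    # assumed input field, for illustration
search_box.send_keys("Hello \U0001F600")         # U+1F600 lies outside the BMP; works on Firefox

# Running the same send_keys() through webdriver.Chrome() raises a WebDriverException
# saying that ChromeDriver only supports characters in the BMP.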

Wednesday, September 9, 2020

Sentiment Analysis using BERT, DistilBERT and ALBERT


We will do Sentiment Analysis using the code from this repo: GitHub

Check out the code from the above repository to get started.

For creating the Conda environment, we have a file "sentiment_analysis.yml" with the content:

name: e20200909
channels:
  - defaults
  - conda-forge
  - pytorch
  
dependencies:
  - pytorch
  - pandas
  - numpy
  - pip:
    - transformers==3.0.1
  - flask
  - flask_cors
  - scikit-learn
  - ipykernel 

(base) C:\>conda env create -f sentiment_analysis.yml

It will install the above mentioned dependencies and the nested dependencies.

(base) C:\Users\Ashish Jain>conda env list 
# conda environments:
#
base                  *  E:\programfiles\Anaconda3
e20200909                E:\programfiles\Anaconda3\envs\e20200909
env_py_36                E:\programfiles\Anaconda3\envs\env_py_36
temp                     E:\programfiles\Anaconda3\envs\temp
temp202009               E:\programfiles\Anaconda3\envs\temp202009
tf                       E:\programfiles\Anaconda3\envs\tf 

(base) C:\Users\Ashish Jain>conda activate e20200909 

(e20200909) C:\Users\Ashish Jain>conda env export
name: e20200909
channels:
  - conda-forge
  - defaults
dependencies:
  - _pytorch_select=0.1=cpu_0
  - backcall=0.2.0=py_0
  - blas=1.0=mkl
  - ca-certificates=2020.7.22=0
  - certifi=2020.6.20=py38_0
  - cffi=1.14.2=py38h7a1dbc1_0
  - click=7.1.2=py_0
  - colorama=0.4.3=py_0
  - decorator=4.4.2=py_0
  - flask=1.1.2=py_0
  - flask_cors=3.0.9=pyh9f0ad1d_0
  - icc_rt=2019.0.0=h0cc432a_1
  - intel-openmp=2019.4=245
  - ipykernel=5.3.4=py38h5ca1d4c_0
  - ipython=7.18.1=py38h5ca1d4c_0
  - ipython_genutils=0.2.0=py38_0
  - itsdangerous=1.1.0=py_0
  - jedi=0.17.2=py38_0
  - jinja2=2.11.2=py_0
  - joblib=0.16.0=py_0
  - jupyter_client=6.1.6=py_0
  - jupyter_core=4.6.3=py38_0
  - libmklml=2019.0.5=0
  - libsodium=1.0.18=h62dcd97_0
  - markupsafe=1.1.1=py38he774522_0
  - mkl=2019.4=245
  - mkl-service=2.3.0=py38hb782905_0
  - mkl_fft=1.1.0=py38h45dec08_0
  - mkl_random=1.1.0=py38hf9181ef_0
  - ninja=1.10.1=py38h7ef1ec2_0
  - numpy=1.19.1=py38h5510c5b_0
  - numpy-base=1.19.1=py38ha3acd2a_0
  - openssl=1.1.1g=he774522_1
  - pandas=1.1.1=py38ha925a31_0
  - parso=0.7.0=py_0
  - pickleshare=0.7.5=py38_1000
  - pip=20.2.2=py38_0
  - prompt-toolkit=3.0.7=py_0
  - pycparser=2.20=py_2
  - pygments=2.6.1=py_0
  - python=3.8.5=h5fd99cc_1
  - python-dateutil=2.8.1=py_0
  - pytorch=1.6.0=cpu_py38h538a6d7_0
  - pytz=2020.1=py_0
  - pywin32=227=py38he774522_1
  - pyzmq=19.0.1=py38ha925a31_1
  - scikit-learn=0.23.2=py38h47e9c7a_0
  - scipy=1.5.0=py38h9439919_0
  - setuptools=49.6.0=py38_0
  - six=1.15.0=py_0
  - sqlite=3.33.0=h2a8f88b_0
  - threadpoolctl=2.1.0=pyh5ca1d4c_0
  - tornado=6.0.4=py38he774522_1
  - traitlets=4.3.3=py38_0
  - vc=14.1=h0510ff6_4
  - vs2015_runtime=14.16.27012=hf0eaf9b_3
  - wcwidth=0.2.5=py_0
  - werkzeug=1.0.1=py_0
  - wheel=0.35.1=py_0
  - wincertstore=0.2=py38_0
  - zeromq=4.3.2=ha925a31_2
  - zlib=1.2.11=h62dcd97_4
  - pip:
    - chardet==3.0.4
    - filelock==3.0.12
    - idna==2.10
    - packaging==20.4
    - pyparsing==2.4.7
    - regex==2020.7.14
    - requests==2.24.0
    - sacremoses==0.0.43
    - sentencepiece==0.1.91
    - tokenizers==0.8.0rc4
    - tqdm==4.48.2
    - transformers==3.0.1
    - urllib3==1.25.10
prefix: E:\programfiles\Anaconda3\envs\e20200909

(e20200909) C:\Users\Ashish Jain> 

Next, we run the 'analyser' code:

(e20200909) C:\SentimentAnalysis-master>python analyze.py 
Please wait while the analyser is being prepared.
Input sentiment to analyze: I am feeling good.
Positive with probability 99%.
Input sentiment to analyze: I am feeling bad.
Negative with probability 99%.
Input sentiment to analyze: I am Ashish.
Positive with probability 81%.
Input sentiment to analyze: 

Next, we run it in browser:

We pass the same sentences as above.

Here are server logs:

(e20200909) C:\SentimentAnalysis-master>python server.py 
 * Serving Flask app "server" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [09/Sep/2020 21:35:48] "GET / HTTP/1.1" 400 -
127.0.0.1 - - [09/Sep/2020 21:35:48] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [09/Sep/2020 21:36:02] "GET /?text=hello HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:36:38] "GET /?text=shut%20up HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:36:50] "GET /?text=i%20am%20feeling%20good HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:36:54] "GET /?text=i%20am%20feeling%20bad HTTP/1.1" 200 -
127.0.0.1 - - [09/Sep/2020 21:37:00] "GET /?text=i%20am%20ashish HTTP/1.1" 200 - 
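
The same endpoint can also be queried programmatically; a minimal sketch (the exact response format depends on server.py and is not shown in this post):

import requests

resp = requests.get("http://127.0.0.1:5000/", params={"text": "I am feeling good."})
print(resp.status_code)  # 200, matching the server logs above
print(resp.text)         # the sentiment returned by the model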

The browser screens:

Tuesday, September 8, 2020

2 X 2 Idempotent matrix

I had to provide an example of an idempotent matrix: the kind of matrix that yields itself when multiplied by itself, much like 0 and 1 in scalar multiplication (1 x 1 = 1).
It is not so easy to predict the result of a matrix multiplication, especially for large matrices. So, instead of settling for the naïve method of guessing by trial and error, I explored the properties of a square matrix of order 2.
On this page I state the question and begin to attempt it. I realised that for a matrix to be idempotent, it would have to retain its dimensions (order), and hence be a square matrix.
I have intentionally put distinct variable names a,b,c, and d. This is to ensure that the possibility of a different number at each index is open. I derived 'bc' from the first equation and substituted it into its instance in the last equation to obtain a solution for 'a'.
Since 0 cannot be divided by 0, I could not divide 0 by either term unless it was a non-zero term. Thus, I had two possibilities, to which I called case A and B.
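
The worked images are not reproduced here, but the system being described is presumably the following, written out as a sketch for reference:

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
A^2 = A \;\Longrightarrow\;
\begin{cases}
a^2 + bc = a \\
ab + bd = b \\
ca + dc = c \\
cb + d^2 = d
\end{cases}

Substituting bc = a - a^2 (from the first equation) into the last one gives

d^2 - d + a - a^2 = 0 \;\Longrightarrow\; (d - a)(d + a - 1) = 0,

so either d = a or a + d = 1, which is the kind of two-way case split (A and B) described above.
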
I solved the four equations in case A by making substitutions into the 4 main equations. Later, I tested the solution with b = 1.
As you can see, I could not use the elimination method in an advantageous manner for this case.
I couldn't get a unique solution in either case. That is because there are many possible square matrices that are idempotent. However, I am not comfortable intuiting that every 2 X 2 idempotent matrix has one of only two possible numbers as its first and last elements.

Others’ take on it

My classmate Sabari Sreekumar did manage to use elimination for the ‘bc’ term for the general case.
I took it a step further and defined the last element in terms of the other elements.
So given any 2 X 2 idempotent matrix and its first three elements, you can find the last element unequivocally with this formula.

Conclusion

I wonder if multiples of matrices that satisfy either case are also idempotent. Perhaps I will see if I can prove that in another post.
In the next lecture, professor Venkata Ratnam suggested using the sure-shot approach of a zero matrix. And I was like “Why didn’t I think of that”?

Sunday, September 6, 2020

Setting up Conda Environment for Swagger and Scrapy based project


We have a file that reads "my_yml.yml":

name: swagger2
channels:
  - conda-forge
  - defaults
dependencies:
  - beautifulsoup4
  - connexion
  - flask
  - flask_cors
  - scrapy

It will do these three things:
1. It will create an environment "swagger2".
2. For downloading packages, it will use the channels "conda-forge" and "defaults".
3. The packages it will install are mentioned as "dependencies".

Checking our current environments:

(base) C:\Users\Ashish Jain>conda env list
# conda environments:
base                  *  E:\programfiles\Anaconda3
env_py_36                E:\programfiles\Anaconda3\envs\env_py_36
tf                       E:\programfiles\Anaconda3\envs\tf

(base) C:\experiment_with_conda>conda env create -f my_yml.yml
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages
pysocks-1.7.1        | 27 KB   | ### | 100%
flask_cors-3.0.9     | 15 KB   | ### | 100%
chardet-3.0.4        | 189 KB  | ### | 100%
clickclick-1.2.2     | 9 KB    | ### | 100%
cssselect-1.1.0      | 18 KB   | ### | 100%
importlib-metadata-1 | 45 KB   | ### | 100%
attrs-20.2.0         | 41 KB   | ### | 100%
protego-0.1.16       | 2.6 MB  | ### | 100%
twisted-20.3.0       | 5.1 MB  | ### | 100%
pywin32-227          | 6.9 MB  | ### | 100%
pyrsistent-0.16.0    | 91 KB   | ### | 100%
beautifulsoup4-4.9.1 | 86 KB   | ### | 100%
connexion-2.7.0      | 51 KB   | ### | 100%
pyhamcrest-2.0.2     | 29 KB   | ### | 100%
libxslt-1.1.33       | 499 KB  | ### | 100%
libxml2-2.9.10       | 3.5 MB  | ### | 100%
incremental-17.5.0   | 14 KB   | ### | 100%
flask-1.1.2          | 70 KB   | ### | 100%
scrapy-2.3.0         | 640 KB  | ### | 100%
automat-20.2.0       | 30 KB   | ### | 100%
python-3.8.5         | 18.9 MB | ### | 100%
bcrypt-3.2.0         | 41 KB   | ### | 100%
service_identity-18. | 12 KB   | ### | 100%
win_inet_pton-1.1.0  | 7 KB    | ### | 100%
cryptography-3.1     | 587 KB  | ### | 100%
libiconv-1.16        | 680 KB  | ### | 100%
jmespath-0.10.0      | 21 KB   | ### | 100%
markupsafe-1.1.1     | 29 KB   | ### | 100%
parsel-1.6.0         | 15 KB   | ### | 100%
constantly-15.1.0    | 9 KB    | ### | 100%
pydispatcher-2.0.5   | 12 KB   | ### | 100%
zope.interface-5.1.0 | 299 KB  | ### | 100%
pyasn1-modules-0.2.7 | 60 KB   | ### | 100%
hyperlink-20.0.1     | 42 KB   | ### | 100%
inflection-0.5.1     | 9 KB    | ### | 100%
pyasn1-0.4.8         | 53 KB   | ### | 100%
w3lib-1.22.0         | 21 KB   | ### | 100%
pathlib2-2.3.5       | 34 KB   | ### | 100%
jinja2-2.11.2        | 93 KB   | ### | 100%
setuptools-49.6.0    | 968 KB  | ### | 100%
queuelib-1.5.0       | 13 KB   | ### | 100%
itemloaders-1.0.2    | 14 KB   | ### | 100%
pyyaml-5.3.1         | 158 KB  | ### | 100%
soupsieve-2.0.1      | 30 KB   | ### | 100%
brotlipy-0.7.0       | 368 KB  | ### | 100%
wincertstore-0.2     | 13 KB   | ### | 100%
lxml-4.5.2           | 1.1 MB  | ### | 100%
cffi-1.14.1          | 227 KB  | ### | 100%
itsdangerous-1.1.0   | 16 KB   | ### | 100%
click-7.1.2          | 64 KB   | ### | 100%
certifi-2020.6.20    | 151 KB  | ### | 100%
python_abi-3.8       | 4 KB    | ### | 100%
zlib-1.2.11          | 126 KB  | ### | 100%
openapi-spec-validat | 23 KB   | ### | 100%
jsonschema-3.2.0     | 108 KB  | ### | 100%
itemadapter-0.1.0    | 10 KB   | ### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate swagger2
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) C:\experiment_with_conda>conda activate swagger2

(swagger2) C:\experiment_with_conda>conda env export
name: swagger2
channels:
  - conda-forge
  - defaults
dependencies:
  - attrs=20.2.0=pyh9f0ad1d_0
  - automat=20.2.0=py_0
  - bcrypt=3.2.0=py38h1e8a9f7_0
  - beautifulsoup4=4.9.1=py_1
  - brotlipy=0.7.0=py38h1e8a9f7_1000
  - ca-certificates=2020.6.20=hecda079_0
  - certifi=2020.6.20=py38h32f6830_0
  - cffi=1.14.1=py38hba49e27_0
  - chardet=3.0.4=py38h32f6830_1006
  - click=7.1.2=pyh9f0ad1d_0
  - clickclick=1.2.2=py_1
  - connexion=2.7.0=py_0
  - constantly=15.1.0=py_0
  - cryptography=3.1=py38hba49e27_0
  - cssselect=1.1.0=py_0
  - flask=1.1.2=pyh9f0ad1d_0
  - flask_cors=3.0.9=pyh9f0ad1d_0
  - hyperlink=20.0.1=pyh9f0ad1d_0
  - idna=2.10=pyh9f0ad1d_0
  - importlib-metadata=1.7.0=py38h32f6830_0
  - importlib_metadata=1.7.0=0
  - incremental=17.5.0=py_0
  - inflection=0.5.1=pyh9f0ad1d_0
  - itemadapter=0.1.0=py_0
  - itemloaders=1.0.2=py_0
  - itsdangerous=1.1.0=py_0
  - jinja2=2.11.2=pyh9f0ad1d_0
  - jmespath=0.10.0=pyh9f0ad1d_0
  - jsonschema=3.2.0=py38h32f6830_1
  - libiconv=1.16=he774522_0
  - libxml2=2.9.10=h1006b36_2
  - libxslt=1.1.33=h579f668_1
  - lxml=4.5.2=py38he3d0fc9_0
  - markupsafe=1.1.1=py38h9de7a3e_1
  - openapi-spec-validator=0.2.9=pyh9f0ad1d_0
  - openssl=1.1.1g=he774522_1
  - parsel=1.6.0=py_0
  - pathlib2=2.3.5=py38h32f6830_1
  - pip=20.2.2=py_0
  - protego=0.1.16=py_0
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.7=py_0
  - pycparser=2.20=pyh9f0ad1d_2
  - pydispatcher=2.0.5=py_1
  - pyhamcrest=2.0.2=py_0
  - pyopenssl=19.1.0=py_1
  - pyrsistent=0.16.0=py38h9de7a3e_0
  - pysocks=1.7.1=py38h32f6830_1
  - python=3.8.5=h60c2a47_7_cpython
  - python_abi=3.8=1_cp38
  - pywin32=227=py38hfa6e2cd_0
  - pyyaml=5.3.1=py38h9de7a3e_0
  - queuelib=1.5.0=pyh9f0ad1d_0
  - requests=2.24.0=pyh9f0ad1d_0
  - scrapy=2.3.0=py38h32f6830_0
  - service_identity=18.1.0=py_0
  - setuptools=49.6.0=py38h32f6830_0
  - six=1.15.0=pyh9f0ad1d_0
  - soupsieve=2.0.1=py_1
  - sqlite=3.33.0=he774522_0
  - twisted=20.3.0=py38h9de7a3e_0
  - urllib3=1.25.10=py_0
  - vc=14.1=h869be7e_1
  - vs2015_runtime=14.16.27012=h30e32a0_2
  - w3lib=1.22.0=pyh9f0ad1d_0
  - werkzeug=1.0.1=pyh9f0ad1d_0
  - wheel=0.35.1=pyh9f0ad1d_0
  - win_inet_pton=1.1.0=py38_0
  - wincertstore=0.2=py38_1003
  - yaml=0.2.5=he774522_0
  - zipp=3.1.0=py_0
  - zlib=1.2.11=h62dcd97_1009
  - zope.interface=5.1.0=py38h9de7a3e_0
prefix: E:\programfiles\Anaconda3\envs\swagger2

(swagger2) C:\experiment_with_conda>conda deactivate

(base) C:\experiment_with_conda>conda env remove --name swagger2
Remove all packages in environment E:\programfiles\Anaconda3\envs\swagger2:

Alternatively: conda remove --name myenv --all

(base) C:\experiment_with_conda>conda info --envs
# conda environments:
#
base                  *  E:\programfiles\Anaconda3
env_py_36                E:\programfiles\Anaconda3\envs\env_py_36
tf                       E:\programfiles\Anaconda3\envs\tf

Ref: conda.io

Saturday, September 5, 2020

Prediction of Nifty50 index using LSTM based model


Here we will use LSTM layers to develop time series forecasting model for the prediction of Nifty50 index's closing value.

Our environment:

(py383) ashish@ashish-VirtualBox:~/Desktop$ conda list keras
# packages in environment at /home/ashish/anaconda3/envs/py383:
#
# Name                    Version                   Build  Channel
keras                     2.4.3                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
(py383) ashish@ashish-VirtualBox:~/Desktop$ conda list tensorflow
# packages in environment at /home/ashish/anaconda3/envs/py383:
#
# Name                    Version                   Build  Channel
tensorflow                2.2.0                    pypi_0    pypi
tensorflow-estimator      2.2.0                    pypi_0    pypi
(py383) ashish@ashish-VirtualBox:~/Desktop$ conda list matplotlib
# packages in environment at /home/ashish/anaconda3/envs/py383:
#
# Name                    Version                   Build  Channel
matplotlib                3.2.2                         0  
matplotlib-base           3.2.2            py38hef1b27d_0  
(py383) ashish@ashish-VirtualBox:~/Desktop$ conda list scikit-learn
# packages in environment at /home/ashish/anaconda3/envs/py383:
#
# Name                    Version                   Build  Channel
scikit-learn              0.23.1           py38h423224d_0  
(py383) ashish@ashish-VirtualBox:~/Desktop$ conda list seaborn
# packages in environment at /home/ashish/anaconda3/envs/py383:
#
# Name                    Version                   Build  Channel
seaborn                   0.10.1                     py_0  

Python Code:

from __future__ import print_function
import os
import sys
import pandas as pd
import numpy as np
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
import datetime
from dateutil.parser import parse
from sklearn.metrics import mean_absolute_error 

# Read the dataset 
l = []
for i in os.listdir('files_2'):
    l.append(pd.read_csv(os.path.join('files_2', i)))

df = pd.concat(l, axis = 0) 

We have data that looks like:

def convert_str_to_date(in_date):
    return parse(in_date)

df['Date'] = df['Date'].apply(convert_str_to_date)

df.sort_values(by = ['Date'], axis = 0, ascending = True, inplace = True, na_position = 'last')
df.reset_index(drop=True, inplace=True)

Gradient descent algorithms perform better (for example, converge faster) if the variables are within the range [-1, 1]. Many sources relax the boundary to even [-3, 3]. The 'Close' variable is minmax-scaled to bound the transformed variable within [0, 1].

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
df['scaled_close'] = scaler.fit_transform(np.array(df['Close']).reshape(-1, 1))

Before training the model, the dataset is split into two parts: the train set and the validation set. The neural network is trained on the train set. This means the computation of the loss function, back propagation and weight updates by a gradient descent algorithm are done on the train set. The validation set is used to evaluate the model and to determine the number of epochs in model training. Increasing the number of epochs will further decrease the loss function on the train set but might not necessarily have the same effect for the validation set due to overfitting on the train set. Hence, the number of epochs is controlled by keeping a tab on the loss function computed for the validation set. We use Keras with the TensorFlow backend to define and train the model. All the steps involved in model training and validation are done by calling appropriate functions of the Keras API.

# Let's start by splitting the dataset into train and validation.
split_date = datetime.datetime(year=2020, month=8, day=1, hour=0)
df_train = df.loc[df['Date'] < split_date]
df_val = df.loc[df['Date'] >= split_date]

# Reset the indices of the validation set
df_val.reset_index(drop=True, inplace=True)

Now we need to generate the regressors (X) and the target variable (y) for train and validation. A 2-D array of regressors and a 1-D array of targets are created from the original 1-D array of the column 'Close' in the DataFrames. For the time series forecasting model, the past seven days of observations are used to predict for the next day. This is equivalent to an AR(7) model. We define a function which takes the original time series and the number of timesteps in the regressors as input to generate the arrays of X and y. The makeXy function is used to generate the arrays of regressors and targets: X_train, X_val, y_train and y_val. X_train and X_val, as generated by the makeXy function, are 2D arrays of shape (number of samples, number of timesteps). However, the input to RNN layers must be of shape (number of samples, number of timesteps, number of features per timestep). In this case, we are dealing with only 'Close', hence the number of features per timestep is one.
The number of timesteps is seven and the number of samples is the same as the number of samples in X_train and X_val, which are reshaped to 3D arrays:

def makeXy(ts, nb_timesteps):
    """
    Input:
        ts: original time series
        nb_timesteps: number of time steps in the regressors
    Output:
        X: 2-D array of regressors
        y: 1-D array of target
    """
    X = []
    y = []
    for i in range(nb_timesteps, ts.shape[0]):
        X.append(list(ts.loc[i-nb_timesteps:i-1]))
        y.append(ts.loc[i])
    X, y = np.array(X), np.array(y)
    return X, y

X_train, y_train = makeXy(df_train['scaled_close'], 7)
X_val, y_val = makeXy(df_val['scaled_close'], 7)

# X_train and X_val are reshaped to 3D arrays
X_train, X_val = X_train.reshape((X_train.shape[0], X_train.shape[1], 1)), X_val.reshape((X_val.shape[0], X_val.shape[1], 1))

Now we define the model using the Keras Functional API. In this approach a layer can be declared as the input of the following layer at the time of defining the next layer.

from keras.layers import Dense, Input, Dropout
from keras.layers.recurrent import LSTM
from keras.optimizers import SGD
from keras.models import Model
from keras.models import load_model
from keras.callbacks import ModelCheckpoint

# Define the input layer, which has shape (None, 7, 1) and is of type float32. None indicates the number of instances.
input_layer = Input(shape=(7,1), dtype='float32')

The LSTM layers are defined for seven timesteps. In this example, two LSTM layers are stacked. The first LSTM returns the output from all seven timesteps. This output is a sequence and is fed to the second LSTM, which returns output only from the last timestep. The first LSTM has sixty-four hidden neurons in each timestep. Hence the sequence returned by the first LSTM has sixty-four features.

lstm_layer1 = LSTM(64, input_shape=(7,1), return_sequences=True)(input_layer)
lstm_layer2 = LSTM(32, input_shape=(7,64), return_sequences=False)(lstm_layer1)

dropout_layer = Dropout(0.2)(lstm_layer2)

# Finally, the output layer gives the prediction.
output_layer = Dense(1, activation='linear')(dropout_layer)

The input, LSTM, dropout and output layers will now be packed inside a Model, which is a wrapper class for training and making predictions. In the presence of outliers, mean absolute error (MAE) is used as the loss, since absolute deviations suffer fewer fluctuations compared to squared deviations. The network's weights are optimized by the Adam algorithm. Adam stands for adaptive moment estimation and has been a popular choice for training deep neural networks. Unlike stochastic gradient descent, Adam uses different learning rates for each weight and separately updates them as the training progresses. The learning rate of a weight is updated based on exponentially weighted moving averages of the weight's gradients and the squared gradients.
ts_model = Model(inputs=input_layer, outputs=output_layer)
ts_model.compile(loss='mean_absolute_error', optimizer='adam')  # alternative: SGD(lr=0.001, decay=1e-5)
ts_model.summary()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 7, 1)]            0
_________________________________________________________________
lstm (LSTM)                  (None, 7, 64)             16896
_________________________________________________________________
lstm_1 (LSTM)                (None, 32)                12416
_________________________________________________________________
dropout (Dropout)            (None, 32)                0
_________________________________________________________________
dense (Dense)                (None, 1)                 33
=================================================================
Total params: 29,345
Trainable params: 29,345
Non-trainable params: 0
_________________________________________________________________

The model is trained by calling the fit function on the model object and passing X_train and y_train. The training is done for a predefined number of epochs. Additionally, batch_size defines the number of samples of the train set to be used for an instance of back propagation. The validation dataset is also passed to evaluate the model after every epoch completes. A ModelCheckpoint object tracks the loss function on the validation set and saves the model for the epoch at which the loss function has been minimum.

save_weights_at = os.path.join('files_1', 'models', 'p5', 'p5_nifty50_LSTM_weights.{epoch:02d}-{val_loss:.4f}.hdf5')
save_best = ModelCheckpoint(save_weights_at, monitor='val_loss', verbose=0,
                            save_best_only=True, save_weights_only=False,
                            mode='min', period=1)
ts_model.fit(x=X_train, y=y_train, batch_size=16, epochs=30,
             verbose=1, callbacks=[save_best],
             validation_data=(X_val, y_val), shuffle=True)

WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of batches seen.
Epoch 1/30
381/381 [==============================] - 13s 33ms/step - loss: 0.0181 - val_loss: 0.0258
...
381/381 [==============================] - 10s 25ms/step - loss: 0.0175 - val_loss: 0.0384

[tensorflow.python.keras.callbacks.History at 0x7fed1c0a05b0]

Predictions are made from the best saved model. The model's predictions, which are on the scaled 'Close' values, are inverse-transformed to get predictions on the original 'Close' scale.

best_model = load_model(os.path.join('files_1', 'models', 'p5', 'p5_nifty50_LSTM_weights.12-0.0057.hdf5'))
preds = best_model.predict(X_val)
pred = scaler.inverse_transform(preds)
pred = np.squeeze(pred)

mae = mean_absolute_error(df_val['Close'].loc[7:], pred)
print('MAE for the validation set:', round(mae, 4))

MAE for the validation set: 65.7769

# Let's plot the actual and predicted values.
plt.figure(figsize=(5.5, 5.5))
plt.plot(range(len(df_val['Close'].loc[7:])), df_val['Close'].loc[7:], linestyle='-', marker='*', color='r')
plt.plot(range(len(df_val['Close'].loc[7:])), pred[:df_val.shape[0]], linestyle='-', marker='.', color='b')
plt.legend(['Actual','Predicted'], loc=2)
plt.title('Actual vs Predicted')
plt.ylabel('Close')
plt.xlabel('Index')
from sklearn.metrics import r2_score

r2 = r2_score(df_val['Close'].loc[7:], pred)
print('R-squared for the validation set:', round(r2, 4))

R-squared for the validation set: 0.3702