Thursday, October 15, 2020

BERT is aware of the context of a word in a sentence


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.metrics.pairwise import cosine_similarity
import torch
import transformers as ppb
import warnings
warnings.filterwarnings('ignore')

from joblib import load, dump
import json
import re 

print(ppb.__version__)

'3.0.1'

Loading the Pre-trained BERT model

model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased')

# Load pretrained model/tokenizer
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights) 

When run first time, the above statements loads a model of 440MB in size.

Word Ambiguities 

def get_embedding(in_list):
    tokenized = [tokenizer.encode(x, add_special_tokens=True) for x in in_list]
    
    max_len = 0
    for i in tokenized:
        if len(i) > max_len:
            max_len = len(i)

    padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized])
    
    attention_mask = np.where(padded != 0, 1, 0)
    
    input_ids = torch.LongTensor(padded)
    attention_mask = torch.tensor(attention_mask)

    with torch.no_grad():
        last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask)
        
    features = last_hidden_states[0][:,0,:].numpy()
    return features 

python_strings = [
    'I love coding in Python language.',
    'Python is more readable than Java.',
    'Pythons are famous for their very long body.',
    'Python is famous for its very long body.',
    'All six continents have a python species.',
    'Python is a programming language.',
    'Python is a reptile.',
    'The python ate a mouse.',
    'python ate a mouse'
]

string_embeddings = get_embedding(python_strings)   

print(string_embeddings.shape) 

(9, 768)

csm = cosine_similarity(X = string_embeddings, Y=None, dense_output=True)
print(csm.round(2)) 

In the picture below, if we ignore the diagnol (that is similarity of a sentence to itself), we are able to see which sentence is closer to which.


[[1.   0.83 0.8  0.79 0.8  0.84 0.84 0.81 0.81]
 [0.83 1.   0.79 0.76 0.8  0.87 0.79 0.8  0.79]
 [0.8  0.79 1.   0.96 0.86 0.77 0.88 0.77 0.78]
 [0.79 0.76 0.96 1.   0.82 0.77 0.9  0.75 0.77]
 [0.8  0.8  0.86 0.82 1.   0.78 0.85 0.8  0.8 ]
 [0.84 0.87 0.77 0.77 0.78 1.   0.81 0.76 0.78]
 [0.84 0.79 0.88 0.9  0.85 0.81 1.   0.81 0.86]
 [0.81 0.8  0.77 0.75 0.8  0.76 0.81 1.   0.9 ]
 [0.81 0.79 0.78 0.77 0.8  0.78 0.86 0.9  1.  ]] 

for i in range(len(csm)):
    ord_indx = np.argsort(csm[i])[::-1]
    print(python_strings[ord_indx[0]])
    print([python_strings[j] for j in ord_indx[1:]])
    print()

I love coding in Python language.
['Python is a reptile.', 'Python is a programming language.', 'Python is more readable than Java.', 'python ate a mouse', 'The python ate a mouse.', 'All six continents have a python species.', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.']

Python is more readable than Java.
['Python is a programming language.', 'I love coding in Python language.', 'All six continents have a python species.', 'The python ate a mouse.', 'Python is a reptile.', 'python ate a mouse', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.']

Pythons are famous for their very long body.
['Python is famous for its very long body.', 'Python is a reptile.', 'All six continents have a python species.', 'I love coding in Python language.', 'Python is more readable than Java.', 'python ate a mouse', 'Python is a programming language.', 'The python ate a mouse.']

Python is famous for its very long body.
['Pythons are famous for their very long body.', 'Python is a reptile.', 'All six continents have a python species.', 'I love coding in Python language.', 'python ate a mouse', 'Python is a programming language.', 'Python is more readable than Java.', 'The python ate a mouse.']

All six continents have a python species.
['Pythons are famous for their very long body.', 'Python is a reptile.', 'Python is famous for its very long body.', 'I love coding in Python language.', 'Python is more readable than Java.', 'The python ate a mouse.', 'python ate a mouse', 'Python is a programming language.']

Python is a programming language.
['Python is more readable than Java.', 'I love coding in Python language.', 'Python is a reptile.', 'All six continents have a python species.', 'python ate a mouse', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.', 'The python ate a mouse.']

Python is a reptile.
['Python is famous for its very long body.', 'Pythons are famous for their very long body.', 'python ate a mouse', 'All six continents have a python species.', 'I love coding in Python language.', 'Python is a programming language.', 'The python ate a mouse.', 'Python is more readable than Java.']

The python ate a mouse. 
['python ate a mouse', 'I love coding in Python language.', 'Python is a reptile.', 'All six continents have a python species.', 'Python is more readable than Java.', 'Pythons are famous for their very long body.', 'Python is a programming language.', 'Python is famous for its very long body.']

python ate a mouse 
['The python ate a mouse.', 'Python is a reptile.', 'I love coding in Python language.', 'All six continents have a python species.', 'Python is more readable than Java.', 'Python is a programming language.', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.'] 

Few observations

1. "python ate a mouse" is more closer to "Python is a reptile." than "The python ate a mouse."
For closeness of these sentences to "Python is a reptile" shows "python ate a mouse" at number 3 while "The python ate a mouse" appears at number 7.

2. The model we are using is "uncased" so capitalization does not matter.

3. Sentences about Python language are similar to each other, and sentences about Python reptile are similar to each other.

4. Word "python" or "Python" alone is closest to 'I love coding in Python language.' then to 'Python is a reptile.', see code snippet below.

from scipy.spatial import distance
python_embedding = get_embedding('python')
csm = [1 - distance.cosine(u = python_embedding[0], v = i) for i in string_embeddings]
print([python_strings[j] for j in np.argsort(csm)[::-1]]) 

['I love coding in Python language.', 
'Python is a reptile.', 
'python ate a mouse', 
'The python ate a mouse.', 
'All six continents have a python species.', 
'Python is a programming language.', 
'Python is more readable than Java.', 
'Python is famous for its very long body.', 
'Pythons are famous for their very long body.'] 

Wednesday, October 14, 2020

Compare pip and conda installations


1. View all environments.

(base) C:\Users\aj>conda env list
# conda environments:
#
base                  *  D:\programfiles\Anaconda3
bert_aas                 D:\programfiles\Anaconda3\envs\bert_aas
e20200909                D:\programfiles\Anaconda3\envs\e20200909
...
tf                       D:\programfiles\Anaconda3\envs\tf 

2. View all Jupyter Kernels.

(base) C:\Users\aj>jupyter kernelspec list
Available kernels:
temp       C:\Users\aj\AppData\Roaming\jupyter\kernels\temp
tf         C:\Users\aj\AppData\Roaming\jupyter\kernels\tf
python3    D:\programfiles\Anaconda3\share\jupyter\kernels\python3
py38       C:\ProgramData\jupyter\kernels\py38

(base) C:\Users\aj> 

==========

3. A note about updating a package in Conda (from debugging instructions).

(base) C:\Users\aj>conda update ipykernel jupyter -c conda-forge

Updating ipykernel is constricted by

anaconda -> requires ipykernel==5.1.4=py37h39e3cac_0

If you are sure you want an update of your package either try `conda update --all` or install a specific version of the package you want using `conda install [pkg]=[version]`

done

==> WARNING: A newer version of conda exists.
current version: 4.8.4
latest version: 4.8.5

Please update conda by running

  $ conda update -n base -c defaults conda 

In this command, by '-n base' we mean 'base' is the name of the environment.
By '-c defaults', we mean download 'conda' from the 'defaults' channel.

Note: Following are three commonly used channels for downloading Python packages:

1. pkgs/main
2. defaults
3. conda-forge

==========

4. Installing a new Jupyter kernel. 

(e20200909) CMD>python -m ipykernel install --user --name e20200909 
Installed kernelspec e20200909 in C:\Users\aj\AppData\Roaming\jupyter\kernels\e20200909 

==========

5. Checking installation 
Differentiating between 'pip' and 'conda' installation. 

(e20200909) D:\ws\jupyter>pip freeze | findstr /C:"jupyter" /C:"jupyterlab" 
jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1594826976318/work
jupyter-console @ file:///home/conda/feedstock_root/build_artifacts/jupyter_console_1598728807792/work
jupyter-core==4.6.3
jupyterlab==2.2.8
jupyterlab-pygments @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_pygments_1601375948261/work
jupyterlab-server @ file:///home/conda/feedstock_root/build_artifacts/jupyterlab_server_1593951277307/work 

(e20200909) C:\Users\aj>pip freeze | findstr /C:"transformers" /C:"tensorflow" /C:"torch" 
torch @ file:///C:/ci/pytorch_1596373105144/work
transformers==3.0.1 

Important Note: In the above two outputs, where we see a "@ file" based path in place of package version, that package has been installed via 'conda'. 

(e20200909) C:\Users\aj>python -c "import torch; print(torch.__version__);" 
1.6.0


View only Conda installations 
(e20200909) CMD>pip freeze | findstr /C:"file" 
argon2-cffi @ file:///D:/bld/argon2-cffi_1596630042503/work
attrs @ file:///home/conda/feedstock_root/build_artifacts/attrs_1599308529326/work
bleach @ file:///home/conda/feedstock_root/build_artifacts/bleach_1600454382015/work
cffi @ file:///C:/ci/cffi_1598352710791/work
... 

View only Pip installations 
(e20200909) CMD>pip freeze | findstr /v /C:"file" 
async-generator==1.10
backcall==0.2.0
bert-serving-client==1.10.0
bert-serving-server==1.10.0
certifi==2020.6.20
chardet==3.0.4
click==7.1.2
... 

(e20200909) CMD>pip list 
Package             Version
-------------------  -------------------
argon2-cffi          20.1.0
async-generator      1.10
attrs                20.2.0
backcall             0.2.0
bert-serving-client  1.10.0
bert-serving-server  1.10.0
bleach               3.2.1
certifi              2020.6.20
cffi                 1.14.2
chardet              3.0.4
click                7.1.2
... 

==========

6. Conda has more clarity about getting a good match between versions of already installed packages and the new packages that are to be installed:

(e20200909) C:\Users\Ashish Jain>conda install tensorflow -c conda-forge 
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
Examining python=3.8:  50%|███████████████          | 1/2 [00:03<00:03,  3.41s/it]-failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - tensorflow -> python[version='3.5.*|3.6.*|>=3.5,<3.6.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|3.7.*']

Your python: python=3.8

If python is on the left-most side of the chain, that's the version you've asked for. When python appears to the right, that indicates that the thing on the left is somehow not available for the python version you are constrained to. Note that conda will not change your python version to a different minor version unless you explicitly specify that. 

==========

7. Getting all the available versions of a package in PyPI:

(base) CMD>pip install tensorflow== 
ERROR: Could not find a version that satisfies the requirement tensorflow== (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 1.15.0rc0, 1.15.0rc1, 1.15.0rc2, 1.15.0rc3, 1.15.0, 1.15.2, 1.15.3, 1.15.4, 2.0.0a0, 2.0.0b0, 2.0.0b1, 2.0.0rc0, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.1.0rc0, 2.1.0rc1, 2.1.0rc2, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc0, 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.2.1, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0, 2.3.1)
ERROR: No matching distribution found for tensorflow== 

If Conda does not have a package in one of the mentioned channels such as 'conda-forge' or 'defaults', it raises the below exception:

ResolvePackageNotFound:
  - tensorflow=1.15.4 

==========

8. Checking TensorFlow 1.X installation:

(bert_env) CMD>conda list tensorflow 
# packages in environment at E:\programfiles\Anaconda3\envs\bert_env:
#
# Name                    Version                   Build  Channel
tensorflow                1.14.0               h1f41ff6_0    conda-forge
tensorflow-base           1.14.0           py37hc8dfbb8_0    conda-forge
tensorflow-estimator      1.14.0           py37h5ca1d4c_0    conda-forge 

(bert_env) CMD>python 
>>> import tensorflow as tf
>>> print(tf.__version__) 
2.3.0 
>>>       

~ ~ ~ ~ ~

(bert_env) CMD>pip freeze | find "tensorflow" 
tensorflow==2.3.0
tensorflow-estimator==2.3.0 

(bert_env) CMD>pip freeze | findstr "tensorflow" 
tensorflow==2.3.0
tensorflow-estimator==2.3.0 

(bert_env) CMD>python -c "import tensorflow as tf; print(tf.__version__)" 
2.3.0 

(bert_env) CMD>pip show tensorflow 
Name: tensorflow
Version: 2.3.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: c:\users\aj\appdata\roaming\python\python37\site-packages
Requires: wheel, astunparse, numpy, protobuf, wrapt, gast, six, tensorflow-estimator, scipy, grpcio, termcolor, tensorboard, opt-einsum, absl-py, keras-preprocessing, h5py, google-pasta
Required-by: 

==========


Tuesday, October 13, 2020

Flutter for Android Development (Notes, Oct 2020)


Follow these steps as part of setting up an old project on a new system. Such as in case of taking a project from Ubuntu machine to a Windows machine.

1.1 - Go to SDK Manager
1.2 - Install the required Android SDK
2 - Build.grade changes - Removing unnecessary dependencies
3 - Changes in Gradle.properties to use AndroidX and Jetifier
4 - Flutter Commands To Begin With

Extracting Information From Search Engines


1. Google A Google Search URL looks like: https://www.google.com/search?q=nifty+50&start=0 In "nifty+50", "+" indicates [SPACE]. Pagination with 10 results per page, can be queried using 'start' parameter starting from 0. For first 10, start = 0 For second 10, start = 10 Notes: Google throws in a Captcha on sensing access through code or robots: Error: About this page Prove that you are not a robot by solving this Captcha... Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen? IP address: 34.93.163.15 Time: 2020-10-09T09:23:07Z URL: https://www.google.com/search?q=VERB+RIK&start=0 Ref 1: Guide to the Google Search Parameters Ref 2: RapidAPI Google Search (Free usage allows 300 requests per month. '300 requests per month' means you better look for other free alternative.) 2. Bing A Bing Search URL looks like: https://www.bing.com/search?q=nifty+50&first=21 In "nifty+50", "+" indicates [SPACE]. Pagination with 10 results per page, can be queried using 'first' parameter starting from 1. For first 10, first = 1 For second 10, first = 11 Notes: 2.1. The search engine ranks home pages, not blogs. 2.2. Forums are often ranked low in the search results. 2.3. Sends back empty pages with no results on sensing access through code or robots. 2.4. RapidAPI offers an API for Bing Search but it has '1000 / month quota Hard Limit' on free usage and this means you better look for other free alternatives of RapidAPI. 3. Yahoo By Oct 2020, Yahoo has dropped to the third spot in terms of market share. Its web portal is still popular, and it is said to be the eleventh most visited site according to Alexa. Notes: 3.1. Unclear labeling of ads make it hard to distinguish between organic and non-organic results 3.2. Boasts other services such as Yahoo Finance, Yahoo mail, Yahoo answers, and several mobile apps. 3.3. Search URL for Yahoo can formed using its "p" argument as in this URL: https://in.search.yahoo.com/search?p=nifty50 3.4. RapidAPI is not available for Yahoo. 4. Contextual Web Search Contextual Web says on its Home-page: Search APIs & Custom Search: Easy to use search APIs. Search through billions of webpages with a simple API call and without breaking the bank. Get started in minutes. [ Ref: contextualweb.io ] 'Contextual Web Search' has URL: usearch The 'Contextual Web Search' are available via RapidAPI site. It is a paid service with a 'free and limited' version allowing '500 / day quota Hard Limit (After which usage will be restricted)'. Code for RapidAPI: import requests from time import time import datetime import os import json url = "https://contextualwebsearch-websearch-v1.p.rapidapi.com/api/Search/WebSearchAPI" querystring = { "pageNumber" : "1", "q" : "nifty50", "autoCorrect" : "false", "pageSize" : "100" } headers = { 'x-rapidapi-host': "contextualwebsearch-websearch-v1.p.rapidapi.com", 'x-rapidapi-key': "..." } response = requests.request("GET", url, headers=headers, params=querystring) with open(os.path.join("output", querystring['q'] + "_" + str(datetime.datetime.now()).replace(":", "_") + ".json"), mode = 'w', encoding = 'utf-8') as f: f.write(json.dumps(response.json(), indent=2, sort_keys=True)) The above code produces output in the form of a JSON file in a folder alongside this script named "output" (this should be created before running the script). References % RapidAPI's All API(s)

Friday, October 9, 2020

Indian Politicians Supporting Rapist Men


A collation of statements made by Indian politicians from time to time in support of rapists as Uttar Pradesh CM Yogi makes an anti-women comment following Hathras rape case in October, 2020.

Wednesday, October 7, 2020

Word Embeddings using BERT and testing using Word Analogies, Nearest Words, 1D Spectrum


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.metrics.pairwise import cosine_similarity
from joblib import load, dump
import torch
import transformers as ppb

import warnings
warnings.filterwarnings('ignore')

ppb.__version__ 
'3.0.1'

model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased') 

Load pretrained model/tokenizer

tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights) 

The above code downloads three files when it runs for the first time:

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=433.0, style=ProgressStyle(description_…
HBox(children=(FloatProgress(value=0.0, description='Downloading', max=231508.0, style=ProgressStyle(descripti…
HBox(children=(FloatProgress(value=0.0, description='Downloading', max=440473133.0, style=ProgressStyle(descri… 

The third file is the model with size 440MB.

Our first step is to tokenize the sentences -- break them up into word and subwords in the format BERT is comfortable with. 

sentences = ['First do it', 'then do it right', 'then do it better']
sentences_df = pd.DataFrame({"sentences": sentences})
tokenized = sentences_df['sentences'].apply((lambda x: tokenizer.encode(x, add_special_tokens=True))) 

Padding
After tokenization, `tokenized` is a list of sentences -- each sentences is represented as a list of tokens. We want BERT to process our examples all at once (as one batch). It's just faster that way. For that reason, we need to pad all lists to the same size, so we can represent the input as one 2-d array, rather than a list of lists (of different lengths).

max_len = 0
for i in tokenized.values:
    if len(i) > max_len:
        max_len = len(i)

padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized.values]) 

Masking 
If we directly send `padded` to BERT, that would slightly confuse it. We need to create another variable to tell it to ignore (mask) the padding we've added when it's processing its input. That's what attention_mask is:

attention_mask = np.where(padded != 0, 1, 0) 

%%time
input_ids = torch.LongTensor(padded)
attention_mask = torch.tensor(attention_mask)

with torch.no_grad():
    last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask) 
	
features = last_hidden_states[0][:,0,:].numpy() 
	
Let's slice only the part of the output that we need. That is the output corresponding the first token of each sentence. The way BERT does sentence classification, is that it adds a token called `[CLS]` (for classification) at the beginning of every sentence. Last token is representing [SEP]. The output corresponding to that token can be thought of as an embedding for the entire sentence.

We'll save those in the `features` variable, as they'll serve as the features to our logitics regression model.

Testing

Word Analogies def get_embedding(in_list): tokenized = [tokenizer.encode(x, add_special_tokens=True) for x in in_list] max_len = 0 for i in tokenized: if len(i) > max_len: max_len = len(i) padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized]) attention_mask = np.where(padded != 0, 1, 0) input_ids = torch.LongTensor(padded) attention_mask = torch.tensor(attention_mask) with torch.no_grad(): last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask) features = last_hidden_states[0][:,0,:].numpy() return features analogies = [['king', 'man', 'queen', 'woman'], ['king', 'prince', 'queen', 'princess'], ['miami', 'florida', 'dallas', 'texas'], ['einstein', 'scientist', 'picasso', 'painter'], ['japan', 'sushi', 'germany', 'bratwurst'], ['man', 'woman', 'he', 'she'], ['man', 'woman', 'uncle', 'aunt'], ['man', 'woman', 'brother', 'sister'], ['man', 'woman', 'husband', 'wife'], ['man', 'woman', 'actor', 'actress'], ['man', 'woman', 'father', 'mother'], ['heir', 'heiress', 'prince', 'princess'], ['nephew', 'niece', 'uncle', 'aunt'], ['france', 'paris', 'japan', 'tokyo'], ['france', 'paris', 'china', 'beijing'], ['february', 'january', 'december', 'november'], ['france', 'paris', 'germany', 'berlin'], ['week', 'day', 'year', 'month'], ['week', 'day', 'hour', 'minute'], ['france', 'paris', 'italy', 'rome'], ['paris', 'france', 'rome', 'italy'], ['france', 'french', 'england', 'english'], ['japan', 'japanese', 'china', 'chinese'], ['china', 'chinese', 'america', 'american'], ['japan', 'japanese', 'italy', 'italian'], ['japan', 'japanese', 'australia', 'australian'], ['walk', 'walking', 'swim', 'swimming']] for i in analogies: king = get_embedding([i[0]]) queen = get_embedding([i[2]]) man = get_embedding([i[1]]) woman = get_embedding([i[3]]) q = king - man + woman print(i[0], '-', i[1], '+', i[3], 'and', i[2], cosine_similarity(queen, q)) king - man + woman and queen [[0.95728725]] king - prince + princess and queen [[0.9805071]] miami - florida + texas and dallas [[0.93608725]] einstein - scientist + painter and picasso [[0.9021458]] japan - sushi + bratwurst and germany [[0.8383053]] man - woman + she and he [[0.97603536]] man - woman + aunt and uncle [[0.9624729]] man - woman + sister and brother [[0.970188]] man - woman + wife and husband [[0.9585104]] man - woman + actress and actor [[0.95233154]] man - woman + mother and father [[0.9783108]] heir - heiress + princess and prince [[0.9558885]] nephew - niece + aunt and uncle [[0.9844531]] france - paris + tokyo and japan [[0.95287836]] france - paris + beijing and china [[0.94868445]] february - january + november and december [[0.89765096]] france - paris + berlin and germany [[0.9586985]] week - day + month and year [[0.9131064]] week - day + minute and hour [[0.9280644]] france - paris + rome and italy [[0.92742187]] paris - france + italy and rome [[0.9252609]] france - french + english and england [[0.9143828]] japan - japanese + chinese and china [[0.9681916]] china - chinese + american and america [[0.9371264]] japan - japanese + italian and italy [[0.97318065]] japan - japanese + australian and australia [[0.96878356]] walk - walking + swimming and swim [[0.90309924]] Nearest Words We have retrieved nouns from the 'BERT Base Uncased' vocabulary. There are 15269 nouns in this vocabulary. You can download "vocab.txt" from here: GitHub We have used to SpaCy to identify the nouns. nouns = load('files_5_p3/list_of_nouns_from_bert_base_uncased_vocab.joblib') %%time noun_embeddings = [get_embedding([i]) for i in nouns] dump(noun_embeddings, 'files_2_p2/list_of_noun_embeddings.joblib') Wall time: 20min 8s noun_embeddings = load('files_2_p2/list_of_noun_embeddings.joblib') noun_embeddings = [n[0] for n in noun_embeddings] from scipy.spatial import distance def get_nn_of_words(in_list): for k in in_list: input_word = k if k not in nouns: continue p = noun_embeddings[nouns.index(input_word)] closest_embedding_indices = distance.cdist(np.array(p).reshape(1, -1), np.array(noun_embeddings).reshape(len(noun_embeddings),-1))[0].argsort()[1:11] closest_nouns = [nouns[i] for i in closest_embedding_indices] print("For", k, closest_nouns) get_nn_of_words(set(pd.core.common.flatten(analogies))) For germany ['austria', 'bavaria', 'berlin', 'luxembourg', 'europe', 'japan', 'britain', 'wurttemberg', 'dresden', 'sweden'] For niece ['nephew', 'granddaughter', 'fiancee', 'daughter', 'grandparents', 'grandson', 'stepmother', 'aunt', 'cousins', 'wife'] For aunt ['grandmother', 'grandfather', 'uncle', 'cousin', 'sister', 'mother', 'miriam', 'vicki', 'uncles', 'cousins'] For february ['january', 'april', 'june', 'november', 'march', 'july', 'august', 'december', 'october', 'spring'] For england ['britain', 'wales', 'australia', 'ireland', 'barbados', 'stoke', 'brentford', 'lancashire', 'cuba', 'luxembourg'] For america ['planet', 'dakota', 'hawaii', 'britain', 'hemisphere', 'coral', 'virginia', 'nina', 'columbia', 'victoria'] For italian ['italy', 'russian', 'catalan', 'portuguese', 'french', 'azerbaijani', 'indonesian', 'austrian', 'japanese', 'irish'] For uncle ['aunt', 'cousin', 'grandfather', 'brother', 'grandmother', 'uncles', 'doc', 'bobby', 'mother', 'kid'] For miami ['tampa', 'seattle', 'vancouver', 'portland', 'arizona', 'vegas', 'sydney', 'florida', 'houston', 'orlando'] For italy ['austria', 'germany', 'luxembourg', 'europe', 'rico', 'japan', 'africa', 'indonesia', 'florence', 'tuscany'] For woman ['teenager', 'girl', 'spouse', 'brother', 'partner', 'daughter', 'mother', 'consort', 'wife', 'stallion'] For english ['afrikaans', 'hindi', 'latin', 'portuguese', 'sanskrit', 'french', 'italian', 'hebrew', 'azerbaijani', 'lithuanian'] For king ['duke', 'prince', 'queen', 'princess', 'throne', 'consort', 'deity', 'queens', 'abbot', 'lords'] For dallas ['jasmine', 'travis', 'savannah', 'eden', 'lucas', 'mia', 'lexi', 'jack', 'hunter', 'penny'] For mother ['grandmother', 'brother', 'mothers', 'parents', 'daughter', 'mom', 'father', 'grandfather', 'sister', 'mary'] For heiress ['landowner', 'heir', 'daughters', 'heirs', 'daughter', 'granddaughter', 'siblings', 'grandson', 'childless', 'clerk'] For japanese ['korean', 'japan', 'thai', 'russian', 'hawaiian', 'malaysian', 'indonesian', 'khmer', 'taiwanese', 'bengali'] For heir ['heirs', 'consort', 'spouse', 'prince', 'womb', 'attendants', 'fulfillment', 'duke', 'daughter', 'keeper'] For january ['november', 'april', 'august', 'february', 'december', 'summer', 'july', 'spring', 'october', 'june'] For brother ['sister', 'grandfather', 'cousin', 'grandmother', 'mother', 'daughter', 'partner', 'bowl', 'mentor', 'beau'] For wife ['husbands', 'daughter', 'spouse', 'husband', 'woman', 'girlfriend', 'household', 'supporter', 'boyfriend', 'granddaughter'] For minute ['moments', 'hour', 'dozen', 'mile', 'cycles', 'millennia', 'moment', 'sizes', 'clocks', 'twenties'] For picasso ['goldsmith', 'michelangelo', 'fresco', 'carousel', 'chopin', 'verdi', 'hercules', 'palette', 'canvas', 'britten'] For week ['month', 'series', 'replacement', 'primetime', 'position', 'highlight', 'zone', 'slot', 'office', 'showcase'] For japan ['america', 'ceylon', 'hawaii', 'malaysia', 'australia', 'taiwan', 'osaka', 'fukuoka', 'indonesia', 'korea'] For einstein ['aristotle', 'nobel', 'beckett', 'wiener', 'relativity', 'abel', 'strauss', 'skinner', 'clifford', 'bernstein'] For australian ['australia', 'canadian', 'canada', 'fremantle', 'oceania', 'america', 'brazil', 'nepal', 'jakarta', 'hawaii'] For painter ['musician', 'painting', 'paintings', 'designer', 'dancer', 'filmmaker', 'illustrator', 'teacher', 'soldier', 'boxer'] For man ['lump', 'woman', 'boss', 'bear', 'scratch', 'intruder', 'alpha', 'rat', 'touch', 'condo'] For florida ['maine', 'louisiana', 'arizona', 'virginia', 'charleston', 'indiana', 'tampa', 'colorado', 'alabama', 'connecticut'] For year ['season', 'month', 'eligibility', 'seasons', 'name', 'calendar', 'date', 'colour', 'highlight', 'divisional'] For tokyo ['osaka', 'kyoto', 'fukuoka', 'nagoya', 'seoul', 'kobe', 'moscow', 'honolulu', 'japan', 'nippon'] For november ['october', 'january', 'december', 'winter', 'spring', 'august', 'april', 'monday', 'halloween', 'wednesday'] For rome ['titan', 'vulcan', 'mesopotamia', 'damascus', 'alexandria', 'egypt', 'baghdad', 'orion', 'denver', 'nevada'] For china ['taiwan', 'fujian', 'indonesia', 'japan', 'asia', 'sichuan', 'malawi', 'lebanon', 'russia', 'zimbabwe'] For hour ['minute', 'hours', 'moments', 'dozen', 'weeks', 'inning', 'day', 'cycles', 'midnight', 'minutes'] For texas ['oregon', 'alabama', 'florida', 'colorado', 'ohio', 'indiana', 'georgia', 'houston', 'arkansas', 'arizona'] For sister ['brother', 'daughter', 'mother', 'grandmother', 'grandfather', 'cousin', 'aunt', 'padre', 'sisters', 'blossom'] For berlin ['vienna', 'stuttgart', 'hannover', 'hamburg', 'bonn', 'dresden', 'dusseldorf', 'gottingen', 'mannheim', 'rosenthal'] For actress ['actor', 'musician', 'singer', 'novelist', 'teacher', 'dancer', 'magician', 'poet', 'painter', 'actors'] For beijing ['tianjin', 'guangzhou', 'singapore', 'honolulu', 'taipei', 'ankara', 'osaka', 'manila', 'durban', 'jakarta'] For princess ['prince', 'madam', 'papa', 'kira', 'sweetie', 'witch', 'ruby', 'wedding', 'tasha', 'marta'] For nephew ['niece', 'grandson', 'granddaughter', 'daughter', 'fiancee', 'girlfriend', 'son', 'brother', 'sidekick', 'wife'] For month ['week', 'summers', 'evening', 'calendar', 'decade', 'semester', 'term', 'position', 'seasonal', 'occasion'] For swimming ['diving', 'weightlifting', 'judo', 'badminton', 'tennis', 'gymnastics', 'archery', 'swimmers', 'breaststroke', 'hockey'] For queen ['princess', 'queens', 'maid', 'king', 'prince', 'duke', 'consort', 'crown', 'stallion', 'madam'] For actor ['actress', 'actors', 'poet', 'television', 'singer', 'novelist', 'comedian', 'musician', 'screenwriter', 'painter'] For december ['november', 'january', 'october', 'april', 'march', 'june', 'july', 'september', 'august', 'autumn'] For american ['british', 'americans', 'america', 'britain', 'african', 'haitian', 'kenyan', 'bangladeshi', 'resident', 'canadian'] For french ['italian', 'portuguese', 'dutch', 'english', 'spanish', 'afrikaans', 'filipino', 'romanian', 'france', 'greek'] For prince ['princess', 'duke', 'consort', 'king', 'benedict', 'commander', 'papa', 'dean', 'throne', 'kevin'] For scientist ['physician', 'archaeologist', 'golfer', 'inventor', 'chef', 'consultant', 'investigator', 'teenager', 'astronaut', 'technician'] For paris ['bonn', 'laval', 'provence', 'dublin', 'geneva', 'eugene', 'michel', 'koln', 'benoit', 'ville'] For father ['mother', 'fathers', 'brother', 'daddy', 'uncles', 'son', 'sister', 'homeland', 'dad', 'protector'] For husband ['wife', 'spouse', 'lover', 'boyfriend', 'husbands', 'woman', 'daughter', 'son', 'fiance', 'mother'] For france ['martinique', 'luxembourg', 'marseille', 'geneva', 'bordeaux', 'lyon', 'paris', 'clermont', 'alsace', 'switzerland'] For australia ['australian', 'america', 'canada', 'tasmania', 'sydney', 'britain', 'japan', 'fremantle', 'malaysia', 'hawaii'] For day ['evening', 'nightfall', 'midnight', 'night', 'dawn', 'morning', 'moments', 'afternoon', 'epoch', 'sunrise'] 1D Spectrum for i in analogies: king = get_embedding([i[0]]) queen = get_embedding([i[2]]) man = get_embedding([i[1]]) woman = get_embedding([i[3]]) q = king - man + woman print(i[0], i[1], i[2], i[3], cosine_similarity(queen, q)) for j in i: print(j, ":") np.random.seed(1) plt.rcParams["figure.figsize"] = 8, 2 x = np.linspace(0, 768, num=768) y = get_embedding([j]) fig, (ax,ax2) = plt.subplots(nrows=2, sharex=True) extent = [x[0]-(x[1]-x[0])/2., x[-1]+(x[1]-x[0])/2.,0,1] ax.imshow(y, cmap="plasma", aspect="auto", extent=extent) ax.set_yticks([]) ax.set_xlim(extent[0], extent[1]) ax2.plot(x, y.ravel()) plt.tight_layout() plt.show()

Tuesday, October 6, 2020

Word embeddings using BERT (Demo of BERT-as-a-Service)


For installation we use a YAML file.
File: D:\f25\files_2\bert_aas.yaml

name: bert_aas
channels:
  - conda-forge
  - defaults
dependencies:
  - termcolor
  - numpy
  - pyzmq
  - tensorflow==1.14.0
  - GPUtil
  - sphinx-argparse
  - pip:
    - bert-serving-server
    - bert-serving-client 

==========

(base) CMD>conda env list
# conda environments:
#
base                  *  D:\programfiles\Anaconda3
e20200909                D:\programfiles\Anaconda3\envs\e20200909
py38                     D:\programfiles\Anaconda3\envs\py38
... 

(base) CMD>jupyter kernelspec list
Available kernels:
  tf         C:\Users\aj\AppData\Roaming\jupyter\kernels\tf
  python3    D:\programfiles\Anaconda3\share\jupyter\kernels\python3
  py38       C:\ProgramData\jupyter\kernels\py38 
  
==========

==> WARNING: A newer version of conda exists.
  current version: 4.8.4
  latest version: 4.8.5

Please update conda by running

    $ conda update -n base -c defaults conda

==========

CMD> conda env create -f bert_aas_1.yaml

(base) CMD>conda activate bert_aas

==========

Checking TensorFlow installation

(bert_aas) CMD>pip freeze | find "tensorflow"
tensorflow @ file:///D:/bld/tensorflow_1594833538462/work/tensorflow-1.14.0-cp37-cp37m-win_amd64.whl
tensorflow-estimator==1.14.0

(bert_aas) CMD>python
Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 01:53:57) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
...
D:\programfiles\Anaconda3\envs\bert_aas\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)]) 
>>> tf.__version__
'1.14.0' 
>>>
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2020-10-05 14:47:48.399686: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
>>>   

==========

Setting up "model_dir" 

This directory is passed as an input argument to "bert-serving-start" program from Command Prompt.

Model can be downloaded from here: storage.googleapis.com: BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters

We also have these options for models (among others):

1. BERT-Large, Uncased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
2. BERT-Large, Cased (Whole Word Masking): 24-layer, 1024-hidden, 16-heads, 340M parameters
3. BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
4. BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
5. BERT-Base, Cased: 12-layer, 768-hidden, 12-heads , 110M parameters
6. BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters

Additional Note:

Fine-tuning with BERT 
Important: All results on the paper were fine-tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re-produce most of the BERT-Large results on the paper using a GPU with 12GB - 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out-of-memory issues for more details.

This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google).

The fine-tuning examples which use BERT-Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given.

Ref: bert#pre-trained-models

==========

Issues with newer versions of TensorFlow and Bert-as-a-service 

CMD> bert-serving-start -model_dir E:\e25/files_2/uncased_L-12_H-768_A-12 -num_worker=1

(bert_aas) E:\e25\files_2>bert-serving-start -model_dir E:\e25/files_2/uncased_L-12_H-768_A-12 -num_worker=1

2020-10-04 23:14:02.505211: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-10-04 23:14:02.515231: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
e:\programfiles\anaconda3\envs\bert_aas\lib\site-packages\bert_serving\server\helper.py:176: UserWarning: Tensorflow 2.3.0 is not tested! It may or may not work. Feel free to submit an issue at https://github.com/hanxiao/bert-as-service/issues/
  'Feel free to submit an issue at https://github.com/hanxiao/bert-as-service/issues/' % tf.__version__)
usage: E:\programfiles\Anaconda3\envs\bert_aas\Scripts\bert-serving-start -model_dir E:\e25/files_2/uncased_L-12_H-768_A-12 -num_worker=1
                  ARG   VALUE
__________________________________________________
            ckpt_name = bert_model.ckpt
          config_name = bert_config.json
                cors = *
                  cpu = False
          device_map = []
        do_lower_case = True
  fixed_embed_length = False
                fp16 = False
  gpu_memory_fraction = 0.5
        graph_tmp_dir = None
    http_max_connect = 10
            http_port = None
        mask_cls_sep = False
      max_batch_size = 256
          max_seq_len = 25
            model_dir = E:\e25/files_2/uncased_L-12_H-768_A-12
no_position_embeddings = False
    no_special_token = False
          num_worker = 1
        pooling_layer = [-2]
    pooling_strategy = REDUCE_MEAN
                port = 5555
            port_out = 5556
        prefetch_size = 10
  priority_batch_size = 16
show_tokens_to_client = False
      tuned_model_dir = None
              verbose = False
                  xla = False

I: [35mVENTILATOR [0m:freeze, optimize and export graph, could take a while...

e:\programfiles\anaconda3\envs\bert_aas\lib\site-packages\bert_serving\server\helper.py:176: UserWarning: Tensorflow 2.3.0 is not tested! It may or may not work. Feel free to submit an issue at https://github.com/hanxiao/bert-as-service/issues/
  'Feel free to submit an issue at https://github.com/hanxiao/bert-as-service/issues/' % tf.__version__)
E: [36mGRAPHOPT [0m:fail to optimize the graph!
Traceback (most recent call last):
  File "e:\programfiles\anaconda3\envs\bert_aas\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "e:\programfiles\anaconda3\envs\bert_aas\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "E:\programfiles\Anaconda3\envs\bert_aas\Scripts\bert-serving-start.exe\__main__.py", line 7, in 
  File "e:\programfiles\anaconda3\envs\bert_aas\lib\site-packages\bert_serving\server\cli\__init__.py", line 4, in main
    with BertServer(get_run_args()) as server:
  File "e:\programfiles\anaconda3\envs\bert_aas\lib\site-packages\bert_serving\server\__init__.py", line 71, in __init__
    self.graph_path, self.bert_config = pool.apply(optimize_graph, (self.args,))
TypeError: cannot unpack non-iterable NoneType object 

FROM HELPER.PY:
Path: e:\programfiles\anaconda3\envs\bert_aas\lib\site-packages\bert_serving\server\helper.py

import tensorflow as tf
tf_ver = tf.__version__.split('.')
if int(tf_ver[0]) <= 1 and int(tf_ver[1]) < 10:
    raise ModuleNotFoundError('Tensorflow >=1.10 (one-point-ten) is required!')
elif int(tf_ver[0]) > 1:
    warnings.warn('Tensorflow %s is not tested! It may or may not work. '
                  'Feel free to submit an issue at https://github.com/hanxiao/bert-as-service/issues/' % tf.__version__)
return tf_ver 

~   ~   ~   ~   ~

Removing Conda Environment in Case of Failure
(bert_aas) CMD>conda deactivate
(base) CMD>conda env remove -n bert_aas
Remove all packages in environment E:\programfiles\Anaconda3\envs\bert_aas: y

Checking installation

(bert_aas) CMD>pip freeze | findstr "GPUtil"

GPUtil @ file:///home/conda/feedstock_root/build_artifacts/gputil_1590646865081/work 

(bert_aas) CMD>pip freeze | findstr /C:"tensorflow" /C:"numpy" /C:"GPUtil" 

GPUtil @ file:///home/conda/feedstock_root/build_artifacts/gputil_1590646865081/work
numpy==1.18.5
tensorflow==1.14.0
tensorflow-estimator==1.14.0 

==========

Starting the server:

CMD> bert-serving-start -model_dir D:\ws\jupyter\f25_bert_for_sent_an\files_2\uncased_L-12_H-768_A-12 -num_worker=1

Logs:
                 ARG   VALUE
__________________________________________________
            ckpt_name = bert_model.ckpt
          config_name = bert_config.json
            model_dir = D:\ws\jupyter\f25_bert_for_sent_an\files_2\uncased_L-12_H-768_A-12
...
I: [36mGRAPHOPT [0m:model config: D:\ws\jupyter\f25_bert_for_sent_an\files_2\uncased_L-12_H-768_A-12\bert_config.json
I: [36mGRAPHOPT [0m:checkpoint: D:\ws\jupyter\f25_bert_for_sent_an\files_2\uncased_L-12_H-768_A-12\bert_model.ckpt
I: [36mGRAPHOPT [0m:build graph...
I: [36mGRAPHOPT [0m:load parameters from checkpoint...
I: [36mGRAPHOPT [0m:optimize...
I: [36mGRAPHOPT [0m:freeze...
I: [36mGRAPHOPT [0m:write graph to a tmp file: C:\Users\aj\AppData\Local\Temp\tmpxyxaq_b3
I: [35mVENTILATOR [0m:optimized graph is stored at: C:\Users\aj\AppData\Local\Temp\tmpxyxaq_b3
I: [35mVENTILATOR [0m:bind all sockets
I: [35mVENTILATOR [0m:open 8 ventilator-worker sockets
I: [35mVENTILATOR [0m:start the sink
...
I: [33mWORKER-0 [0m:ready and listening!
I: [35mVENTILATOR [0m:all set, ready to serve request! 

==========

Running the client

(base) C:\Users\aj>conda activate bert_aas

(bert_aas) C:\Users\aj>python

>>> from bert_serving.client import BertClient
>>> bc = BertClient()
>>> enc_values = bc.encode(['First do it', 'then do it right', 'then do it better'])
>>> enc_values.shape
(3, 768)
>>> enc_values
array([[ 0.13186528,  0.3240411 , -0.82704353, ..., -0.37119573,
        -0.39250126, -0.31721842],
        [ 0.24873482, -0.12334437, -0.38933888, ..., -0.4475625 ,
        -0.55913603, -0.11345193],
        [ 0.2862734 , -0.18580128, -0.3090687 , ..., -0.29593647,
        -0.39310572,  0.0764024 ]], dtype=float32)
>>> 

Sunday, October 4, 2020

Set up Google OAuth 2.0 Authentication For Blogger and Retrieve Blog Posts


Installation 
Blogger APIs Client Library for Python
CMD> pip install --upgrade google-api-python-client 
Ref: developers.google.com

(base) C:\Users\ashish\Desktop>pip install oauth2client

(base) C:\Users\ashish\Desktop\blogger>pip show google-api-python-client
Name: google-api-python-client
Version: 1.12.3
Summary: Google API Client Library for Python
Home-page: https://github.com/googleapis/google-api-python-client/
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Location: d:\programfiles\anaconda3\lib\site-packages
Requires: six, google-auth-httplib2, google-auth, google-api-core, uritemplate, httplib2
Required-by: 

(base) C:\Users\ashish\Desktop\blogger>pip show oauth2client
Name: oauth2client
Version: 4.1.3
Summary: OAuth 2.0 client library
Home-page: http://github.com/google/oauth2client/
Author: Google Inc.
Author-email: jonwayne+oauth2client@google.com
License: Apache 2.0
Location: d:\programfiles\anaconda3\lib\site-packages
Requires: pyasn1, six, rsa, pyasn1-modules, httplib2
Required-by: 

==  ==  ==  ==  ==

Basic structure of the "client_secrets.json" is as written below:

Ref: GitHub

{
  "web": {
    "client_id": "[[INSERT CLIENT ID HERE]]",
    "client_secret": "[[INSERT CLIENT SECRET HERE]]",
    "redirect_uris": [],
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://accounts.google.com/o/oauth2/token"
  }
} 

Our "client_secrets.json" file looks like this:

{
  "installed":
  {
    "client_id": "0...9-a...z.apps.googleusercontent.com",
    "project_id": "banded...",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "2.....v",
    "redirect_uris": ["urn:ietf:wg:oauth:2.0:oob", "http://localhost"]
  }
} 

==  ==  ==  ==  ==

About authorization protocols 
Your application must use OAuth 2.0 to authorize requests. No other authorization protocols are supported. If your application uses Google Sign-In, some aspects of authorization are handled for you.

You can create "OAuth Client ID" from here: developers.google.com

You can also create a new "Project". We will select already created "My Project". We will select the "OAuth Client" as "Desktop App":
== == == == == Error if the 'client_secrets.json' file is not present alongside "blogger.py": (base) C:\Users\ashish\Desktop>python blogger.py The client secrets were invalid: ('Error opening file', 'client_secrets.json', 'No such file or directory', 2) WARNING: Please configure OAuth 2.0 To make this sample run you will need to populate the client_secrets.json file found at: client_secrets.json with information from the APIs Console [ https://code.google.com/apis/console ]. == == == == == Error on placing "empty (0KB)" file named "client_secrets.json": (base) C:\Users\ashish\Desktop\blogger>python blogger_v2.py Traceback (most recent call last): File "blogger_v2.py", line 43, in [module] main(sys.argv) File "blogger_v2.py", line 12, in main scope='https://www.googleapis.com/auth/blogger') File "D:\programfiles\Anaconda3\lib\site-packages\googleapiclient\sample_tools.py", line 88, in init client_secrets, scope=scope, message=tools.message_if_missing(client_secrets) File "D:\programfiles\Anaconda3\lib\site-packages\oauth2client\_helpers.py", line 133, in positional_wrapper return wrapped(*args, **kwargs) File "D:\programfiles\Anaconda3\lib\site-packages\oauth2client\client.py", line 2135, in flow_from_clientsecrets cache=cache) File "D:\programfiles\Anaconda3\lib\site-packages\oauth2client\clientsecrets.py", line 165, in loadfile return _loadfile(filename) File "D:\programfiles\Anaconda3\lib\site-packages\oauth2client\clientsecrets.py", line 122, in _loadfile obj = json.load(fp) File "D:\programfiles\Anaconda3\lib\json\__init__.py", line 296, in load parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw) File "D:\programfiles\Anaconda3\lib\json\__init__.py", line 348, in loads return _default_decoder.decode(s) File "D:\programfiles\Anaconda3\lib\json\decoder.py", line 337, in decode obj, end = self.raw_decode(s, idx=_w(s, 0).end()) File "D:\programfiles\Anaconda3\lib\json\decoder.py", line 355, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) == == == == == Before authentication: C:\Users\ashish\Desktop\blogger>tree /f C:. blogger_v1.py client_secrets.json No subfolders exist After authentication (the authentication steps are shown below), a 'blogger.dat' file is created: C:\Users\ashish\Desktop\blogger>tree /f C:. blogger.dat blogger_v1.py client_secrets.json No subfolders exist == == == == == When we run the application the application the first time, it opens the "Google's User Authentication" window to give access to the application to the Blogger for a Google account by logging into the respective Google Account.
Last prompt will ask you to confirm your choice:
Once the sign-in is complete it opens a Broswer Window (our browser is Chrome) as shown below:
Message: Application flow has been completed == == == == == CODE V1 #!/usr/bin/env python # -*- coding: utf-8 -*- """Simple command-line sample for Blogger. Command-line application that retrieves the users blogs and posts. Usage: $ python blogger.py You can also get help on all the command-line flags the program understands by running: $ python blogger.py --help To get detailed log output run: $ python blogger.py --logging_level=DEBUG """ from __future__ import print_function import sys from oauth2client import client from googleapiclient import sample_tools def main(argv): # Authenticate and construct service. service, flags = sample_tools.init( argv, 'blogger', 'v3', __doc__, __file__, scope='https://www.googleapis.com/auth/blogger') try: users = service.users() # Retrieve this user's profile information thisuser = users.get(userId='self').execute() print('This user\'s display name is: %s' % thisuser['displayName']) blogs = service.blogs() # Retrieve the list of Blogs this user has write privileges on thisusersblogs = blogs.listByUser(userId='self').execute() for blog in thisusersblogs['items']: print('The blog named \'%s\' is at: %s' % (blog['name'], blog['url'])) posts = service.posts() # List the posts for each blog this user has for blog in thisusersblogs['items']: print('The posts for %s:' % blog['name']) request = posts.list(blogId=blog['id']) while request != None: posts_doc = request.execute() if 'items' in posts_doc and not (posts_doc['items'] is None): for post in posts_doc['items']: print(' %s (%s)' % (post['title'], post['url'])) request = posts.list_next(request, posts_doc) except client.AccessTokenRefreshError: print ('The credentials have been revoked or expired, please re-run' 'the application to re-authorize') if __name__ == '__main__': main(sys.argv) == == == == == (base) CMD>python blogger_v1.py D:\programfiles\Anaconda3\lib\site-packages\oauth2client\_helpers.py:255: UserWarning: Cannot access blogger.dat: No such file or directory warnings.warn(_MISSING_FILE_MESSAGE.format(filename)) Your browser has been opened to visit: https://accounts.google.com/o/oauth2/auth?client_id=1... If your browser is on a different machine then exit and re-run this application with the command-line parameter --noauth_local_webserver Authentication successful. This user's display name is: Ashish Jain The blog named 'survival8' is at: http://survival8.blogspot.com/ The blog named 'ashish' is at: http://ashish.blogspot.com/ The posts for survival8: Difficult Conversations. How to Discuss What Matters Most (D. Stone, B. Patton, S. Heen, 2010) (http://survival8.blogspot.com/2020/09/douglas-stone-b-patton-s-heen-difficult.html) ... Is having a water fountain a sensible thing to do? (http://survival8.blogspot.com/2016/11/is-having-water-fountain-sensible-thing_21.html) The posts for ashish: ... == == == == == 'Cloud Resource Manager' Screenshot Showing Our Usage
== == == == == References GitHub % Blogger: google-api-python-client % blogger.py Google API Console % Google API Console % Dashboard % Cloud Resource Manager Documentation % Blogger APIs Client Library for Python % Blogger API: Using the API % Blogger API v3 (Instance Methods) % Blogger API v3 . posts (Instance Methods)

Getting started with word embedding technique 'BERT'



BERT, or Bidirectional Encoder Representations from Transformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks.

Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: arxiv: BERT.

Ref: bert#pre-trained-models

URL to code for this post: GitHub: bert-as-service

STEP 1:
Installation of BERT as a server:

pip install bert-serving-server  # server
pip install bert-serving-client  # client, independent of `bert-serving-server` 

STEP 2:
Download a PRE-Trained BERT model.
The one we have using is "BERT-Base, Uncased	12-layer, 768-hidden, 12-heads, 110M parameters":

storage.googleapis.com: uncased_L-12_H-768_A-12.zip

STEP 3:
After installation of the server, start it in Python shell as follows:
$ bert-serving-start -model_dir D:\workspace\Jupyter\exp_42_bert\uncased_L-12_H-768_A-12 -num_worker=1

LOGS:

(env_for_python_36) C:\Users\ashish>bert-serving-start -model_dir D:\workspace\Jupyter\exp_42_bert\uncased_L-12_H-768_A-12 -num_worker=1

usage: C:\Users\ashish\AppData\Local\Continuum\anaconda3\envs\env_for_python_36\Scripts\bert-serving-start -model_dir D:\workspace\Jupyter\exp_42_bert\uncased_L-12_H-768_A-12 -num_worker=1
                 ARG   VALUE
__________________________________________________
           ckpt_name = bert_model.ckpt
         config_name = bert_config.json
                cors = *
                 cpu = False
          device_map = []
       do_lower_case = True
  fixed_embed_length = False
                fp16 = False
 gpu_memory_fraction = 0.5
       graph_tmp_dir = None
    http_max_connect = 10
           http_port = None
        mask_cls_sep = False
      max_batch_size = 256
         max_seq_len = 25
           model_dir = D:\workspace\Jupyter\exp_42_bert\uncased_L-12_H-768_A-12
no_position_embeddings = False
    no_special_token = False
          num_worker = 1
       pooling_layer = [-2]
    pooling_strategy = REDUCE_MEAN
                port = 5555
            port_out = 5556
       prefetch_size = 10
 priority_batch_size = 16
show_tokens_to_client = False
     tuned_model_dir = None
             verbose = False
                 xla = False

I:[35mVENTILATOR[0m:freeze, optimize and export graph, could take a while...
I:[36mGRAPHOPT[0m:model config: D:\workspace\Jupyter\exp_42_bert\uncased_L-12_H-768_A-12\bert_config.json
I:[36mGRAPHOPT[0m:checkpoint: D:\workspace\Jupyter\exp_42_bert\uncased_L-12_H-768_A-12\bert_model.ckpt
I:[36mGRAPHOPT[0m:build graph...
I:[36mGRAPHOPT[0m:load parameters from checkpoint...
I:[36mGRAPHOPT[0m:optimize...
I:[36mGRAPHOPT[0m:freeze...
I:[36mGRAPHOPT[0m:write graph to a tmp file: C:\Users\ashish\AppData\Local\Temp\tmpy8lsjd5y
I:[35mVENTILATOR[0m:optimized graph is stored at: C:\Users\ashish\AppData\Local\Temp\tmpy8lsjd5y
I:[35mVENTILATOR[0m:bind all sockets
I:[35mVENTILATOR[0m:open 8 ventilator-worker sockets
I:[35mVENTILATOR[0m:start the sink
I:[32mSINK[0m:ready
I:[35mVENTILATOR[0m:get devices
W:[35mVENTILATOR[0m:no GPU available, fall back to CPU
I:[35mVENTILATOR[0m:device map:
                worker  0 -> cpu
I:[33mWORKER-0[0m:use device cpu, load graph from C:\Users\ashish\AppData\Local\Temp\tmpy8lsjd5y
I:[33mWORKER-0[0m:ready and listening!
I:[35mVENTILATOR[0m:all set, ready to serve request! 

STEP 4:
Then to use the client in a different console:

from bert_serving.client import BertClient
bc = BertClient()
bc.encode(['First do it', 'then do it right', 'then do it better']) 

Logs:
(env_for_python_36) C:\Users\ashish>python
Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bert_serving.client import BertClient
>>> bc = BertClient()
>>> bc.encode(['First do it', 'then do it right', 'then do it better'])
array([[ 0.13186511,  0.32404116, -0.8270434 , ..., -0.37119645,
        -0.39250118, -0.3172187 ],
       [ 0.24873514, -0.12334443, -0.38933924, ..., -0.4475621 ,
        -0.559136  , -0.1134515 ],
       [ 0.28627324, -0.18580206, -0.30906808, ..., -0.2959365 ,
        -0.39310536,  0.07640218]], dtype=float32)
>>> 
Dated: 19-Dec-2019