survival8: BERT

Showing posts with label BERT. Show all posts

Friday, April 5, 2024

Building Zero Shot Classifiers For Text Using Large Language Models

The software tools we need for the activity covered in this post are:

Google Colab
GitHub
And last: ChatGPT

Why I needed these three items?

I needed Google Colab to write code. Google Colab allowed to me not care about creating a local environment and setting it up with the required packages such as 'transformers' from Hugging Face.

I needed GitHub to put my code in a place that I can make available to myself anywhere and also to you (my readers).

I needed ChatGPT to get boilerplate code for our particular task. I learnt about the prompts I needed for this activity from the book by Sinan Ozdemir titled:

Quick Start Guide to Large Language Models. Strategies and Best Practices for using ChatGPT and Other LLMs - Addison-Wesley Professional (2023)

Download Book

What you would need would (I think) is:

Google Colab that is connected for code to my public repository hosted on GitHub.

How to connect Google Colab with GitHub?

1. Once you would open the Google Colab, you get a screen as shown below.

Note that: I am logged in to my Google Account for this task.

2. Next: you would click on "File" at the top left.

You would click on "Open Notebook".

And you would select "GitHub" as shown below:

3. You would fill in the username. It is: ashishjain1547

Once you fill in the username, the "Repository" dropdown would auto populate with public repositories available for that user.

The repository you would select is: "generative_ai_workspace_2024_04_05"

4. Once repo is selected, it's notebooks start appearing below:

Code for zero shot Spam vs. Not Spam classifier using Facebook's BART

from transformers import pipeline
def classify_text (email):
    """
    Use Facebook's BART model to classify an email into "spam" or "not spam"
    Args:
    email (str): The email to classify
    Returns:
    str: The classification of the email
    """

    classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
    labels = ['spam', 'not spam']
    hypothesis_template = 'This email is {}.'
    results = classifier(email, labels, hypothesis_template=hypothesis_template)
    return results['labels'][0]

Usage:

How we used ChatGPT?

Provide code to do zero shot sentiment analysis using Google BERT.

Answer:

To perform zero-shot sentiment analysis using Google's BERT model, you can utilize the Hugging Face Transformers library along with the pre-trained BERT model. Here's how you can do it in Python:

from transformers import pipeline

# Load the zero-shot classification pipeline with BERT
classifier = pipeline("zero-shot-classification", model="bert-base-uncased")

# Define a list of texts for sentiment analysis
texts = [
    "I love this product, it's amazing!",
    "This movie was terrible, I hated it.",
    "The weather today is fantastic.",
    "I feel neutral about this situation."
]

# Define the possible labels/categories for sentiment
candidate_labels = ["positive", "negative", "neutral"]

# Perform zero-shot sentiment analysis
results = classifier(texts, candidate_labels)

# Print the results
for i, text in enumerate(texts):
    print(f"Text: {text}")
    print("Sentiment probabilities:")
    for label, score in zip(results[i]['labels'], results[i]['scores']):
        print(f"{label}: {score:.4f}")
    print()

Output

Text: I love this product, it's amazing!
Sentiment probabilities:
positive: 0.4718
negative: 0.2679
neutral: 0.2603

Text: This movie was terrible, I hated it.
Sentiment probabilities:
negative: 0.3644
positive: 0.3179
neutral: 0.3177

Text: The weather today is fantastic.
Sentiment probabilities:
positive: 0.4026
negative: 0.3039
neutral: 0.2935

Text: I feel neutral about this situation.
Sentiment probabilities:
neutral: 0.3587
negative: 0.3248
positive: 0.3166

Additional Note

Question:

How does the BERT know that the problem is of sentiment analysis and not spam-vs-not_spam?

Answer:

In the code provided, the BERT model doesn't inherently "know" that the problem is specifically about sentiment analysis. Instead, it is designed for zero-shot classification, which means it can classify texts into predefined categories without being explicitly trained on those categories.

The key lies in the input data and the labels provided. In the example, we've defined a list of texts that contain statements or reviews, and we've also provided a list of candidate labels (positive, negative, neutral) that represent sentiment categories. When you pass these texts and labels to the BERT model through the zero-shot classification pipeline, it infers the sentiment of each text based on its understanding of the language and the provided labels.

If you were to perform a different classification task, such as spam vs. not spam, you would simply replace the list of candidate labels with the appropriate labels for that task (e.g., spam, not_spam). The model doesn't need to be explicitly told the nature of the classification problem because it learns to make predictions based on the provided labels during training.

Thursday, September 22, 2022

Sentiment Analysis using BERT, DistilBERT and ALBERT (Installation)

We will do Sentiment Analysis using the code from this repo: GitHub
Note: The entire GitHub code base for this project is about 18 MB in size.
And for the first time, when you run the "server.py" from Anaconda Prompt, it downloads the BERT model of size about 450 MB.

Contents of YAML file for conda environment creation: env.yml

name: barissayil
channels:
  - defaults
  - conda-forge
  - pytorch
  
dependencies:
  - python==3.9
  - pip
  - pip:
    - transformers==4.15.0
  - pytorch
  - pandas
  - numpy
  - flask
  - flask_cors
  - scikit-learn 

Running the command in Ubuntu Terminal
(base) ashish@ashish-Lenovo-ideapad-130-15IKB:~/Desktop$ conda env create -f env.yml


Collecting package metadata (repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 4.14.0

Please update conda by running

    $ conda update -n base -c defaults conda

Downloading and Extracting Packages
numpy-base-1.23.1    | 5.6 MB    | ### | 100% 
pytz-2022.1          | 194 KB    | ### | 100% 
tzdata-2022c         | 107 KB    | ### | 100% 
numexpr-2.8.3        | 124 KB    | ### | 100% 
ninja-1.10.2         | 8 KB      | ### | 100% 
flask_cors-3.0.10    | 16 KB     | ### | 100% 
certifi-2022.9.14    | 155 KB    | ### | 100% 
libgcc-ng-11.2.0     | 5.3 MB    | ### | 100% 
scipy-1.7.1          | 16.9 MB   | ### | 100% 
setuptools-63.4.1    | 1.1 MB    | ### | 100% 
libgomp-11.2.0       | 474 KB    | ### | 100% 
numpy-1.23.1         | 11 KB     | ### | 100% 
pip-22.1.2           | 2.5 MB    | ### | 100% 
flask-2.1.3          | 130 KB    | ### | 100% 
pandas-1.4.4         | 9.8 MB    | ### | 100% 
bottleneck-1.3.5     | 115 KB    | ########### | 100% 
scikit-learn-1.1.1   | 6.1 MB    | ########### | 100% 
ninja-base-1.10.2    | 109 KB    | ########### | 100% 
python-3.9.0         | 18.1 MB   | ########### | 100% 
pyparsing-3.0.9      | 151 KB    | ########### | 100% 
typing_extensions-4. | 42 KB     | ########### | 100% 
typing-extensions-4. | 9 KB      | ########### | 100% 
_openmp_mutex-5.1    | 21 KB     | ########### | 100% 
zipp-3.8.0           | 15 KB     | ########### | 100% 
pytorch-1.10.2       | 44.1 MB   | ########### | 100% 
markupsafe-2.1.1     | 21 KB     | ########### | 100% 



cffi-1.15.1          | 228 KB    | ########### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

    Installed package of scikit-learn can be accelerated using scikit-learn-intelex.
    More details are available here: https://intel.github.io/scikit-learn-intelex

    For example:

        $ conda install scikit-learn-intelex
        $ python -m sklearnex my_application.py

Installing pip dependencies: - Ran pip subprocess with arguments:
['/home/ashish/anaconda3/envs/barissayil/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt']
Pip subprocess output:
Collecting transformers==4.15.0
  Downloading transformers-4.15.0-py3-none-any.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 194.6 kB/s eta 0:00:00
Requirement already satisfied: numpy>=1.17 in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (1.23.1)
Collecting regex!=2019.12.17
  Downloading regex-2022.9.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (769 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 770.0/770.0 kB 169.9 kB/s eta 0:00:00
Collecting requests
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB 178.2 kB/s eta 0:00:00
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.7/120.7 kB 128.6 kB/s eta 0:00:00
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (661 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 661.8/661.8 kB 192.7 kB/s eta 0:00:00
Requirement already satisfied: packaging>=20.0 in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (21.3)
Collecting filelock
  Downloading filelock-3.8.0-py3-none-any.whl (10 kB)

  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 168.1 kB/s eta 0:00:00
Collecting sacremoses
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 880.6/880.6 kB 144.4 kB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 162.9 kB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from packaging>=20.0->transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (3.0.9)
Collecting idna<4,>=2.5
  Downloading idna-3.4-py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.5/61.5 kB 139.1 kB/s eta 0:00:00
Collecting charset-normalizer<3,>=2
  Downloading charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Requirement already satisfied: certifi>=2017.4.17 in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from requests->transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (2022.9.14)
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.12-py2.py3-none-any.whl (140 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.4/140.4 kB 188.4 kB/s eta 0:00:00
Requirement already satisfied: six in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from sacremoses->transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (1.16.0)
Requirement already satisfied: click in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from sacremoses->transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (8.0.4)
Requirement already satisfied: joblib in /home/ashish/anaconda3/envs/barissayil/lib/python3.9/site-packages (from sacremoses->transformers==4.15.0->-r /home/ashish/Desktop/condaenv.6jfaxui9.requirements.txt (line 1)) (1.1.0)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py): started
  Building wheel for sacremoses (setup.py): finished with status 'done'
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=3aa00715128a0c0de964dd1229c0cd2704c6ddb45ef5407c4bb3e5d273808164
  Stored in directory: /home/ashish/.cache/pip/wheels/12/1c/3d/46cf06718d63a32ff798a89594b61e7f345ab6b36d909ce033
Successfully built sacremoses
Installing collected packages: tokenizers, urllib3, tqdm, regex, pyyaml, idna, filelock, charset-normalizer, sacremoses, requests, huggingface-hub, transformers
Successfully installed charset-normalizer-2.1.1 filelock-3.8.0 huggingface-hub-0.9.1 idna-3.4 pyyaml-6.0 regex-2022.9.13 requests-2.28.1 sacremoses-0.0.53 tokenizers-0.10.3 tqdm-4.64.1 transformers-4.15.0 urllib3-1.26.12

#
# To activate this environment, use
#
#     $ conda activate barissayil
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Tuesday, May 31, 2022

RuntimeError for input token sequence longer than 512 tokens for BERT

import transformers as ppb
import torch
import numpy as np
print(ppb.__version__)

4.19.2

input_sentence_1 = "In recent years, a lot of hype has developed around the promise of neural networks and their ability to classify and identify input data, and more recently the ability of certain network architectures to generate original content. Companies large and small are using them for everything from image captioning and self-driving car navigation to identifying solar panels from satellite images and recognizing faces in security camera videos. And luckily for us, many NLP applications of neural nets exist as well. While deep neural networks have inspired a lot of hype and hyperbole, our robot overlords are probably further off than any clickbait cares to admit. Neural networks are, however, quite powerful tools, and you can easily use them in an NLP chatbot pipeline to classify input text, summarize documents, and even generate novel works. This chapter is intended as a primer for those with no experience in neural networks. We don’t cover anything specific to NLP in this chapter, but gaining a basic understanding of what is going on under the hood in a neural network is important for the upcoming chapters. If you’re familiar with the basics of a neural network, you can rest easy in skipping ahead to the next chapter, where you dive back into processing text with the various flavors of neural nets. Although the mathematics of the underlying algorithm, backpropagation, are outside this book’s scope, a high-level grasp of its basic functionality will help you understand language and the patterns hidden within. As the availability of processing power and memory has exploded over the course of the decade, an old technology has come into its own again. First proposed in the 1950s by Frank Rosenblatt, the perceptron1 offered a novel algorithm for finding patterns in data. The basic concept lies in a rough mimicry of the operation of a living neuron cell. As electrical signals flow into the cell through the dendrites (see figure 5.1) into the nucleus, an electric charge begins to build up. When the cell reaches a certain level of charge, it fires, sending an electrical signal out through the axon. However, the dendrites aren’t all created equal. The cell is more “sensitive” to signals through certain dendrites than others, so it takes less of a signal in those paths to fire the axon."
print(input_sentence_1)
print("Char count", len(input_sentence_1))
print("Word Count:", len(input_sentence_1.split(" ")))

In recent years, a lot of hype has developed around the promise of neural networks and their ability to classify and identify input data, and more recently the ability of certain network architectures to generate original content. Companies large and small are using them for everything from image captioning and self-driving car navigation to identifying solar panels from satellite images and recognizing faces in security camera videos. And luckily for us, many NLP applications of neural nets exist as well. While deep neural networks have inspired a lot of hype and hyperbole, our robot overlords are probably further off than any clickbait cares to admit. Neural networks are, however, quite powerful tools, and you can easily use them in an NLP chatbot pipeline to classify input text, summarize documents, and even generate novel works. This chapter is intended as a primer for those with no experience in neural networks. We don’t cover anything specific to NLP in this chapter, but gaining a basic understanding of what is going on under the hood in a neural network is important for the upcoming chapters. If you’re familiar with the basics of a neural network, you can rest easy in skipping ahead to the next chapter, where you dive back into processing text with the various flavors of neural nets. Although the mathematics of the underlying algorithm, backpropagation, are outside this book’s scope, a high-level grasp of its basic functionality will help you understand language and the patterns hidden within. As the availability of processing power and memory has exploded over the course of the decade, an old technology has come into its own again. First proposed in the 1950s by Frank Rosenblatt, the perceptron1 offered a novel algorithm for finding patterns in data. The basic concept lies in a rough mimicry of the operation of a living neuron cell. As electrical signals flow into the cell through the dendrites (see figure 5.1) into the nucleus, an electric charge begins to build up. When the cell reaches a certain level of charge, it fires, sending an electrical signal out through the axon. However, the dendrites aren’t all created equal. The cell is more “sensitive” to signals through certain dendrites than others, so it takes less of a signal in those paths to fire the axon.
Char count 2309
Word Count: 382

input_sentence_2 = "The biology that controls these relationships is most certainly beyond the scope of this book, but the key concept to notice here is the way the cell weights incoming signals when deciding when to fire. The neuron will dynamically change those weights in the decision making process over the course of its life. You are going to mimic that process. Rosenblatt’s original project was to teach a machine to recognize images. The original perceptron was a conglomeration of photo-receptors and potentiometers, not a computer in the current sense. But implementation specifics aside, Rosenblatt’s concept was to take the features of an image and assign a weight, a measure of importance, to each one. The features of the input image were each a small subsection of the image. A grid of photo-receptors would be exposed to the image. Each receptor would see one small piece of the image. The brightness of the image that a particular photoreceptor could see would determine the strength of the signal that it would send to the associated “dendrite.” Each dendrite had an associated weight in the form of a potentiometer. Once enough signal came in, it would pass the signal into the main body of the “nucleus” of the “cell.” Once enough of those signals from all the potentiometers passed a certain threshold, the perceptron would fire down its axon, indicating a positive match on the image it was presented with. If it didn’t fire for a given image, that was a negative classification match. Think “hot dog, not hot dog” or “iris setosa, not iris setosa.” So far there has been a lot of hand waving about biology and electric current and photo-receptors. Let’s pause for a second and peel out the most important parts of this concept. Basically, you’d like to take an example from a dataset, show it to an algorithm, and have the algorithm say yes or no. That’s all you’re doing so far. The first piece you need is a way to determine the features of the sample. Choosing appropriate features turns out to be a surprisingly challenging part of machine learning. In “normal” machine learning problems, like predicting home prices, your features might be square footage, last sold price, and ZIP code. Or perhaps you’d like to predict the species of a certain flower using the Iris dataset.2 In that case your features would be petal length, petal width, sepal length, and sepal width. In Rosenblatt’s experiment, the features were the intensity values of each pixel (subsections of the image), one pixel per photo receptor."
print(input_sentence_2)
print("Char count", len(input_sentence_2))
print("Word Count:", len(input_sentence_2.split(" ")))

The biology that controls these relationships is most certainly beyond the scope of this book, but the key concept to notice here is the way the cell weights incoming signals when deciding when to fire. The neuron will dynamically change those weights in the decision making process over the course of its life. You are going to mimic that process. Rosenblatt’s original project was to teach a machine to recognize images. The original perceptron was a conglomeration of photo-receptors and potentiometers, not a computer in the current sense. But implementation specifics aside, Rosenblatt’s concept was to take the features of an image and assign a weight, a measure of importance, to each one. The features of the input image were each a small subsection of the image. A grid of photo-receptors would be exposed to the image. Each receptor would see one small piece of the image. The brightness of the image that a particular photoreceptor could see would determine the strength of the signal that it would send to the associated “dendrite.” Each dendrite had an associated weight in the form of a potentiometer. Once enough signal came in, it would pass the signal into the main body of the “nucleus” of the “cell.” Once enough of those signals from all the potentiometers passed a certain threshold, the perceptron would fire down its axon, indicating a positive match on the image it was presented with. If it didn’t fire for a given image, that was a negative classification match. Think “hot dog, not hot dog” or “iris setosa, not iris setosa.” So far there has been a lot of hand waving about biology and electric current and photo-receptors. Let’s pause for a second and peel out the most important parts of this concept. Basically, you’d like to take an example from a dataset, show it to an algorithm, and have the algorithm say yes or no. That’s all you’re doing so far. The first piece you need is a way to determine the features of the sample. Choosing appropriate features turns out to be a surprisingly challenging part of machine learning. In “normal” machine learning problems, like predicting home prices, your features might be square footage, last sold price, and ZIP code. Or perhaps you’d like to predict the species of a certain flower using the Iris dataset.2 In that case your features would be petal length, petal width, sepal length, and sepal width. In Rosenblatt’s experiment, the features were the intensity values of each pixel (subsections of the image), one pixel per photo receptor.
Char count 2518
Word Count: 426

model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased')
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

def get_embedding(in_list):
tokenized = [tokenizer.encode(x, add_special_tokens=True) for x in in_list]

max_len = 0
for i in tokenized:
if len(i) > max_len:
max_len = len(i)

padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized])

attention_mask = np.where(padded != 0, 1, 0)

input_ids = torch.LongTensor(padded)
attention_mask = torch.tensor(attention_mask)

with torch.no_grad():
last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask)

features = last_hidden_states[0][:,0,:].numpy()
return features

string_embeddings = get_embedding([input_sentence_1, input_sentence_2])

Token indices sequence length is longer than the specified maximum sequence length for this model (560 > 512). Running this sequence through the model will result in indexing errors

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 string_embeddings = get_embedding([input_sentence_1, input_sentence_2])

Input In [11], in get_embedding(in_list)
14 attention_mask = torch.tensor(attention_mask)
16 with torch.no_grad():
---> 17 last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask)
19 features = last_hidden_states[0][:,0,:].numpy()
20 return features

File E:\programfiles\Anaconda3\envs\transformers\lib\site-packages\torch\nn\modules\module.py:1102, in Module._call_impl(self, *input, **kwargs)
1098 # If we don't have any hooks, we want to skip the rest of the logic in
1099 # this function, and just call forward.
1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1101 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102 return forward_call(*input, **kwargs)
1103 # Do not call functions when jit is used
1104 full_backward_hooks, non_full_backward_hooks = [], []

File E:\programfiles\Anaconda3\envs\transformers\lib\site-packages\transformers\models\bert\modeling_bert.py:983, in BertModel.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
981 if hasattr(self.embeddings, "token_type_ids"):
982 buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length]
--> 983 buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length)
984 token_type_ids = buffered_token_type_ids_expanded
985 else:

RuntimeError: The expanded size of the tensor (560) must match the existing size (512) at non-singleton dimension 1. Target sizes: [2, 560]. Tensor sizes: [1, 512]

Thursday, October 15, 2020

BERT is aware of the context of a word in a sentence

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.metrics.pairwise import cosine_similarity
import torch
import transformers as ppb
import warnings
warnings.filterwarnings('ignore')

from joblib import load, dump
import json
import re 

print(ppb.__version__)

'3.0.1'

Loading the Pre-trained BERT model

model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased')

# Load pretrained model/tokenizer
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights) 

When run first time, the above statements loads a model of 440MB in size.

Word Ambiguities 

def get_embedding(in_list):
    tokenized = [tokenizer.encode(x, add_special_tokens=True) for x in in_list]
    
    max_len = 0
    for i in tokenized:
        if len(i) > max_len:
            max_len = len(i)

    padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized])
    
    attention_mask = np.where(padded != 0, 1, 0)
    
    input_ids = torch.LongTensor(padded)
    attention_mask = torch.tensor(attention_mask)

    with torch.no_grad():
        last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask)
        
    features = last_hidden_states[0][:,0,:].numpy()
    return features 

python_strings = [
    'I love coding in Python language.',
    'Python is more readable than Java.',
    'Pythons are famous for their very long body.',
    'Python is famous for its very long body.',
    'All six continents have a python species.',
    'Python is a programming language.',
    'Python is a reptile.',
    'The python ate a mouse.',
    'python ate a mouse'
]

string_embeddings = get_embedding(python_strings)   

print(string_embeddings.shape) 

(9, 768)

csm = cosine_similarity(X = string_embeddings, Y=None, dense_output=True)
print(csm.round(2)) 

In the picture below, if we ignore the diagnol (that is similarity of a sentence to itself), we are able to see which sentence is closer to which.


[[1.   0.83 0.8  0.79 0.8  0.84 0.84 0.81 0.81]
 [0.83 1.   0.79 0.76 0.8  0.87 0.79 0.8  0.79]
 [0.8  0.79 1.   0.96 0.86 0.77 0.88 0.77 0.78]
 [0.79 0.76 0.96 1.   0.82 0.77 0.9  0.75 0.77]
 [0.8  0.8  0.86 0.82 1.   0.78 0.85 0.8  0.8 ]
 [0.84 0.87 0.77 0.77 0.78 1.   0.81 0.76 0.78]
 [0.84 0.79 0.88 0.9  0.85 0.81 1.   0.81 0.86]
 [0.81 0.8  0.77 0.75 0.8  0.76 0.81 1.   0.9 ]
 [0.81 0.79 0.78 0.77 0.8  0.78 0.86 0.9  1.  ]] 

for i in range(len(csm)):
    ord_indx = np.argsort(csm[i])[::-1]
    print(python_strings[ord_indx[0]])
    print([python_strings[j] for j in ord_indx[1:]])
    print()

I love coding in Python language.
['Python is a reptile.', 'Python is a programming language.', 'Python is more readable than Java.', 'python ate a mouse', 'The python ate a mouse.', 'All six continents have a python species.', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.']

Python is more readable than Java.
['Python is a programming language.', 'I love coding in Python language.', 'All six continents have a python species.', 'The python ate a mouse.', 'Python is a reptile.', 'python ate a mouse', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.']

Pythons are famous for their very long body.
['Python is famous for its very long body.', 'Python is a reptile.', 'All six continents have a python species.', 'I love coding in Python language.', 'Python is more readable than Java.', 'python ate a mouse', 'Python is a programming language.', 'The python ate a mouse.']

Python is famous for its very long body.
['Pythons are famous for their very long body.', 'Python is a reptile.', 'All six continents have a python species.', 'I love coding in Python language.', 'python ate a mouse', 'Python is a programming language.', 'Python is more readable than Java.', 'The python ate a mouse.']

All six continents have a python species.
['Pythons are famous for their very long body.', 'Python is a reptile.', 'Python is famous for its very long body.', 'I love coding in Python language.', 'Python is more readable than Java.', 'The python ate a mouse.', 'python ate a mouse', 'Python is a programming language.']

Python is a programming language.
['Python is more readable than Java.', 'I love coding in Python language.', 'Python is a reptile.', 'All six continents have a python species.', 'python ate a mouse', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.', 'The python ate a mouse.']

Python is a reptile.
['Python is famous for its very long body.', 'Pythons are famous for their very long body.', 'python ate a mouse', 'All six continents have a python species.', 'I love coding in Python language.', 'Python is a programming language.', 'The python ate a mouse.', 'Python is more readable than Java.']

The python ate a mouse. 
['python ate a mouse', 'I love coding in Python language.', 'Python is a reptile.', 'All six continents have a python species.', 'Python is more readable than Java.', 'Pythons are famous for their very long body.', 'Python is a programming language.', 'Python is famous for its very long body.']

python ate a mouse 
['The python ate a mouse.', 'Python is a reptile.', 'I love coding in Python language.', 'All six continents have a python species.', 'Python is more readable than Java.', 'Python is a programming language.', 'Pythons are famous for their very long body.', 'Python is famous for its very long body.'] 

Few observations

1. "python ate a mouse" is more closer to "Python is a reptile." than "The python ate a mouse."
For closeness of these sentences to "Python is a reptile" shows "python ate a mouse" at number 3 while "The python ate a mouse" appears at number 7.

2. The model we are using is "uncased" so capitalization does not matter.

3. Sentences about Python language are similar to each other, and sentences about Python reptile are similar to each other.

4. Word "python" or "Python" alone is closest to 'I love coding in Python language.' then to 'Python is a reptile.', see code snippet below.

from scipy.spatial import distance
python_embedding = get_embedding('python')
csm = [1 - distance.cosine(u = python_embedding[0], v = i) for i in string_embeddings]
print([python_strings[j] for j in np.argsort(csm)[::-1]]) 

['I love coding in Python language.', 
'Python is a reptile.', 
'python ate a mouse', 
'The python ate a mouse.', 
'All six continents have a python species.', 
'Python is a programming language.', 
'Python is more readable than Java.', 
'Python is famous for its very long body.', 
'Pythons are famous for their very long body.']

Wednesday, October 7, 2020

Word Embeddings using BERT and testing using Word Analogies, Nearest Words, 1D Spectrum

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.metrics.pairwise import cosine_similarity
from joblib import load, dump
import torch
import transformers as ppb

import warnings
warnings.filterwarnings('ignore')

ppb.__version__ 
'3.0.1'

model_class, tokenizer_class, pretrained_weights = (ppb.BertModel, ppb.BertTokenizer, 'bert-base-uncased') 

Load pretrained model/tokenizer

tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights) 

The above code downloads three files when it runs for the first time:

HBox(children=(FloatProgress(value=0.0, description='Downloading', max=433.0, style=ProgressStyle(description_…
HBox(children=(FloatProgress(value=0.0, description='Downloading', max=231508.0, style=ProgressStyle(descripti…
HBox(children=(FloatProgress(value=0.0, description='Downloading', max=440473133.0, style=ProgressStyle(descri… 

The third file is the model with size 440MB.

Our first step is to tokenize the sentences -- break them up into word and subwords in the format BERT is comfortable with. 

sentences = ['First do it', 'then do it right', 'then do it better']
sentences_df = pd.DataFrame({"sentences": sentences})
tokenized = sentences_df['sentences'].apply((lambda x: tokenizer.encode(x, add_special_tokens=True))) 

Padding
After tokenization, `tokenized` is a list of sentences -- each sentences is represented as a list of tokens. We want BERT to process our examples all at once (as one batch). It's just faster that way. For that reason, we need to pad all lists to the same size, so we can represent the input as one 2-d array, rather than a list of lists (of different lengths).

max_len = 0
for i in tokenized.values:
    if len(i) > max_len:
        max_len = len(i)

padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized.values]) 

Masking 
If we directly send `padded` to BERT, that would slightly confuse it. We need to create another variable to tell it to ignore (mask) the padding we've added when it's processing its input. That's what attention_mask is:

attention_mask = np.where(padded != 0, 1, 0) 

%%time
input_ids = torch.LongTensor(padded)
attention_mask = torch.tensor(attention_mask)

with torch.no_grad():
    last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask) 
	
features = last_hidden_states[0][:,0,:].numpy() 
	
Let's slice only the part of the output that we need. That is the output corresponding the first token of each sentence. The way BERT does sentence classification, is that it adds a token called `[CLS]` (for classification) at the beginning of every sentence. Last token is representing [SEP]. The output corresponding to that token can be thought of as an embedding for the entire sentence.



We'll save those in the `features` variable, as they'll serve as the features to our logitics regression model.

Testing 
Word Analogies 

def get_embedding(in_list):
    tokenized = [tokenizer.encode(x, add_special_tokens=True) for x in in_list]
    
    max_len = 0
    for i in tokenized:
        if len(i) > max_len:
            max_len = len(i)

    padded = np.array([i + [0]*(max_len-len(i)) for i in tokenized])
    
    attention_mask = np.where(padded != 0, 1, 0)
    
    input_ids = torch.LongTensor(padded)
    attention_mask = torch.tensor(attention_mask)

    with torch.no_grad():
        last_hidden_states = model(input_ids = input_ids, attention_mask = attention_mask)
        
    features = last_hidden_states[0][:,0,:].numpy()
    return features
	
analogies = [['king', 'man', 'queen', 'woman'],
  ['king', 'prince', 'queen', 'princess'],
  ['miami', 'florida', 'dallas', 'texas'],
  ['einstein', 'scientist', 'picasso', 'painter'],
  ['japan', 'sushi', 'germany', 'bratwurst'],
  ['man', 'woman', 'he', 'she'],
  ['man', 'woman', 'uncle', 'aunt'],
  ['man', 'woman', 'brother', 'sister'],
  ['man', 'woman', 'husband', 'wife'],
  ['man', 'woman', 'actor', 'actress'],
  ['man', 'woman', 'father', 'mother'],
  ['heir', 'heiress', 'prince', 'princess'],
  ['nephew', 'niece', 'uncle', 'aunt'],
  ['france', 'paris', 'japan', 'tokyo'],
  ['france', 'paris', 'china', 'beijing'],
  ['february', 'january', 'december', 'november'],
  ['france', 'paris', 'germany', 'berlin'],
  ['week', 'day', 'year', 'month'],
  ['week', 'day', 'hour', 'minute'],
  ['france', 'paris', 'italy', 'rome'],
  ['paris', 'france', 'rome', 'italy'],
  ['france', 'french', 'england', 'english'],
  ['japan', 'japanese', 'china', 'chinese'],
  ['china', 'chinese', 'america', 'american'],
  ['japan', 'japanese', 'italy', 'italian'],
  ['japan', 'japanese', 'australia', 'australian'],
  ['walk', 'walking', 'swim', 'swimming']]

for i in analogies:
    king = get_embedding([i[0]])
    queen = get_embedding([i[2]])
    man = get_embedding([i[1]])
    woman = get_embedding([i[3]])
    q = king - man + woman
    print(i[0], '-', i[1], '+', i[3], 'and', i[2], cosine_similarity(queen, q)) 
	
king - man + woman and queen [[0.95728725]]
king - prince + princess and queen [[0.9805071]]
miami - florida + texas and dallas [[0.93608725]]
einstein - scientist + painter and picasso [[0.9021458]]
japan - sushi + bratwurst and germany [[0.8383053]]
man - woman + she and he [[0.97603536]]
man - woman + aunt and uncle [[0.9624729]]
man - woman + sister and brother [[0.970188]]
man - woman + wife and husband [[0.9585104]]
man - woman + actress and actor [[0.95233154]]
man - woman + mother and father [[0.9783108]]
heir - heiress + princess and prince [[0.9558885]]
nephew - niece + aunt and uncle [[0.9844531]]
france - paris + tokyo and japan [[0.95287836]]
france - paris + beijing and china [[0.94868445]]
february - january + november and december [[0.89765096]]
france - paris + berlin and germany [[0.9586985]]
week - day + month and year [[0.9131064]]
week - day + minute and hour [[0.9280644]]
france - paris + rome and italy [[0.92742187]]
paris - france + italy and rome [[0.9252609]]
france - french + english and england [[0.9143828]]
japan - japanese + chinese and china [[0.9681916]]
china - chinese + american and america [[0.9371264]]
japan - japanese + italian and italy [[0.97318065]]
japan - japanese + australian and australia [[0.96878356]]
walk - walking + swimming and swim [[0.90309924]] 

Nearest Words 
We have retrieved nouns from the 'BERT Base Uncased' vocabulary. There are 15269 nouns in this vocabulary.
You can download "vocab.txt" from here: GitHub
We have used to SpaCy to identify the nouns.

nouns = load('files_5_p3/list_of_nouns_from_bert_base_uncased_vocab.joblib')

%%time
noun_embeddings = [get_embedding([i]) for i in nouns]
dump(noun_embeddings, 'files_2_p2/list_of_noun_embeddings.joblib') 

Wall time: 20min 8s 

noun_embeddings = load('files_2_p2/list_of_noun_embeddings.joblib')
noun_embeddings = [n[0] for n in noun_embeddings]

from scipy.spatial import distance

def get_nn_of_words(in_list):
    for k in in_list:
        input_word = k

        if k not in nouns:
            continue
        p = noun_embeddings[nouns.index(input_word)]

        closest_embedding_indices = distance.cdist(np.array(p).reshape(1, -1), 
                                            np.array(noun_embeddings).reshape(len(noun_embeddings),-1))[0].argsort()[1:11]
        
        closest_nouns = [nouns[i] for i in closest_embedding_indices]
        print("For", k, closest_nouns)
		
get_nn_of_words(set(pd.core.common.flatten(analogies))) 

For germany ['austria', 'bavaria', 'berlin', 'luxembourg', 'europe', 'japan', 'britain', 'wurttemberg', 'dresden', 'sweden']
For niece ['nephew', 'granddaughter', 'fiancee', 'daughter', 'grandparents', 'grandson', 'stepmother', 'aunt', 'cousins', 'wife']

For aunt ['grandmother', 'grandfather', 'uncle', 'cousin', 'sister', 'mother', 'miriam', 'vicki', 'uncles', 'cousins']

For february ['january', 'april', 'june', 'november', 'march', 'july', 'august', 'december', 'october', 'spring']

For england ['britain', 'wales', 'australia', 'ireland', 'barbados', 'stoke', 'brentford', 'lancashire', 'cuba', 'luxembourg']

For america ['planet', 'dakota', 'hawaii', 'britain', 'hemisphere', 'coral', 'virginia', 'nina', 'columbia', 'victoria']

For italian ['italy', 'russian', 'catalan', 'portuguese', 'french', 'azerbaijani', 'indonesian', 'austrian', 'japanese', 'irish']

For uncle ['aunt', 'cousin', 'grandfather', 'brother', 'grandmother', 'uncles', 'doc', 'bobby', 'mother', 'kid']

For miami ['tampa', 'seattle', 'vancouver', 'portland', 'arizona', 'vegas', 'sydney', 'florida', 'houston', 'orlando']

For italy ['austria', 'germany', 'luxembourg', 'europe', 'rico', 'japan', 'africa', 'indonesia', 'florence', 'tuscany']

For woman ['teenager', 'girl', 'spouse', 'brother', 'partner', 'daughter', 'mother', 'consort', 'wife', 'stallion']

For english ['afrikaans', 'hindi', 'latin', 'portuguese', 'sanskrit', 'french', 'italian', 'hebrew', 'azerbaijani', 'lithuanian']

For king ['duke', 'prince', 'queen', 'princess', 'throne', 'consort', 'deity', 'queens', 'abbot', 'lords']

For dallas ['jasmine', 'travis', 'savannah', 'eden', 'lucas', 'mia', 'lexi', 'jack', 'hunter', 'penny']

For mother ['grandmother', 'brother', 'mothers', 'parents', 'daughter', 'mom', 'father', 'grandfather', 'sister', 'mary']

For heiress ['landowner', 'heir', 'daughters', 'heirs', 'daughter', 'granddaughter', 'siblings', 'grandson', 'childless', 'clerk']

For japanese ['korean', 'japan', 'thai', 'russian', 'hawaiian', 'malaysian', 'indonesian', 'khmer', 'taiwanese', 'bengali']

For heir ['heirs', 'consort', 'spouse', 'prince', 'womb', 'attendants', 'fulfillment', 'duke', 'daughter', 'keeper']

For january ['november', 'april', 'august', 'february', 'december', 'summer', 'july', 'spring', 'october', 'june']

For brother ['sister', 'grandfather', 'cousin', 'grandmother', 'mother', 'daughter', 'partner', 'bowl', 'mentor', 'beau']

For wife ['husbands', 'daughter', 'spouse', 'husband', 'woman', 'girlfriend', 'household', 'supporter', 'boyfriend', 'granddaughter']

For minute ['moments', 'hour', 'dozen', 'mile', 'cycles', 'millennia', 'moment', 'sizes', 'clocks', 'twenties']

For picasso ['goldsmith', 'michelangelo', 'fresco', 'carousel', 'chopin', 'verdi', 'hercules', 'palette', 'canvas', 'britten']

For week ['month', 'series', 'replacement', 'primetime', 'position', 'highlight', 'zone', 'slot', 'office', 'showcase']

For japan ['america', 'ceylon', 'hawaii', 'malaysia', 'australia', 'taiwan', 'osaka', 'fukuoka', 'indonesia', 'korea']

For einstein ['aristotle', 'nobel', 'beckett', 'wiener', 'relativity', 'abel', 'strauss', 'skinner', 'clifford', 'bernstein']

For australian ['australia', 'canadian', 'canada', 'fremantle', 'oceania', 'america', 'brazil', 'nepal', 'jakarta', 'hawaii']

For painter ['musician', 'painting', 'paintings', 'designer', 'dancer', 'filmmaker', 'illustrator', 'teacher', 'soldier', 'boxer']

For man ['lump', 'woman', 'boss', 'bear', 'scratch', 'intruder', 'alpha', 'rat', 'touch', 'condo']

For florida ['maine', 'louisiana', 'arizona', 'virginia', 'charleston', 'indiana', 'tampa', 'colorado', 'alabama', 'connecticut']

For year ['season', 'month', 'eligibility', 'seasons', 'name', 'calendar', 'date', 'colour', 'highlight', 'divisional']

For tokyo ['osaka', 'kyoto', 'fukuoka', 'nagoya', 'seoul', 'kobe', 'moscow', 'honolulu', 'japan', 'nippon']

For november ['october', 'january', 'december', 'winter', 'spring', 'august', 'april', 'monday', 'halloween', 'wednesday']

For rome ['titan', 'vulcan', 'mesopotamia', 'damascus', 'alexandria', 'egypt', 'baghdad', 'orion', 'denver', 'nevada']

For china ['taiwan', 'fujian', 'indonesia', 'japan', 'asia', 'sichuan', 'malawi', 'lebanon', 'russia', 'zimbabwe']

For hour ['minute', 'hours', 'moments', 'dozen', 'weeks', 'inning', 'day', 'cycles', 'midnight', 'minutes']

For texas ['oregon', 'alabama', 'florida', 'colorado', 'ohio', 'indiana', 'georgia', 'houston', 'arkansas', 'arizona']

For sister ['brother', 'daughter', 'mother', 'grandmother', 'grandfather', 'cousin', 'aunt', 'padre', 'sisters', 'blossom']

For berlin ['vienna', 'stuttgart', 'hannover', 'hamburg', 'bonn', 'dresden', 'dusseldorf', 'gottingen', 'mannheim', 'rosenthal']

For actress ['actor', 'musician', 'singer', 'novelist', 'teacher', 'dancer', 'magician', 'poet', 'painter', 'actors']

For beijing ['tianjin', 'guangzhou', 'singapore', 'honolulu', 'taipei', 'ankara', 'osaka', 'manila', 'durban', 'jakarta']

For princess ['prince', 'madam', 'papa', 'kira', 'sweetie', 'witch', 'ruby', 'wedding', 'tasha', 'marta']

For nephew ['niece', 'grandson', 'granddaughter', 'daughter', 'fiancee', 'girlfriend', 'son', 'brother', 'sidekick', 'wife']

For month ['week', 'summers', 'evening', 'calendar', 'decade', 'semester', 'term', 'position', 'seasonal', 'occasion']

For swimming ['diving', 'weightlifting', 'judo', 'badminton', 'tennis', 'gymnastics', 'archery', 'swimmers', 'breaststroke', 'hockey']

For queen ['princess', 'queens', 'maid', 'king', 'prince', 'duke', 'consort', 'crown', 'stallion', 'madam']

For actor ['actress', 'actors', 'poet', 'television', 'singer', 'novelist', 'comedian', 'musician', 'screenwriter', 'painter']

For december ['november', 'january', 'october', 'april', 'march', 'june', 'july', 'september', 'august', 'autumn']

For american ['british', 'americans', 'america', 'britain', 'african', 'haitian', 'kenyan', 'bangladeshi', 'resident', 'canadian']

For french ['italian', 'portuguese', 'dutch', 'english', 'spanish', 'afrikaans', 'filipino', 'romanian', 'france', 'greek']

For prince ['princess', 'duke', 'consort', 'king', 'benedict', 'commander', 'papa', 'dean', 'throne', 'kevin']

For scientist ['physician', 'archaeologist', 'golfer', 'inventor', 'chef', 'consultant', 'investigator', 'teenager', 'astronaut', 'technician']

For paris ['bonn', 'laval', 'provence', 'dublin', 'geneva', 'eugene', 'michel', 'koln', 'benoit', 'ville']

For father ['mother', 'fathers', 'brother', 'daddy', 'uncles', 'son', 'sister', 'homeland', 'dad', 'protector']

For husband ['wife', 'spouse', 'lover', 'boyfriend', 'husbands', 'woman', 'daughter', 'son', 'fiance', 'mother']

For france ['martinique', 'luxembourg', 'marseille', 'geneva', 'bordeaux', 'lyon', 'paris', 'clermont', 'alsace', 'switzerland']

For australia ['australian', 'america', 'canada', 'tasmania', 'sydney', 'britain', 'japan', 'fremantle', 'malaysia', 'hawaii']

For day ['evening', 'nightfall', 'midnight', 'night', 'dawn', 'morning', 'moments', 'afternoon', 'epoch', 'sunrise'] 

1D Spectrum 

for i in analogies:
    king = get_embedding([i[0]])
    queen = get_embedding([i[2]])
    man = get_embedding([i[1]])
    woman = get_embedding([i[3]])
    q = king - man + woman
    
    print(i[0], i[1], i[2], i[3], cosine_similarity(queen, q))
    
    for j in i:
        print(j, ":")
        np.random.seed(1)
        plt.rcParams["figure.figsize"] = 8, 2

        x = np.linspace(0, 768, num=768)
        y = get_embedding([j])

        fig, (ax,ax2) = plt.subplots(nrows=2, sharex=True)

        extent = [x[0]-(x[1]-x[0])/2., x[-1]+(x[1]-x[0])/2.,0,1]
        ax.imshow(y, cmap="plasma", aspect="auto", extent=extent)
        ax.set_yticks([])
        ax.set_xlim(extent[0], extent[1])

        ax2.plot(x, y.ravel())

        plt.tight_layout()
        plt.show()