survival8: Performance testing of BERT based Sentence Transformers for sentence encoding

Monday, September 19, 2022


import pandas as pd
from sentence_transformers import SentenceTransformer

sbert_model = SentenceTransformer('bert-base-nli-mean-tokens')

df = pd.read_csv('nytEditorialSnippets_GroundTruth.txt', sep = '\t')

# Small subsets used for the timings below: first 10 and first 100 rows
df_10 = df[0:10]
df_100 = df[0:100]

# 1. Using Sentence Encoder in a function on every record

def get_embedding(input_sentence):
    # encode() is given a one-element list, so take the first (and only) embedding
    return sbert_model.encode([input_sentence])[0]

%%timeit
df_out_1 = df_100['text'].apply(get_embedding)



9.15 s ± 317 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



%%time
df_out_1 = df_100['text'].apply(get_embedding)



CPU times: user 18.3 s, sys: 112 ms, total: 18.5 s
Wall time: 9.25 s



%%time
df_out_1 = df_10['text'].apply(get_embedding)



CPU times: user 1.97 s, sys: 7.88 ms, total: 1.98 s
Wall time: 997 ms
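
The wall times above can be turned into a rough throughput figure. The sketch below is only arithmetic on the numbers reported by `%%time`; the helper name `sentences_per_second` is made up for illustration and is not part of the notebook.

```python
# Wall times copied from the %%time outputs above (seconds).
wall_times = {
    100: 9.25,   # df_100, one encode() call per row
    10: 0.997,   # df_10, one encode() call per row
}


def sentences_per_second(n_sentences, wall_seconds):
    """Rough throughput: sentences encoded per second of wall time."""
    return n_sentences / wall_seconds


for n, seconds in wall_times.items():
    rate = sentences_per_second(n, seconds)
    per_sentence_ms = 1000 * seconds / n
    print(f"{n:>3} sentences: {rate:.1f} sentences/s, {per_sentence_ms:.1f} ms each")
```

Both runs come out at roughly 10 sentences per second, which suggests that on this machine the cost of the per-row approach grows about linearly with the number of rows.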



df_out_1



0     [-0.7970602, 0.47616163, 0.2621567, 0.38846374...
1     [-0.32450542, -0.10945253, 0.6443658, 0.212320...
2     [-0.2602994, -0.0036350375, 1.2917686, 0.12602...
3     [0.5173101, -0.86385506, 1.5003084, 0.76273316...
4     [-0.19630705, 1.611963, 0.8502133, 0.059544455...
                            ...                        
95    [-0.7843676, 0.70446295, -0.86373883, 0.096476...
96    [-0.047543377, -1.0461698, 0.9984542, 0.776394...
97    [-0.5863306, 0.38590172, -0.15509816, 0.275745...
98    [0.22616625, 0.33843663, 0.030288033, 0.191214...
99    [0.40268317, 1.1528935, 0.3597172, 0.16918863,...
Name: text, Length: 100, dtype: object



type(df_out_1)



pandas.core.series.Series



# 2. Using Sentence Encoder on entire array of sentences at once

%%timeit
df_out_2 = sbert_model.encode(df_100['text'].values)



9.25 s ± 275 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



%%time
df_out_2 = sbert_model.encode(df_100['text'].values)



CPU times: user 17.8 s, sys: 1 s, total: 18.8 s
Wall time: 9.41 s
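
`%%timeit` and `%%time` only work inside IPython or Jupyter. A similar "mean ± std. dev. of 7 runs" measurement can be sketched in a plain script with `time.perf_counter`. The names `time_call` and `fake_encode` below are hypothetical: `fake_encode` is a stand-in so the sketch runs without downloading a model, and in the notebook its place would be taken by `sbert_model.encode`.

```python
import statistics
import time


def time_call(fn, repeats=7):
    """Run fn() `repeats` times; return (mean, stdev) of wall time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


# Stand-in encoder (an assumption for illustration, not a real model):
# returns one tiny "vector" per input sentence.
def fake_encode(sentences):
    return [[float(len(s))] for s in sentences]


sentences = ["first sentence", "second sentence", "a third one"]

# Approach 1: one encoder call per sentence.
per_row = time_call(lambda: [fake_encode([s])[0] for s in sentences])
# Approach 2: one encoder call for the whole list.
batched = time_call(lambda: fake_encode(sentences))

print(f"per row: {per_row[0]:.6f} s ± {per_row[1]:.6f} s")
print(f"batched: {batched[0]:.6f} s ± {batched[1]:.6f} s")
```

In the measurements reported here the two approaches took about the same wall time for 100 sentences (9.25 s per row versus 9.41 s in one call), so a harness like this is mainly useful for checking whether that also holds on other hardware or with other batch sizes.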



df_out_2



array([[-0.7970604 ,  0.4761617 ,  0.262157  , ..., -0.2437577 ,
         1.1380528 ,  0.28697717],
       [-0.32450542, -0.10945235,  0.64436567, ...,  0.14436643,
        -0.24656864, -0.18447737],
       [-0.26029944, -0.00363465,  1.2917686 , ...,  1.1463983 ,
        -1.0714562 , -0.09548129],
       ...,
       [-0.5863306 ,  0.38590172, -0.15509816, ..., -0.2937488 ,
        -0.3724223 ,  0.1826524 ],
       [ 0.22616649,  0.33843663,  0.03028765, ...,  0.66338176,
        -0.6620043 ,  0.09410357],
       [ 0.40268335,  1.152894  ,  0.35971704, ..., -0.9203086 ,
         0.17893644,  0.71039814]], dtype=float32)



type(df_out_2)


numpy.ndarray

df_out_2.shape

(100, 768)
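
The two approaches return different containers (a pandas Series of vectors versus a single 2D NumPy array), and the printed values differ slightly in the last decimal places. The sketch below shows one way to put both results in the same layout and compare them with a tolerance. It uses only the first three components of rows 0 and 1, copied from the outputs above; the tolerance `atol=1e-5` is an assumption, not something the notebook specifies.

```python
import numpy as np

# First three components of rows 0 and 1 from the per-row result (df_out_1).
per_row = [
    np.array([-0.7970602, 0.47616163, 0.2621567], dtype=np.float32),
    np.array([-0.32450542, -0.10945253, 0.6443658], dtype=np.float32),
]

# The same components from the batched result (df_out_2).
batched = np.array(
    [[-0.7970604, 0.4761617, 0.262157],
     [-0.32450542, -0.10945235, 0.64436567]],
    dtype=np.float32,
)

# Stack the per-row vectors into one 2D array so both results share a layout.
stacked = np.stack(per_row)

print("same shape:       ", stacked.shape == batched.shape)
print("bitwise identical:", np.array_equal(stacked, batched))
print("close (atol=1e-5):", np.allclose(stacked, batched, atol=1e-5))
print("max abs diff:     ", np.abs(stacked - batched).max())
```

With the notebook's own variables, `np.stack(df_out_1.to_list())` should give an array with the same shape as `df_out_2`, which can then be compared in the same way.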
