botiverse.preprocessors.BertEmbeddings package
Submodules
botiverse.preprocessors.BertEmbeddings.BertEmbeddings module
- class botiverse.preprocessors.BertEmbeddings.BertEmbeddings.BertEmbedder
Bases: object
An interface for converting given text into BERT embeddings.
Load the pre-trained model and tokenizer.
- embed(sentences, random_state=42)
Convert the given sentences into BERT embeddings.
- Parameters:
sentences (list) – A list of sentences to convert into BERT embeddings.
random_state (int) – The random state to use for reproducibility.
- Returns:
A list of BERT embeddings for the given sentences.
- Return type:
list
- closest_sentence(new_sentence, sentence_list, retun_ind=False)
Given a list of sentences and a new sentence, return the sentence from the list that is closest to the new sentence.
- Parameters:
new_sentence (str) – The new sentence to compare to the list of sentences.
sentence_list (list) – A list of sentences to compare the new sentence to.
retun_ind (bool) – Whether to return the index of the closest sentence instead of the sentence itself.
- Returns:
The sentence from the list that is closest to the new sentence and its score.
- Return type:
str, float
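The lookup that closest_sentence describes can be sketched without the library: embed the candidates and the query, score each candidate against the query, and return the best match with its score. This is a minimal illustration under stated assumptions, not the library's implementation. The score is assumed to be cosine similarity, toy_embed is a hypothetical bag-of-words stand-in for the BERT model, and closest_sentence_sketch with its return_index flag is a hypothetical name that mirrors the documented retun_ind parameter.

```python
import math
from collections import Counter


def toy_embed(sentences):
    """Hypothetical stand-in embedder (NOT BERT): bag-of-words count vectors."""
    tokenized = [s.lower().split() for s in sentences]
    # Shared vocabulary so every vector has the same dimensions.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_sentence_sketch(new_sentence, sentence_list,
                            return_index=False, embed=toy_embed):
    """Return (closest sentence or its index, score). Assumes a non-empty list."""
    # Embed candidates and query in one call so they share a vector space.
    vectors = embed(sentence_list + [new_sentence])
    query, candidates = vectors[-1], vectors[:-1]
    scores = [cosine(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (best if return_index else sentence_list[best]), scores[best]


candidates = [
    "the cat sat on the mat",
    "stocks fell sharply today",
    "it is raining outside",
]
match, score = closest_sentence_sketch("a cat on a mat", candidates)
index, _ = closest_sentence_sketch("a cat on a mat", candidates, return_index=True)
```

Passing a different embed callable (any function that maps a list of sentences to a list of equal-length vectors) swaps the toy embedder for a real model without changing the search logic.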
- class botiverse.preprocessors.BertEmbeddings.BertEmbeddings.BertSentenceEmbedder
Bases: object
An interface for converting given text into sentence BERT embeddings.
Load the pre-trained model and tokenizer.
- embed(sentences)
Convert the given sentences into BERT embeddings.
- Parameters:
sentences (list) – A list of sentences to convert into BERT embeddings.
- Returns:
A list of BERT embeddings for the given sentences.
- Return type:
list
- closest_sentence(new_sentence, sentence_list, retun_ind=False)
Given a list of sentences and a new sentence, return the sentence from the list that is closest to the new sentence.
- Parameters:
new_sentence (str) – The new sentence to compare to the list of sentences.
sentence_list (list) – A list of sentences to compare the new sentence to.
retun_ind (bool) – Whether to return the index of the closest sentence instead of the sentence itself.