botiverse.preprocessors.BertEmbeddings package#

Submodules#

botiverse.preprocessors.BertEmbeddings.BertEmbeddings module#

class botiverse.preprocessors.BertEmbeddings.BertEmbeddings.BertEmbedder[source]#

Bases: object

An interface for converting given text into BERT embeddings.

The constructor loads the pre-trained model and tokenizer.

embed(sentences, random_state=42)[source]#

Convert the given sentences into BERT embeddings.

Parameters:
  • sentences (list) – A list of sentences to convert into BERT embeddings.

  • random_state (int) – The random state to use for reproducibility.

Returns:

A list of BERT embeddings for the given sentences.

Return type:

list
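A minimal usage sketch for `embed`, based only on the signature documented above. It assumes that `BertEmbedder` can be constructed without arguments, which the class signature suggests but this page does not state outright. The helper name `embed_sentences` is hypothetical. The import sits inside the function so that the pre-trained model is only loaded when the helper is actually called.

```python
def embed_sentences(sentences, random_state=42):
    """Embed a list of sentences with BertEmbedder (hypothetical helper)."""
    # Import path taken from the module name documented on this page.
    from botiverse.preprocessors.BertEmbeddings.BertEmbeddings import BertEmbedder

    # Assumption: the constructor takes no arguments and loads the model itself.
    embedder = BertEmbedder()

    # Documented call: a list of sentences in, a list of embeddings out.
    return embedder.embed(sentences, random_state=random_state)
```

Calling `embed_sentences(["hello there", "good morning"])` should return a list of embeddings, according to the return type listed above.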

closest_sentence(new_sentence, sentence_list, retun_ind=False)[source]#

Given a list of sentences and a new sentence, return the sentence from the list that is closest to the new sentence.

Parameters:
  • new_sentence (str) – The new sentence to compare to the list of sentences.

  • sentence_list (list) – A list of sentences to compare the new sentence to.

  • retun_ind (bool) – Whether to return the index of the closest sentence instead of the sentence itself.

Returns:

The sentence from the list that is closest to the new sentence (or its index, if retun_ind is True), together with its score.

Return type:

(str or int, float)
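The lookup described above can be sketched as a nearest-neighbour search over embeddings. This is an illustration, not the library's implementation: the page does not say which similarity measure is used, so cosine similarity is an assumption, and `toy_embed` is a hypothetical bag-of-words stand-in for the real BERT embedding so that the example runs without downloading a model.

```python
import math
from collections import Counter


def toy_embed(sentence):
    """Hypothetical stand-in for a BERT embedding: a bag-of-words count vector."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_sentence(new_sentence, sentence_list, retun_ind=False):
    """Return the best match (or its index) together with its score."""
    query = toy_embed(new_sentence)
    scores = [cosine(query, toy_embed(s)) for s in sentence_list]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (best if retun_ind else sentence_list[best]), scores[best]


sentences = ["how do I reset my password", "what are your opening hours"]
match, score = closest_sentence("reset password please", sentences)
index, _ = closest_sentence("reset password please", sentences, retun_ind=True)
```

Here `match` is the first sentence in the list and `index` is its position; the parameter name `retun_ind` is spelled as it appears in the documented signature.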

class botiverse.preprocessors.BertEmbeddings.BertEmbeddings.BertSentenceEmbedder[source]#

Bases: object

An interface for converting given text into sentence BERT embeddings.

The constructor loads the pre-trained model and tokenizer.

embed(sentences)[source]#

Convert the given sentences into sentence BERT embeddings.

Parameters:

sentences (list) – A list of sentences to convert into BERT embeddings.

Returns:

A list of BERT embeddings for the given sentences.

Return type:

list

closest_sentence(new_sentence, sentence_list, retun_ind=False)[source]#

Given a list of sentences and a new sentence, return the sentence from the list that is closest to the new sentence.

Parameters:
  • new_sentence (str) – The new sentence to compare to the list of sentences.

  • sentence_list (list) – A list of sentences to compare the new sentence to.

  • retun_ind (bool) – Whether to return the index of the closest sentence instead of the sentence itself.
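One way `retun_ind=True` could be useful is to look up an answer that is paired with each stored question. The sketch below assumes that this method returns a (result, score) pair like `BertEmbedder.closest_sentence`, which this page does not confirm. `StandInEmbedder` is a hypothetical class that mirrors the documented signature and substitutes plain string similarity from `difflib` for the sentence embeddings, so the example runs without a model.

```python
from difflib import SequenceMatcher


class StandInEmbedder:
    """Hypothetical stand-in that mirrors the documented closest_sentence signature."""

    def closest_sentence(self, new_sentence, sentence_list, retun_ind=False):
        # String similarity replaces the sentence-embedding comparison here.
        scores = [
            SequenceMatcher(None, new_sentence.lower(), s.lower()).ratio()
            for s in sentence_list
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        return (best if retun_ind else sentence_list[best]), scores[best]


questions = ["What are your opening hours?", "How do I reset my password?"]
answers = ["We are open 9 to 5.", "Use the 'Forgot password' link."]

embedder = StandInEmbedder()
# Ask for the index so it can be used to pick the paired answer.
idx, score = embedder.closest_sentence(
    "how can I reset my password", questions, retun_ind=True
)
reply = answers[idx]
```

Swapping `StandInEmbedder()` for a real `BertSentenceEmbedder` instance should leave the calling code unchanged, provided the return shape matches the assumption stated above.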

Module contents#