botiverse.preprocessors.Special.WhizBot_BERT_Preprocessor package

Submodules

class botiverse.preprocessors.Special.WhizBot_BERT_Preprocessor.WhizBot_BERT_Preprocessor.WhizBot_BERT_Preprocessor(file_path)

Bases: object

An interface that provides the preprocessing required by the WhizBot_BERT bot.

Initializes a WhizBot_BERT_Preprocessor instance with the file path of the dataset and the BERT model parameters.

Applies preprocessing steps to the loaded data.

clean_string(string)

Cleans the given text string by removing emojis.
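The source does not show how clean_string detects emojis. As a minimal sketch of the idea, the function below removes characters that fall in a few common emoji code point ranges using the standard re module. The name strip_emojis and the chosen ranges are assumptions made for illustration and are not taken from botiverse; the ranges are not exhaustive.

```python
import re

# Assumed emoji ranges for illustration only; not taken from the botiverse source.
_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport and supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "\u200D"                 # zero-width joiner used in emoji sequences
    "]+"
)


def strip_emojis(text: str) -> str:
    """Return text with every character in the assumed emoji ranges removed."""
    return _EMOJI_PATTERN.sub("", text)


if __name__ == "__main__":
    print(repr(strip_emojis("Great job \U0001F44D thanks")))
```

Note that this sketch leaves the surrounding whitespace untouched, so removing an emoji between two spaces leaves both spaces in place.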

tokenize_string(string)

Tokenizes a given text string using the BERT tokenizer.

Parameters: string (str) – The string to tokenize.
Returns: A dictionary containing the tokenized version of the input text string, i.e., the ids and attention_masks.
Return type: dict
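To illustrate the shape of such a return value, here is a minimal sketch that uses a toy whitespace tokenizer in place of the BERT tokenizer. The vocabulary, the function name toy_tokenize, the max_len parameter, and the exact dictionary keys "ids" and "attention_masks" are assumptions for illustration; the real method relies on BERT's WordPiece vocabulary and may name its keys differently.

```python
from typing import Dict, List

# Toy vocabulary standing in for a real WordPiece vocabulary (assumption).
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}


def toy_tokenize(text: str, max_len: int = 8) -> Dict[str, List[int]]:
    """Map text to fixed-length token ids plus a matching attention mask."""
    # Lowercase, split on whitespace, and keep room for [CLS] and [SEP].
    words = text.lower().split()[: max_len - 2]

    # Unknown words fall back to the [UNK] id.
    ids = [TOY_VOCAB["[CLS]"]]
    ids += [TOY_VOCAB.get(word, TOY_VOCAB["[UNK]"]) for word in words]
    ids.append(TOY_VOCAB["[SEP]"])

    # Real tokens get mask 1; padding positions get mask 0.
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    ids += [TOY_VOCAB["[PAD]"]] * padding
    mask += [0] * padding
    return {"ids": ids, "attention_masks": mask}


if __name__ == "__main__":
    print(toy_tokenize("Hello world"))
```

The attention mask lets a downstream model ignore the padded positions, which is why both lists always have the same length.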

embed_tokens(tokens_obj)

Precomputes embeddings for the tokenized text string.

Parameters: tokens_obj (dict) – A dictionary containing the tokenized version of a text string.
Returns: Tensor representing the embeddings of the tokenized text string.
Return type: Tensor
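The source does not say how the per-token outputs are reduced to the returned tensor. One common option is masked mean pooling, which averages only the vectors of real (non-padding) tokens. The sketch below shows that technique on plain Python lists; the function name masked_mean_pool is hypothetical, and the library may well use a different reduction (for example, taking the [CLS] vector) or return per-token embeddings unchanged.

```python
from typing import List


def masked_mean_pool(
    token_vectors: List[List[float]], attention_mask: List[int]
) -> List[float]:
    """Average the token vectors whose attention mask entry is 1."""
    # Drop vectors that belong to padding positions.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")

    # Average each dimension over the kept vectors.
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


if __name__ == "__main__":
    vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is padding
    print(masked_mean_pool(vectors, [1, 1, 0]))
```

Because the result depends only on the input text, it can be computed once and cached, which is presumably what "precomputes" refers to here.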

process_string(string)

Applies the whole preprocessing pipeline to a given text string.

Parameters: string (str) – The string to process.
Returns: Tensor representing the embeddings of the processed and tokenized text string.
Return type: Tensor
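Judging by the method descriptions on this page, the pipeline plausibly chains cleaning, tokenization, and embedding in that order. The sketch below shows that composition with stand-in stages; build_pipeline and the demo_* functions are hypothetical helpers written for this example and are not part of botiverse.

```python
from typing import Callable, Dict, List

Tokens = Dict[str, List[int]]


def build_pipeline(
    clean: Callable[[str], str],
    tokenize: Callable[[str], Tokens],
    embed: Callable[[Tokens], List[float]],
) -> Callable[[str], List[float]]:
    """Compose three stages into one function: clean, then tokenize, then embed."""

    def process(text: str) -> List[float]:
        return embed(tokenize(clean(text)))

    return process


# Stand-in stages (assumptions) so the sketch runs without BERT.
def demo_clean(text: str) -> str:
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


def demo_tokenize(text: str) -> Tokens:
    # Use word lengths as fake token ids.
    words = text.split()
    return {"ids": [len(w) for w in words], "attention_masks": [1] * len(words)}


def demo_embed(tokens: Tokens) -> List[float]:
    # Turn each fake id into a one-dimensional "embedding".
    return [float(i) for i in tokens["ids"]]


process = build_pipeline(demo_clean, demo_tokenize, demo_embed)

if __name__ == "__main__":
    print(process("  hi   there "))
```

Passing the stages in as callables keeps each step testable in isolation, and a real tokenizer or encoder can be swapped in without changing the composition.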