botiverse.preprocessors.Special.WhizBot_BERT_Preprocessor package

Submodules

botiverse.preprocessors.Special.WhizBot_BERT_Preprocessor.WhizBot_BERT_Preprocessor module

class botiverse.preprocessors.Special.WhizBot_BERT_Preprocessor.WhizBot_BERT_Preprocessor.WhizBot_BERT_Preprocessor(file_path)

Bases: object

An interface that provides the preprocessing required by the WhizBot_BERT bot.

Initializes a WhizBot_BERT_Preprocessor instance with the file path of the dataset and the BERT model parameters.

Parameters:

file_path (str) – Path to the .json file to be read.

Returns:

None

process()

Applies preprocessing steps to the loaded data.

Returns:

Processed data.

Return type:

DataFrame

clean_string(string)

Cleans the given text string by removing emojis.

Parameters:

string (str) – The string to process.

Returns:

The processed string.

Return type:

str
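The emoji-removal step can be illustrated with a regular expression over common emoji code-point ranges. This is a minimal sketch, not the library's implementation: the function name `remove_emojis` and the chosen Unicode ranges are assumptions, and the pattern that `clean_string` actually uses may cover a different set of characters.

```python
import re

# Assumed emoji ranges; the pattern used by the library may differ.
_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0000FE0F"             # variation selector-16
    "\U0000200D"             # zero-width joiner
    "]+"
)


def remove_emojis(text: str) -> str:
    """Hypothetical stand-in for clean_string: drop emoji code points."""
    return _EMOJI_PATTERN.sub("", text)
```

Only the matched code points are removed; surrounding whitespace is left untouched, so a caller that wants normalized spacing would need a separate step.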

tokenize_string(string)

Tokenizes a given text string using the BERT tokenizer.

Parameters:

string (str) – The string to tokenize.

Returns:

A dictionary containing the tokenized version of the input text string, i.e., the ids and attention_masks.

Return type:

dict
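The shape of the returned dictionary can be illustrated without a real BERT tokenizer. The sketch below is a toy whitespace tokenizer under stated assumptions: the vocabulary, the `max_len` parameter, the function name `toy_tokenize`, and the exact key names `ids` and `attention_masks` are all hypothetical. A real BERT tokenizer splits text into WordPiece subwords instead of whole words.

```python
# Toy vocabulary (hypothetical); a real BERT vocabulary has tens of thousands of entries.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4, "world": 5}


def toy_tokenize(text: str, max_len: int = 8) -> dict:
    """Map words to ids, add [CLS]/[SEP], then pad to max_len."""
    # Reserve two positions for the [CLS] and [SEP] special tokens.
    words = text.lower().split()[: max_len - 2]
    ids = (
        [TOY_VOCAB["[CLS]"]]
        + [TOY_VOCAB.get(word, TOY_VOCAB["[UNK]"]) for word in words]
        + [TOY_VOCAB["[SEP]"]]
    )
    padding = max_len - len(ids)
    return {
        "ids": ids + [TOY_VOCAB["[PAD]"]] * padding,
        # 1 marks a real token, 0 marks padding.
        "attention_masks": [1] * len(ids) + [0] * padding,
    }
```

The attention mask lets a downstream model tell real tokens from padding, which is why the two lists always have the same length.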

embed_tokens(tokens_obj)

Precomputes embeddings for the tokenized text string.

Parameters:

tokens_obj (dict) – A dictionary containing the tokenized version of a text string.

Returns:

Tensor representing the embeddings of the tokenized text string.

Return type:

Tensor
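How the attention mask interacts with an embedding step can be sketched in plain Python. This is not how `embed_tokens` is implemented: the real method presumably runs a BERT model and returns a tensor, whereas the sketch uses a hypothetical lookup table and masked mean pooling, with plain lists standing in for tensors. The function name `toy_embed` and the key names `ids` and `attention_masks` are assumptions.

```python
def toy_embed(tokens_obj: dict, table: dict) -> list:
    """Average the vectors of non-padding tokens (masked mean pooling)."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    count = 0
    for token_id, mask in zip(tokens_obj["ids"], tokens_obj["attention_masks"]):
        if mask == 0:
            continue  # skip padding positions
        for i, value in enumerate(table[token_id]):
            total[i] += value
        count += 1
    # With no real tokens, return the zero vector instead of dividing by zero.
    return [t / count for t in total] if count else total
```

Skipping masked positions keeps padding from diluting the pooled vector, so inputs of different lengths yield comparable embeddings.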

process_string(string)

Applies the whole preprocessing pipeline to a given text string.

Parameters:

string (str) – The string to process.

Returns:

Tensor representing the embeddings of the processed and tokenized text string.

Return type:

Tensor
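Judging by the method descriptions, the full pipeline chains cleaning, tokenization, and embedding in that order. The sketch below shows only this composition, assuming each stage is passed in as a callable; `run_pipeline` and its parameter names are hypothetical and not part of the library.

```python
from typing import Any, Callable


def run_pipeline(
    text: str,
    clean: Callable[[str], str],
    tokenize: Callable[[str], dict],
    embed: Callable[[dict], Any],
) -> Any:
    """Apply clean -> tokenize -> embed to a single string."""
    cleaned = clean(text)       # e.g. strip emojis
    tokens = tokenize(cleaned)  # e.g. ids and attention masks
    return embed(tokens)        # e.g. embedding tensor
```

Passing the stages as arguments keeps the sketch independent of any particular tokenizer or model; in the class itself the equivalent steps would be the `clean_string`, `tokenize_string`, and `embed_tokens` methods.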

Module contents