botiverse.preprocessors.Special.WhizBot_GRU_Preprocessor package

Submodules

botiverse.preprocessors.Special.WhizBot_GRU_Preprocessor.WhizBot_GRU_Preprocessor module

class botiverse.preprocessors.Special.WhizBot_GRU_Preprocessor.WhizBot_GRU_Preprocessor.WhizBot_GRU_Preprocessor(file_path)

Bases: object

An interface that provides the preprocessing required by the WhizBot_GRU bot.

Constructs a WhizBot_GRU_Preprocessor instance with the file path of the dataset.

Parameters:

file_path (str) – Path to the .json file of the dataset.

Returns:

None

process()

Loads the data, cleans it, tokenizes it, pads it, and removes outlier sequences.

Returns:

Processed data.

Return type:

DataFrame
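The description above does not say how outlier sequences are identified. As a minimal sketch, assuming outliers are simply sequences longer than a chosen cutoff, the filtering step could look like the following. The function name drop_length_outliers and the max_len parameter are hypothetical and are not part of botiverse; plain Python lists stand in for the DataFrame rows.

```python
def drop_length_outliers(sequences, max_len):
    """Keep only the sequences that contain at most max_len tokens.

    Assumption: "outlier" means "longer than max_len"; the real
    criterion used by process() is not documented here.
    """
    return [seq for seq in sequences if len(seq) <= max_len]


tokenized = [[4, 7], [4, 7, 9, 2, 5, 3, 8], [6]]
kept = drop_length_outliers(tokenized, max_len=4)
print(kept)  # prints [[4, 7], [6]]
```

Dropping very long sequences before padding keeps the padded length small, which is one plausible reason to filter outliers at this stage.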

clean_string(string)

Cleans the given text string by removing punctuation, converting it to lowercase, and removing non-ASCII characters.

Parameters:

string (str) – Provided string.

Returns:

Cleaned string.

Return type:

str
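The three cleaning steps named above can be illustrated with the standard library alone. This is a sketch of the technique, not the library's implementation: the function name clean_string_sketch is hypothetical, and the order of the steps and the exact punctuation set (here string.punctuation) are assumptions.

```python
import string


def clean_string_sketch(text: str) -> str:
    """Drop non-ASCII characters, strip ASCII punctuation, and lowercase."""
    # Encoding with errors="ignore" silently drops every non-ASCII character.
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # Delete every character listed in string.punctuation.
    no_punct = ascii_only.translate(str.maketrans("", "", string.punctuation))
    return no_punct.lower()


print(clean_string_sketch("H\u00e9llo, World!"))  # prints "hllo world"
```

Note that the non-ASCII character is removed outright rather than transliterated, which matches the wording "removing non-ASCII characters".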

tokenize_string(string)

Tokenizes the given text string.

Parameters:

string (str) – Provided string.

Returns:

Token IDs.

Return type:

Tensor
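The tokenizer used by this method is not described here. As a rough sketch of what mapping a string to token IDs involves, the example below assumes whitespace tokenization and a vocabulary built from the training sentences. The names build_vocab and tokenize_string_sketch, and the reserved <pad> and <unk> entries, are hypothetical; a plain list is returned instead of a Tensor to keep the example dependency-free.

```python
def build_vocab(sentences):
    """Map each distinct whitespace-separated token to an integer ID."""
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved IDs (assumed convention)
    for sentence in sentences:
        for token in sentence.split():
            # Assign the next free ID the first time a token is seen.
            vocab.setdefault(token, len(vocab))
    return vocab


def tokenize_string_sketch(text, vocab):
    """Return the token IDs for text, using <unk> for unseen tokens."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.split()]


vocab = build_vocab(["hello world", "hello there"])
print(tokenize_string_sketch("hello unknown world", vocab))  # prints [2, 1, 3]
```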

process_string(string)

Cleans and tokenizes a given text string, then pads it to the longest sequence length (for batch processing).

Parameters:

string (str) – Provided string.

Returns:

Processed, padded token IDs.

Return type:

Tensor

pad_sequence(sequence)

Pads a given sequence to make it compatible with batch processing.

Parameters:

sequence (Tensor) – Provided sequence.

Returns:

Padded sequence.

Return type:

Tensor
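Padding makes every sequence the same length so that sequences can be stacked into a batch. The sketch below shows the idea on plain Python lists. It assumes right-padding with ID 0 and truncation of over-long input; the function name pad_sequence_sketch and the max_len and pad_id parameters are hypothetical, since the documented method takes only the sequence and presumably uses a length stored on the preprocessor.

```python
def pad_sequence_sketch(sequence, max_len, pad_id=0):
    """Right-pad sequence with pad_id up to max_len, truncating if longer."""
    trimmed = list(sequence[:max_len])
    # Append as many pad IDs as needed to reach exactly max_len items.
    return trimmed + [pad_id] * (max_len - len(trimmed))


print(pad_sequence_sketch([5, 8, 2], max_len=5))     # prints [5, 8, 2, 0, 0]
print(pad_sequence_sketch([5, 8, 2, 7], max_len=3))  # prints [5, 8, 2]
```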

Module contents