botiverse.preprocessors.Special.ConverseBot_Preprocessor package
Submodules
botiverse.preprocessors.Special.ConverseBot_Preprocessor.ConverseBot_Preprocessor module
- class botiverse.preprocessors.Special.ConverseBot_Preprocessor.ConverseBot_Preprocessor.ConverseBot_Preprocessor(file_path=None, dataset=None)
Bases: object
An interface that provides the preprocessing required by the ConverseBot bot.
Initializes a ConverseBot_Preprocessor instance with an optional training dataset. The dataset is a list of multi-turn conversations, and each conversation is a list of strings, e.g., [["hi", "hello", "how are you?"], ["good", "how about you?", "i am fine"]].
- Parameters:
dataset (list of list of str, optional) – Dataset to be processed (use it or file_path).
file_path (str, optional) – Path to the .json file that contains the conversation array (use it or dataset).
- Returns:
None
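The dataset structure described above can be sketched with the standard library alone. This is a minimal illustration, not part of botiverse: `is_valid_dataset` and the file name `conversations.json` are hypothetical, and the sketch assumes the .json file holds the conversation array as its top-level value. The constructor calls are left as comments so the block runs without botiverse installed.

```python
import json
import os
import tempfile

# Two multi-turn conversations, each a list of utterance strings.
dataset = [
    ["hi", "hello", "how are you?"],
    ["good", "how about you?", "i am fine"],
]


def is_valid_dataset(data):
    """Hypothetical helper: True if data is a list of lists of strings."""
    return isinstance(data, list) and all(
        isinstance(conv, list) and all(isinstance(turn, str) for turn in conv)
        for conv in data
    )


# Write the same structure to a .json file, the alternative to passing `dataset`.
# Assumption: the file's top-level value is the conversation array itself.
file_path = os.path.join(tempfile.mkdtemp(), "conversations.json")
with open(file_path, "w", encoding="utf-8") as f:
    json.dump(dataset, f)

with open(file_path, encoding="utf-8") as f:
    loaded = json.load(f)

# With botiverse installed, either input could then be supplied (one or the other):
# from botiverse.preprocessors.Special.ConverseBot_Preprocessor.ConverseBot_Preprocessor import ConverseBot_Preprocessor
# preprocessor = ConverseBot_Preprocessor(dataset=dataset)
# preprocessor = ConverseBot_Preprocessor(file_path=file_path)
```

Checking the shape before constructing the preprocessor surfaces malformed conversations (for example, a non-string turn) early.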
- process()
Processes the conversation dataset: cleans each conversation, combines its turns into a single string (with [C] between consecutive turns), and tokenizes the result.
- Returns:
DataFrame containing the processed conversations.
- Return type:
DataFrame
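The combining step of process() can be illustrated roughly as follows. This is a sketch under stated assumptions, not the library's implementation: `combine_turns` is a hypothetical helper, and whether the real separator is padded with spaces is not documented here, so a bare "[C]" is assumed. Cleaning and tokenization are omitted.

```python
def combine_turns(conversation, separator="[C]"):
    """Hypothetical helper: join the turns of one conversation into one string.

    Assumption: the separator is a bare "[C]" with no surrounding spaces.
    """
    return separator.join(conversation)


conversations = [
    ["hi", "hello", "how are you?"],
    ["good", "how about you?", "i am fine"],
]

# One combined string per conversation.
combined = [combine_turns(conv) for conv in conversations]
```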
- clean_string(string)
Cleans a string by removing certain spaces and newline characters.
- Parameters:
string (str) – The string to clean.
- Returns:
The cleaned string.
- Return type:
str
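Which spaces are removed is not specified here. As one plausible reading only, the sketch below collapses every run of whitespace (including newlines) into a single space and trims both ends. `clean_string_sketch` is a hypothetical stand-in and may differ from what clean_string actually does.

```python
import re


def clean_string_sketch(string):
    """Hypothetical stand-in for clean_string; the real rules may differ."""
    # Collapse any run of whitespace (spaces, tabs, \r, \n) into one space.
    collapsed = re.sub(r"\s+", " ", string)
    # Drop leading and trailing whitespace.
    return collapsed.strip()
```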
- tokenize_string(string, target=False)
Tokenizes a string.
- Parameters:
string (str) – The string to tokenize.
target (bool, optional) – Indicates whether the string is a target. Defaults to False.
- Returns:
The tokenized string.
- Return type:
Dict[str, Tensor]