botiverse.bots.VoiceBot package
Submodules
botiverse.bots.VoiceBot.SpeechClassifier module
- class botiverse.bots.VoiceBot.SpeechClassifier.SpeechClassifier(words, samplerate, duration, repr='wav2vec', machine='lstm', **kwargs)
Bases: object
An interface for the speech classifier chatbot, which classifies speech into one of a set of classes. Suitable when the number of classes is small and the words are easily pronounceable.
Initialize the dataset and its transformation for the speech classification process.
- Parameters:
words (list) – A list of words which are the classes of the speech classifier.
samplerate (int) – The sample rate of the audio files.
duration (int) – The duration of the audio files in milliseconds.
repr (str or object) – The representation to use for the audio files. Can be ‘wav2vec’, ‘mfcc’, ‘spectrogram’, or a custom representation.
machine (str or object) – The machine learning model to use for classification. Can be ‘lstm’ or a custom model.
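Because duration is given in milliseconds while samplerate is in samples per second, it can help to check how many samples one fixed-length clip holds. The sketch below is only an illustration: the helper name clip_num_samples and the example values are hypothetical and are not part of botiverse.

```python
def clip_num_samples(samplerate: int, duration_ms: int) -> int:
    """Number of samples in a clip lasting duration_ms milliseconds at samplerate Hz."""
    return samplerate * duration_ms // 1000


# Example values (assumed for illustration, not library defaults):
# 16 kHz audio cut into 1-second clips.
num_samples = clip_num_samples(samplerate=16000, duration_ms=1000)
print(num_samples)  # 16000
```

With these example values, the matching constructor arguments would be samplerate=16000 and duration=1000.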
- generate_read_data(n=3, regenerate=False, force_download_noise=False, **kwargs)
Generate synthetic audio data for the words specified during init and then corrupt it with noise and audio transformations.
- Parameters:
n (int) – The number of audio files to generate for each word using audio transformations.
regenerate (bool) – Whether to regenerate the dataset even if it already exists.
force_download_noise (bool) – Whether to force download the noise dataset even if it already exists.
kwargs – Keyword arguments to be passed to the transformer (that puts audio in the chosen representation).
- Returns:
A tuple of the form (X, y) where X is a 3D numpy array representing the audio files and y is a 1D numpy array representing the classes of the audio files.
- Return type:
tuple of numpy.ndarray
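To make the returned shapes concrete, the following sketch builds placeholder arrays with the same dimensionality as the documented (X, y) pair. The axis order (clips, frames, features), the frame and feature counts, and the use of integer class indices in y are assumptions made for illustration; the actual values depend on the chosen representation.

```python
import numpy as np

words = ["yes", "no", "maybe"]   # hypothetical class list
n = 3                            # clips generated per word
frames, features = 49, 32        # assumed representation size (illustrative only)

# X: one (frames, features) matrix per clip, stacked into a 3D array.
X = np.zeros((len(words) * n, frames, features), dtype=np.float32)

# y: one class label per clip, here encoded as the index of the word.
y = np.repeat(np.arange(len(words)), n)

print(X.shape, y.shape)
```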
- fit(X, y, λ=0.001, α=0.01, hidden=128, patience=50, max_epochs=600, **kwargs)
Train the speech classifier model.
- Parameters:
X (numpy.ndarray) – A 3D numpy array representing the audio files.
y (numpy.ndarray) – A 1D numpy array representing the classes of the audio files.
λ (float) – The learning rate parameter.
α (float) – The regularization parameter.
hidden (int) – The number of hidden units in the LSTM layer.
patience (int) – The number of bad epochs to wait before early stopping.
max_epochs (int) – The maximum number of epochs to train for.
kwargs – Keyword arguments to be passed to the model’s fit method.
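The interaction between patience and max_epochs can be sketched as a generic early-stopping loop. This is a minimal illustration, not botiverse's training code: the function name is hypothetical, and treating an epoch as "bad" when the validation loss fails to improve is an assumption.

```python
def train_with_early_stopping(val_losses, patience=50, max_epochs=600):
    """Return (epochs_run, best_loss) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    bad_epochs = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if epoch > max_epochs:
            break                      # hard cap on training length
        epochs_run = epoch
        if loss < best_loss:
            best_loss = loss           # improvement resets the counter
            bad_epochs = 0
        else:
            bad_epochs += 1            # one more epoch without improvement
            if bad_epochs >= patience:
                break                  # stop early
    return epochs_run, best_loss


losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
print(train_with_early_stopping(losses, patience=3))  # (5, 0.8)
```

In this run, training stops after the fifth epoch because three consecutive epochs fail to beat the best loss of 0.8, so the later loss of 0.7 is never reached.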
- load(path, **kwargs)
Load the model from a file.
- Parameters:
path – The path to the file.
kwargs – Keyword arguments to be passed to the model’s load method.
- predict(path, index=False)
Predict the class of the audio file at the given path.
- Parameters:
path (str) – The path to the audio file to be classified.
index (bool) – If True, return the index of the predicted class; otherwise return the class itself.
- Returns:
The predicted class of the audio file at the given path, or its index if index is True.
- Return type:
str or int
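The effect of the index flag on the return type can be sketched with a small decoding helper. The helper name decode_prediction and the score values are hypothetical, and the sketch assumes the model produces one score per word; it does not reproduce botiverse's internals.

```python
def decode_prediction(scores, words, index=False):
    """Pick the highest-scoring class and return its index or its word."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if index else words[best]


words = ["yes", "no", "stop"]     # hypothetical class list
scores = [0.1, 0.7, 0.2]          # hypothetical per-class scores

print(decode_prediction(scores, words))              # no
print(decode_prediction(scores, words, index=True))  # 1
```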
botiverse.bots.VoiceBot.VoiceBot module
- class botiverse.bots.VoiceBot.VoiceBot.VoiceBot(call_json_path, repr='BERT-Sentence')
Bases: object
An interface for the vocalizer chatbot which simulates a call with a customer service bot.
Load the call data from a json file that contains the call’s state machine.
- Parameters:
call_json_path (str) – The path to the json file containing the call state machine.
repr (str) – The numerical representation to use for the audio files. Can be ‘BERT’ or ‘BERT-Sentence’.
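The idea of driving a call from a state machine stored as JSON can be illustrated as follows. The schema here (the "say" and "next" keys, and the state and intent names) is entirely hypothetical and is not botiverse's call file format; the sketch only shows the general technique of loading states from JSON and following transitions.

```python
import json

# Hypothetical call definition: each state has a prompt and named transitions.
call_json = """
{
  "start":   {"say": "Hello, how can I help?",
              "next": {"billing": "billing", "support": "support"}},
  "billing": {"say": "Connecting you to billing.", "next": {}},
  "support": {"say": "Connecting you to support.", "next": {}}
}
"""


def run_call(machine, intents, state="start"):
    """Follow the transitions named by intents and collect the bot's prompts."""
    transcript = [machine[state]["say"]]
    for intent in intents:
        transitions = machine[state]["next"]
        if intent not in transitions:
            break                          # unknown intent: stay in the current state
        state = transitions[intent]
        transcript.append(machine[state]["say"])
    return state, transcript


machine = json.loads(call_json)
final_state, transcript = run_call(machine, ["billing"])
print(final_state, transcript)
```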
botiverse.bots.VoiceBot.utils module
- botiverse.bots.VoiceBot.utils.voice_input(record_time=3, voice_threshold=900, save_path='sample.wav')
Record audio for record_time seconds, keeping only audio whose volume is above voice_threshold, and save it to save_path.
- Parameters:
record_time (int) – The number of seconds to record for.
voice_threshold (int) – The minimum volume of audio to record.
save_path (str) – The path to save the audio file to.
- Returns:
The path to the audio file.
- Return type:
str
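Threshold-based gating of recorded audio can be sketched as below. How voice_input measures volume is not stated here, so this sketch assumes signed integer samples and uses the peak absolute amplitude of each chunk; the helper name gate_chunks and the sample values are hypothetical.

```python
def gate_chunks(chunks, voice_threshold=900):
    """Keep only chunks whose peak absolute amplitude exceeds voice_threshold."""
    return [
        chunk
        for chunk in chunks
        if chunk and max(abs(sample) for sample in chunk) > voice_threshold
    ]


# Hypothetical chunks of signed integer samples.
chunks = [[10, -20, 5], [1200, -300], [0, 0], [-950, 100]]
print(gate_chunks(chunks))  # [[1200, -300], [-950, 100]]
```

Only the second and fourth chunks contain a sample whose magnitude exceeds 900, so the quiet chunks are dropped before saving.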