botiverse.bots.VoiceBot package#

Submodules#

botiverse.bots.VoiceBot.SpeechClassifier module#

class botiverse.bots.VoiceBot.SpeechClassifier.SpeechClassifier(words, samplerate, duration, repr='wav2vec', machine='lstm', **kwargs)

Bases: object

An interface for the speech classifier chatbot which classifies speech into one of a set of classes. Suitable when the number of classes is small and the words are easily pronounceable.

Initialize the dataset and its transformation for the speech classification process.

Parameters:
  • words (list) – A list of words which are the classes of the speech classifier.

  • samplerate (int) – The sample rate of the audio files.

  • duration (int) – The duration of the audio files in milliseconds.

  • repr (str or object) – The representation to use for the audio files. Can be ‘wav2vec’, ‘mfcc’, ‘spectrogram’, or a custom representation.

  • machine (str or object) – The machine learning model to use for classification. Can be ‘lstm’ or a custom model.

generate_read_data(n=3, regenerate=False, force_download_noise=False, **kwargs)

Generate synthetic audio data for the words specified during init and then corrupt it with noise and audio transformations.

Parameters:
  • n (int) – The number of audio files to generate for each word using audio transformations.

  • regenerate (bool) – Whether to regenerate the dataset even if it already exists.

  • force_download_noise (bool) – Whether to force download the noise dataset even if it already exists.

  • kwargs – Keyword arguments to be passed to the transformer (that puts audio in the chosen representation).

Returns:

A tuple of the form (X, y) where X is a 3D numpy array representing the audio files and y is a 1D numpy array representing the classes of the audio files.

Return type:

tuple of numpy.ndarray

fit(X, y, λ=0.001, α=0.01, hidden=128, patience=50, max_epochs=600, **kwargs)

Train the speech classifier model.

Parameters:
  • X (numpy.ndarray) – A 3D numpy array representing the audio files.

  • y (numpy.ndarray) – A 1D numpy array representing the classes of the audio files.

  • λ (float) – The learning rate parameter.

  • α (float) – The regularization parameter.

  • hidden (int) – The number of hidden units in the LSTM layer.

  • patience (int) – The number of consecutive bad epochs (epochs without improvement) to tolerate before stopping early.

  • max_epochs (int) – The maximum number of epochs to train for.

  • kwargs – Keyword arguments to be passed to the model’s fit method.
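
The interplay between patience and max_epochs can be illustrated with a minimal sketch. It assumes that a "bad epoch" is one whose monitored loss does not improve on the best value seen so far; the reference above does not say which quantity fit monitors, so the helper `early_stopping_epoch` and the loss values are hypothetical and not part of botiverse.

```python
def early_stopping_epoch(losses, patience=50, max_epochs=600):
    """Return the 1-based epoch at which training would stop.

    Assumption: an epoch is "bad" when its loss is not lower than the
    best loss seen so far. This is an illustration, not botiverse code.
    """
    best = float("inf")
    bad_epochs = 0
    last_epoch = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best:
            best = loss
            bad_epochs = 0          # an improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break               # patience exhausted: stop early
    return last_epoch


# Hypothetical per-epoch losses: the best value (0.6) is reached at epoch 3.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
stop = early_stopping_epoch(losses, patience=3)
```

With patience=3 the loop stops at epoch 6, after three consecutive epochs that fail to beat 0.6, so the later improvement at epoch 7 is never reached. A larger patience trades training time for tolerance of such plateaus.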

save(path)

Save the model to a file.

Parameters:

path – The path to the file.

load(path, **kwargs)

Load the model from a file.

Parameters:
  • path – The path to the file.

  • kwargs – Keyword arguments to be passed to the model’s load method.

predict(path, index=False)

Predict the class of the audio file at the given path.

Parameters:
  • path (str) – The path to the audio file to be classified.

  • index (bool) – If True, return the index of the class instead of the class itself.

Returns:

The class of the audio file at the given path, or its index if index is True.

Return type:

str or int
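
Taken together, the methods above suggest a generate, fit, save, predict workflow. The following is a sketch only, using the signatures documented on this page. The word list, sample rate, duration, file names, and the wrapper function `train_and_classify` are illustrative assumptions, and the code has not been checked against a particular botiverse release.

```python
def train_and_classify(audio_path, model_path="speech_classifier.model"):
    """Train a small classifier and classify one recording.

    Assumptions: botiverse is installed, and the words, sample rate,
    duration, and file names are illustrative values, not library defaults.
    """
    # Imported lazily so this sketch can be loaded without botiverse installed.
    from botiverse.bots.VoiceBot.SpeechClassifier import SpeechClassifier

    clf = SpeechClassifier(
        words=["yes", "no"],   # the classes to recognise
        samplerate=16000,      # assumed sample rate
        duration=1000,         # assumed clip length (documented as milliseconds)
        repr="wav2vec",
        machine="lstm",
    )
    X, y = clf.generate_read_data(n=3)           # synthetic, noise-corrupted data
    clf.fit(X, y, patience=50, max_epochs=600)   # train with early stopping
    clf.save(model_path)                         # persist the trained model
    return clf.predict(audio_path, index=False)  # class as a word, not an index
```

Calling `train_and_classify("sample.wav")` would be expected to return one of the configured words; passing index=True to predict would return the class index instead.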

botiverse.bots.VoiceBot.VoiceBot module#

class botiverse.bots.VoiceBot.VoiceBot.VoiceBot(call_json_path, repr='BERT-Sentence')

Bases: object

An interface for the vocalizer chatbot which simulates a call with a customer service bot.

Load the call data from a json file that contains the call’s state machine.

Parameters:
  • call_json_path (str) – The path to the json file containing the call state machine.

  • repr (str) – The numerical (text-embedding) representation to use. Can be ‘BERT’ or ‘BERT-Sentence’.

simulate_call()

Simulate a call with a voice bot as driven by the call state machine.
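
To make the idea of a call driven by a state machine concrete, here is a generic sketch. The dictionary layout, the helper `run_call`, and the scripted replies are all hypothetical: this page does not describe the schema of the file at call_json_path, so nothing below should be read as the format botiverse expects.

```python
# Hypothetical call script: an illustration only, not the schema that
# botiverse expects in call_json_path.
CALL = {
    "start": {
        "say": "Do you want sales or support?",
        "next": {"sales": "sales", "support": "support"},
    },
    "sales": {"say": "Connecting you to sales.", "next": {}},
    "support": {"say": "Connecting you to support.", "next": {}},
}


def run_call(machine, replies, state="start"):
    """Walk the state machine, consuming one scripted reply per question."""
    transcript = []
    replies = iter(replies)
    while True:
        node = machine[state]
        transcript.append(node["say"])
        if not node["next"]:
            return transcript        # terminal state: the call ends
        reply = next(replies, None)
        if reply is None:
            return transcript        # no more replies: the caller hung up
        # An unrecognised reply keeps the call in the same state.
        state = node["next"].get(reply, state)


transcript = run_call(CALL, ["billing", "support"])
```

Here the unrecognised reply "billing" causes the opening question to be repeated before "support" moves the call to its terminal state. A voice bot would obtain each reply from recorded speech rather than from a list.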

botiverse.bots.VoiceBot.utils module#

botiverse.bots.VoiceBot.utils.voice_input(record_time=3, voice_threshold=900, save_path='sample.wav')

Record audio for record_time seconds and save it to save_path, keeping only audio whose volume is above voice_threshold.

Parameters:
  • record_time (int) – The number of seconds to record for.

  • voice_threshold (int) – The minimum volume of audio to record.

  • save_path (str) – The path to save the audio file to.

Returns:

The path to the audio file.

Return type:

str
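
The effect of voice_threshold can be sketched as a simple amplitude gate. This assumes that samples are signed 16-bit integers and that "volume" means absolute amplitude per sample; the library may instead measure volume over chunks of audio, so the helper `gate_samples` and the sample values are hypothetical.

```python
def gate_samples(samples, voice_threshold=900):
    """Keep only samples whose absolute amplitude exceeds the threshold.

    Assumption: samples are signed 16-bit integers and "volume" means
    absolute amplitude. This is an illustration, not botiverse code.
    """
    return [s for s in samples if abs(s) > voice_threshold]


# Hypothetical recording: quiet background noise mixed with louder speech.
recording = [12, -40, 950, -1200, 300, 2000, -899, 901]
voiced = gate_samples(recording)
```

Only the four samples louder than the threshold survive, so raising voice_threshold suppresses more background noise at the risk of clipping quiet speech.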

Module contents#