botiverse.preprocessors.Wav2Vec package#

Submodules#

botiverse.preprocessors.Wav2Vec.Wav2Vec module#

class botiverse.preprocessors.Wav2Vec.Wav2Vec.Wav2Vec(sample_rate=16000, duration=1, augment=None)[source]#

Bases: object

An interface for transforming audio files into wav2vec vectors.

Initialize the Wav2Vec transformer by loading the wav2vec model and setting the sample rate and duration of the audio files.

Parameters:
  • sample_rate (int) – The sample rate of the audio files, in Hz.

  • duration (int) – The duration of the audio files in milliseconds.

  • augment (audiomentations.Compose) – The audio augmentations to apply to the audio files.
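
Whether the class resamples files on load is not stated here, so it can help to check a file's sample rate and length before using it. The following is a minimal sketch that uses only Python's standard wave module. write_sine_wav and wav_info are hypothetical helpers for illustration and are not part of botiverse; the test tone is mono, 16-bit PCM by assumption.

```python
import math
import struct
import wave


def write_sine_wav(path, sample_rate=16000, seconds=1.0, freq=440.0):
    """Write a mono, 16-bit sine tone so there is a file to inspect."""
    n_frames = int(sample_rate * seconds)
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono (assumption)
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def wav_info(path):
    """Return (sample_rate, duration_in_seconds) read from the WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate
```

Comparing the first value returned by wav_info with the sample_rate passed to the constructor shows whether a file would need resampling.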

transform_list(words, n=4)[source]#

Given a dataset folder whose subfolders each contain audio files, return a table of wav2vec vectors (one per audio file) as a numpy array X and a table of classes as a numpy array y. In the process, each audio file is augmented n times, and each augmented copy yields its own wav2vec vector.

Parameters:
  • words (list) – A list of words which are the classes of the speech classifier.

  • n (int) – The number of times to augment each audio file.

Returns:

A tuple of the form (X, y) where X is a 3D numpy array representing the wav2vec vectors and y is a 1D numpy array representing the classes of the audio files.
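
The relationship between X, y, and n described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: build_dataset, augment, and embed are hypothetical stand-ins, labels are assumed to be the index of each word in words, and each file is assumed to contribute exactly n vectors.

```python
def build_dataset(files_by_word, words, n, augment, embed):
    """Return (X, y): one embedding per augmented copy, plus its class index."""
    X, y = [], []
    for label, word in enumerate(words):
        for audio in files_by_word[word]:
            for _ in range(n):
                # Each augmented copy produces its own vector and label.
                X.append(embed(augment(audio)))
                y.append(label)
    return X, y


# Stand-ins: identity augmentation and a toy 2D "embedding" (assumptions).
files = {"yes": [[0.1, 0.2]], "no": [[0.3], [0.4]]}
X, y = build_dataset(
    files, ["yes", "no"], n=4,
    augment=lambda audio: audio,
    embed=lambda audio: [[sum(audio)]],
)
```

With one file for "yes" and two for "no", the sketch produces 12 entries in X and 12 matching labels in y. Because each entry is itself 2D, stacking them gives the 3D shape mentioned in the return description.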

transform(path, strict_duration=False)[source]#

Convert the audio file at the given path into a wav2vec vector.

Parameters:
  • path (str) – The path to the audio file.

  • strict_duration (bool) – If True, the audio file is padded or truncated to the duration passed to the constructor.

Returns:

The wav2vec vector of the audio file as a 2D numpy array.

Return type:

numpy.ndarray
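
The padding and truncation behind strict_duration can be illustrated with a small sketch. The source does not say where padding is added, what value is used, or how the target length is derived from sample_rate and duration, so the hypothetical helper below takes the target length in samples directly and assumes zero-padding at the end.

```python
def pad_or_truncate(samples, target_len, pad_value=0.0):
    """Return a copy of `samples` with exactly `target_len` items."""
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    if len(samples) >= target_len:
        # Too long (or exact): keep only the first target_len samples.
        return list(samples[:target_len])
    # Too short: append pad_value until the target length is reached.
    return list(samples) + [pad_value] * (target_len - len(samples))
```

For example, pad_or_truncate([1, 2, 3], 5) appends two zeros, while pad_or_truncate([1, 2, 3, 4], 2) keeps only the first two samples.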

class botiverse.preprocessors.Wav2Vec.Wav2Vec.Wav2Text[source]#

Bases: object

An interface for converting speech files into text using wav2vec2.

Initialize by loading the pre-trained model and tokenizer.

transcribe(path)[source]#

Given the path to a speech file, return its transcription.

Parameters:

path (str) – The path to the speech WAV file.

Returns:

The transcription of the speech file.

Return type:

str

Module contents#