botiverse.preprocessors.Wav2Vec package
Submodules
botiverse.preprocessors.Wav2Vec.Wav2Vec module
- class botiverse.preprocessors.Wav2Vec.Wav2Vec.Wav2Vec(sample_rate=16000, duration=1, augment=None)
Bases: object
An interface for transforming audio files into wav2vec vectors.
Initialize the Wav2Vec transformer by loading the wav2vec model and setting the sample rate and duration of the audio files.
- Parameters:
sample_rate (int) – The sample rate of the audio files
duration (int) – The duration of the audio files in milliseconds
augment (audiomentations.Compose) – The audio augmentations to apply to the audio files.
- transform_list(words, n=4)
Given a dataset folder whose subfolders each contain audio files, return the wav2vec vectors of those files as a numpy array X and the corresponding classes as a numpy array y. In the process, each audio file is augmented n times, and every augmented copy yields its own wav2vec vector.
- Parameters:
words (list) – A list of words which are the classes of the speech classifier.
n (int) – The number of times to augment each audio file.
- Returns:
A tuple of the form (X, y) where X is a 3D numpy array representing the wav2vec vectors and y is a 1D numpy array representing the classes of the audio files.
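The shapes described above can be illustrated with a minimal sketch. It is not the library's implementation: `stack_dataset` is a hypothetical helper, the random arrays stand in for real wav2vec vectors, the frame and feature sizes are placeholders, and integer class indices in y are an assumption (the source only says y holds the classes). It also assumes each file contributes exactly n vectors.

```python
import numpy as np


def stack_dataset(vectors_per_word, words):
    """Stack per-class lists of 2D vectors into (X, y).

    Hypothetical helper: vectors_per_word maps each word to a list of
    2D arrays of identical shape (frames, features).
    """
    X_parts, y_parts = [], []
    for label, word in enumerate(words):
        for vec in vectors_per_word[word]:
            X_parts.append(vec)
            y_parts.append(label)  # assumption: classes encoded as indices
    # np.stack adds a leading axis, turning many 2D arrays into one 3D array
    return np.stack(X_parts), np.array(y_parts)


rng = np.random.default_rng(0)
words = ["yes", "no"]
n = 4                    # augmentations per file, as in transform_list
files_per_word = 3       # placeholder dataset size
frames, features = 5, 8  # placeholder sizes, not real wav2vec dimensions

# Random stand-ins for the vectors that would come from the model
fake_vectors = {
    word: [rng.normal(size=(frames, features)) for _ in range(files_per_word * n)]
    for word in words
}

X, y = stack_dataset(fake_vectors, words)
print(X.shape, y.shape)
```

Under these assumptions, X has one row per augmented copy, so its first dimension grows with both the number of files and n.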
- transform(path, strict_duration=False)
Convert the audio file at the given path into a wav2vec vector.
- Parameters:
path (str) – The path to the audio file
strict_duration (bool) – If True, the audio file is padded or truncated to the duration specified at initialization.
- Returns:
The wav2vec vector of the audio file as a 2D numpy array.
- Return type:
numpy.ndarray
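To make the effect of strict_duration concrete, here is a minimal sketch of padding or truncating a waveform to a fixed number of samples. It is an illustration only, not the library's code: `fit_to_length` is a hypothetical helper, zero-padding at the end is an assumption, and converting the configured duration into a sample count is left to the caller.

```python
import numpy as np


def fit_to_length(samples: np.ndarray, target_len: int) -> np.ndarray:
    """Pad with trailing zeros or truncate a 1D waveform to target_len samples.

    Hypothetical helper; the padding strategy is an assumption.
    """
    if target_len < 0:
        raise ValueError("target_len must be non-negative")
    if len(samples) >= target_len:
        # Too long (or exact): keep only the first target_len samples
        return samples[:target_len]
    # Too short: append zeros (np.pad defaults to constant zero padding)
    return np.pad(samples, (0, target_len - len(samples)))


short_clip = fit_to_length(np.ones(3), 5)    # padded up to 5 samples
long_clip = fit_to_length(np.arange(10), 4)  # truncated down to 4 samples
print(short_clip, long_clip)
```

Fixing the length this way gives every file the same number of input samples, which is what allows the resulting vectors to be stacked into a single array.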