botiverse.preprocessors.Frequency package
Submodules
botiverse.preprocessors.Frequency.Frequency module
- class botiverse.preprocessors.Frequency.Frequency.Frequency(sample_rate=16000, duration=1, augment=None, type='spec', nmels=70, n_fft=720, hop_length=360, is_log=True, **kwargs)
Bases: object
An interface for transforming audio files into frequency domain representations.
Initialize the frequency transformer.
- Parameters:
sample_rate (int) – The sample rate of the audio files.
duration (int) – The duration of the audio files in seconds.
augment (audiomentations.Compose) – The audio augmentations to apply to the audio files.
type (str) – The type of frequency domain representation to use. Can be ‘spec’ for spectrogram or ‘mfcc’ for Mel-frequency cepstral coefficients.
nmels (int) – The number of mel bins to use for the Mel-frequency cepstral coefficients.
n_fft (int) – The number of samples to use for each frame of the spectrogram.
hop_length (int) – The number of samples to shift the window by between frames of the spectrogram.
is_log (bool) – Whether to use a log scale for the spectrogram.
kwargs – Keyword arguments to be passed to the frequency domain transformer.
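As a rough illustration of how n_fft, hop_length and is_log affect the output, the numpy-only sketch below computes a log power spectrogram of a synthetic tone. It is not the class's implementation: the helper name log_spectrogram, the Hann window, the lack of edge padding and the small epsilon inside the logarithm are all assumptions made for this example.

```python
import numpy as np

def log_spectrogram(signal, n_fft=720, hop_length=360, is_log=True):
    """Hypothetical helper: power spectrogram of a 1D signal (assumed framing, no padding)."""
    # Number of full frames that fit when the window advances hop_length samples each step.
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    window = np.hanning(n_fft)  # assumed window; the library may use another
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Real FFT of each frame gives n_fft // 2 + 1 frequency bins.
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    if is_log:
        spec = np.log(spec + 1e-10)  # epsilon avoids log(0); assumed value
    return spec.T  # rows are frequency bins, columns are frames

sample_rate, duration = 16000, 1
t = np.arange(sample_rate * duration) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)  # synthetic 440 Hz tone standing in for an audio file
spec = log_spectrogram(tone)
print(spec.shape)  # (361, 43) under the framing assumed here
```

With the default values, one second of 16 kHz audio yields 720 // 2 + 1 = 361 frequency bins and 1 + (16000 - 720) // 360 = 43 frames under this framing. A transformer that pads or centres frames would produce a different frame count.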
- transform_list(words, n=4)
Given a dataset folder whose subfolders each contain audio files, return a table of spectra as a numpy array X and a table of classes as a numpy array y.
- Parameters:
words (list) – A list of words which are the classes of the speech classifier.
n (int) – The number of times to augment each audio file.
- Returns:
A tuple of the form (X, y) where X is a 3D numpy array representing the audio files and y is a 1D numpy array representing the classes of the audio files.
- Return type:
tuple of numpy.ndarray
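The (X, y) layout described above can be sketched with numpy alone. The example below is a hypothetical illustration, not the method's code: stack_features, features_by_word and the use of integer class indices for y are assumptions, and random arrays stand in for real spectra.

```python
import numpy as np

def stack_features(features_by_word, words):
    """Hypothetical helper: stack per-class 2D features into a 3D X and a 1D y."""
    X, y = [], []
    for label, word in enumerate(words):
        for feat in features_by_word[word]:
            X.append(feat)
            y.append(label)  # assumed encoding: index of the word in `words`
    # np.stack requires every 2D array to have the same shape.
    return np.stack(X), np.array(y)

words = ["yes", "no"]
rng = np.random.default_rng(0)
# Three placeholder feature arrays per word, all of the same shape.
features_by_word = {w: [rng.random((361, 43)) for _ in range(3)] for w in words}
X, y = stack_features(features_by_word, words)
print(X.shape, y.shape)  # (6, 361, 43) (6,)
```

Because stacking only works when every 2D array has the same shape, the audio clips need a common length before they are transformed.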
- transform(path, strict_duration=False)
Convert the audio file given in path into a frequency domain representation.
- Parameters:
path (str) – The path to the audio file.
strict_duration (bool) – Whether to strictly use the duration specified during init or not. If True, then the audio file is padded with zeros if it is shorter than the duration and truncated if it is longer than the duration.
- Returns:
The frequency domain representation of the audio file as a 2D numpy array.
- Return type:
numpy.ndarray
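The padding and truncation behaviour described for strict_duration can be sketched as follows. This is a minimal illustration under assumptions: the helper name fix_duration is hypothetical, and the zeros are appended at the end of the signal, which the description above does not specify.

```python
import numpy as np

def fix_duration(signal, sample_rate=16000, duration=1):
    """Hypothetical helper: force a 1D signal to exactly sample_rate * duration samples."""
    target = sample_rate * duration
    if len(signal) < target:
        # Shorter than the target: pad with zeros (assumed to go at the end).
        return np.pad(signal, (0, target - len(signal)))
    # Longer than or equal to the target: keep only the first `target` samples.
    return signal[:target]

short_clip = np.ones(8000)   # half a second at 16 kHz
long_clip = np.ones(24000)   # one and a half seconds at 16 kHz
padded = fix_duration(short_clip)
truncated = fix_duration(long_clip)
print(len(padded), len(truncated))  # 16000 16000
```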