botiverse.models.T5Model package
Submodules
botiverse.models.T5Model.T5Model module
- class botiverse.models.T5Model.T5Model.AttentionModule(is_decoder=False, num_positional_encoding_buckets=32, positional_encoding_max_distance=128, d_model=768, num_heads=12, dropout_rate=0.1, has_positional_encoding=False)
Bases: Module
A class used for implementing the general transformer attention mechanism.
Constructs an AttentionModule instance with specific hyperparameters.
- Parameters:
is_decoder (bool, optional) – Whether the module is used in a decoder.
num_positional_encoding_buckets (int, optional) – Number of positional encoding buckets.
positional_encoding_max_distance (int, optional) – Maximum distance for positional encoding.
d_model (int, optional) – Dimension of the model embeddings.
num_heads (int, optional) – Number of attention heads.
dropout_rate (float, optional) – Dropout rate.
has_positional_encoding (bool, optional) – Whether positional encoding is applied.
- Returns:
None
- relative_positional_encoding(relative_position, bidirectional=True, num_buckets=32, max_distance=128)
Maps relative positions to relative-position buckets.
- Parameters:
relative_position (Tensor) – Tensor of relative positions.
bidirectional (bool, optional) – Whether the attention is bidirectional. This is False in the decoder, where a token can attend only to the tokens before it.
num_buckets (int, optional) – Number of buckets for positional encoding.
max_distance (int, optional) – Maximum distance for positional encoding.
- Returns:
The bucket index of each relative position.
- Return type:
Tensor
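The bucketing can be illustrated with a plain-Python sketch of the scheme used by the original T5: small offsets get one exact bucket each, larger offsets share logarithmically sized buckets up to max_distance. It assumes this method follows that scheme and that relative_position is the key position minus the query position; the name relative_position_bucket is illustrative, and it works on a single integer rather than a Tensor.

```python
import math


def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map one relative position (key - query) to a bucket index (assumed T5 scheme)."""
    bucket = 0
    if bidirectional:
        # Half of the buckets for keys before the query, half for keys after it
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        n = abs(relative_position)
    else:
        # Causal case: only keys at or before the query matter; future offsets map to 0
        n = -min(relative_position, 0)
    max_exact = num_buckets // 2
    if n < max_exact:
        # Small distances: one bucket per offset
        return bucket + n
    # Larger distances: logarithmic buckets, clipped to the last bucket
    large = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)


print([relative_position_bucket(r) for r in (-200, -20, -3, 0, 3, 20, 200)])
```

Offsets beyond max_distance all land in the last bucket on their side, so the model cannot tell them apart.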
- compute_bias(query_length, key_length)
Computes the relative positional bias between the queries and the keys.
- Parameters:
query_length (int) – Length of the query sequence.
key_length (int) – Length of the key sequence.
- Returns:
Positional embeddings.
- Return type:
Tensor
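compute_bias can be pictured as three steps: build a query_length × key_length grid of relative positions, bucket each entry, and look up one learned scalar per bucket and attention head. The sketch assumes the T5 convention (key position minus query position). bias_table and bucket_fn are hypothetical stand-ins for the module's learned embedding and its relative_positional_encoding method.

```python
def relative_positions(query_length, key_length):
    # Entry [i][j] is key position j minus query position i (assumed T5 convention)
    return [[j - i for j in range(key_length)] for i in range(query_length)]


def compute_bias(query_length, key_length, bias_table, bucket_fn):
    """Return bias[head][i][j] for every query i and key j.

    bias_table[bucket][head] holds one scalar per bucket and head
    (a stand-in for a learned embedding of shape num_buckets x num_heads).
    """
    rel = relative_positions(query_length, key_length)
    num_heads = len(bias_table[0])
    return [
        [[bias_table[bucket_fn(r)][h] for r in row] for row in rel]
        for h in range(num_heads)
    ]


# Toy example: 3 hypothetical buckets (before / same / after) and 2 heads
toy_table = [[-1.0, 0.5], [0.0, 0.0], [1.0, -0.5]]


def toy_bucket(r):
    return 0 if r < 0 else (1 if r == 0 else 2)


bias = compute_bias(2, 3, toy_table, toy_bucket)
print(bias)
```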
- forward(hidden_states, mask=None, key_value_states=None, position_bias=None)
The forward pass of the attention layer.
- Parameters:
hidden_states (Tensor) – Input tensor for the queries.
mask (Tensor, optional) – Mask to be applied on values.
key_value_states (Tensor, optional) – Input tensor for the keys and values; defaults to hidden_states.
position_bias (Tensor, optional) – Positional bias to be added.
- Returns:
The attention output and the positional bias.
- Return type:
Tuple[Tensor, Tensor]
- training: bool
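A single-head, plain-Python sketch of the score computation: dot-product scores between queries and keys, plus the additive position bias and mask, then a softmax and a weighted sum of the values. Following the original T5, it omits the 1/sqrt(d) scaling of the scores; whether AttentionModule does the same, and the additive form of mask (0.0 to keep, a large negative value to block), are assumptions. Multi-head projections and dropout are left out, and the function name attention is illustrative.

```python
import math


def softmax(xs):
    # Subtract the maximum for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, position_bias=None, mask=None):
    """Single-head attention over lists of vectors.

    position_bias[i][j] and mask[i][j] are added to the raw score of
    query i against key j (assumed additive convention).
    """
    out = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            s = sum(a * b for a, b in zip(q, k))  # no 1/sqrt(d) scaling, as in T5
            if position_bias is not None:
                s += position_bias[i][j]
            if mask is not None:
                s += mask[i][j]
            scores.append(s)
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


q = [[1.0, 0.0]]
kv = [[1.0, 0.0], [0.0, 1.0]]
print(attention(q, kv, kv))
print(attention(q, kv, kv, mask=[[0.0, -1e9]]))  # second key blocked
```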
- class botiverse.models.T5Model.T5Model.NewGELUActivation(*args, **kwargs)
Bases: Module
Simple interface of the Gaussian Error Linear Units (GELU) activation function.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
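NewGELUActivation is presumably the tanh approximation of GELU: the name matches the Hugging Face Transformers class that implements that formula, but this page does not confirm it, so treat it as an assumption. The sketch compares the approximation with the exact, erf-based definition on scalars.

```python
import math


def gelu_exact(x):
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    # Tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f}  exact={gelu_exact(x):.5f}  tanh={gelu_tanh(x):.5f}")
```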
- class botiverse.models.T5Model.T5Model.DenseGatedActDenseModule(d_model=768, d_ff=2048, dropout_rate=0.1)
Bases: Module
A class used to implement a dense, gated activation function.
Initializes the DenseGatedActDense module with the given parameters: a gated dense layer followed by a dense layer.
- Parameters:
d_model (int, optional) – Input dimension to the module (and also the model embedding dimension).
d_ff (int, optional) – Hidden layer dimension.
dropout_rate (float, optional) – Dropout rate.
- Returns:
None
- forward(hidden_states)
Performs the forward pass of the dense, gated activation function.
- Parameters:
hidden_states (Tensor) – Input tensor to the forward method.
- Returns:
Output tensor after applying dense, gated activation function.
- Return type:
Tensor
- training: bool
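The gated structure can be sketched as in T5 v1.1: two parallel projections from d_model to d_ff, one passed through GELU and used to gate the other element-wise, then a projection back to d_model. The weight names wi_0, wi_1 and wo are hypothetical (borrowed from the usual T5 naming), the GELU variant is assumed to be the tanh approximation, and dropout is omitted.

```python
import math


def gelu(x):
    # Tanh-approximated GELU (assumed variant)
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(w, x):
    # w: list of rows (out_dim x in_dim), x: vector of length in_dim
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dense_gated_act_dense(x, wi_0, wi_1, wo):
    gate = [gelu(h) for h in matvec(wi_0, x)]       # d_model -> d_ff, activated
    linear = matvec(wi_1, x)                         # d_model -> d_ff, no activation
    hidden = [g * l for g, l in zip(gate, linear)]   # element-wise gating
    return matvec(wo, hidden)                        # d_ff -> d_model


# Toy sizes: d_model = 2, d_ff = 3
wi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
wo = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(dense_gated_act_dense([1.0, 0.0], wi, wi, wo))
```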
- class botiverse.models.T5Model.T5Model.LayerNormModule(layer_size=768, eps=1e-06)
Bases: Module
A class used to apply the Layer Normalization operation.
Initializes the Layer Normalization Module class with the given parameters.
- Parameters:
layer_size (int, optional) – Size of the hidden dimension to be normalized.
eps (float, optional) – A small constant added to the variance for numerical stability.
- Returns:
None
- forward(hidden_states)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class botiverse.models.T5Model.T5Model.FFModule(dropout_rate=0.1)
Bases: Module
A class used to execute a Feed-Forward Neural Network module.
Initializes the FFModule class with the given parameters (the feed-forward block in the Transformer).
- Parameters:
dropout_rate (float, optional) – Dropout rate.
- Returns:
None
- forward(hidden_states)
Performs the forward pass of the Feed-Forward Neural Network module.
- Parameters:
hidden_states (Tensor) – Input tensor to the Feed Forward Network (FFN).
- Returns:
Output tensor after applying the FFN.
- Return type:
Tensor
- training: bool
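In the T5 layout the feed-forward block is usually wrapped in a pre-norm residual connection: normalize, apply the dense (gated) layers, apply dropout, and add the result back to the input. The sketch assumes FFModule follows that layout; layer_norm, dense and dropout are hypothetical callables standing in for its submodules.

```python
from typing import Callable, List

Vector = List[float]


def ff_block(hidden: Vector,
             layer_norm: Callable[[Vector], Vector],
             dense: Callable[[Vector], Vector],
             dropout: Callable[[Vector], Vector] = lambda v: v) -> Vector:
    normed = layer_norm(hidden)                      # pre-norm: normalize the input first
    update = dropout(dense(normed))                  # feed-forward transformation
    return [h + u for h, u in zip(hidden, update)]   # residual connection


out = ff_block([1.0, 2.0],
               layer_norm=lambda v: [x / 2 for x in v],
               dense=lambda v: [10 * x for x in v])
print(out)
```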
- class botiverse.models.T5Model.T5Model.SelfAttentionModule(is_decoder, dropout_rate=0.1, has_positional_encoding=False)
Bases: Module
A class used to implement a self-attention mechanism.
Initializes the SelfAttentionModule class with the given parameters.
- Parameters:
is_decoder (bool) – Whether the module is used in a decoder (the encoder and decoder handle positional encoding differently).
dropout_rate (float, optional) – Dropout rate.
has_positional_encoding (bool, optional) – If positional encoding is applied.
- Returns:
None
- forward(hidden_states, attention_mask=None, position_bias=None)
Applies self-attention to the hidden states.
- Parameters:
hidden_states (Tensor) – Input tensor used for the query, key and value (they are identical in self-attention).
attention_mask (Tensor, optional) – Attention mask for the self-attention mechanism.
position_bias (Tensor, optional) – Position bias for self-attention.
- Returns:
The hidden states and position bias.
- Return type:
Tuple[Tensor, Tensor]
- training: bool
- class botiverse.models.T5Model.T5Model.EncoderBlock(has_positional_encoding=False)
Bases: Module
A class used for the encoder block of the transformer model.
Initializes the EncoderBlock class with the given parameters.
- Parameters:
has_positional_encoding (bool, optional) – If positional encoding is applied.
- Returns:
None
- forward(hidden_states, attention_mask=None, position_bias=None)
Performs the forward pass of the encoder block.
- Parameters:
hidden_states (Tensor) – Input tensor to the Encoder block.
attention_mask (Tensor, optional) – Attention mask for the self-attention mechanism.
position_bias (Tensor, optional) – Position bias for self-attention.
- Returns:
The hidden states and position bias.
- Return type:
Tuple[Tensor, Tensor]
- training: bool
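An encoder block chains the self-attention sublayer and the feed-forward sublayer and hands the position bias back so the caller can reuse it. The sketch assumes that order (as in T5) and uses hypothetical callables self_attention and feed_forward in place of the SelfAttentionModule and FFModule submodules.

```python
def encoder_block(hidden_states, self_attention, feed_forward,
                  attention_mask=None, position_bias=None):
    # Self-attention sublayer returns updated states and the (possibly new) bias
    hidden_states, position_bias = self_attention(hidden_states, attention_mask, position_bias)
    # Feed-forward sublayer leaves the bias untouched
    hidden_states = feed_forward(hidden_states)
    return hidden_states, position_bias


def toy_attention(h, mask, bias):
    # Hypothetical stand-in: computes a bias only when none is passed in
    return [x + 1.0 for x in h], (bias if bias is not None else "computed")


def toy_ff(h):
    return [2.0 * x for x in h]


h, b = encoder_block([0.0, 1.0], toy_attention, toy_ff)
print(h, b)
```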
- class botiverse.models.T5Model.T5Model.EncoderModule(embed_tokens, num_layers=12, dropout_rate=0.1)
Bases: Module
A class used for the encoder of the transformer model.
Initializes the EncoderModule class with the given parameters.
- Parameters:
embed_tokens (nn.Embedding) – The embedding layer for the input tokens.
num_layers (int, optional) – The number of encoder layers in the model.
dropout_rate (float, optional) – The dropout rate.
- Returns:
None
- forward(input_ids=None, attention_mask=None)
Performs the forward pass of the encoder module.
- Parameters:
input_ids (Tensor, optional) – The indices of the input sequence tokens.
attention_mask (Tensor, optional) – The binary mask indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
- Returns:
The encoded hidden states.
- Return type:
Tensor
- training: bool
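Two steps of the encoder forward pass can be sketched under T5-style assumptions: the binary attention_mask (1 for real tokens, 0 for padding) is turned into additive values for the attention scores, and only the first block computes the position bias while later blocks reuse it. Both conventions come from the original T5 and are assumptions about this module; the helper names are hypothetical, and embedding lookup, dropout and any final normalization are omitted.

```python
def extended_attention_mask(attention_mask, large_negative=-1e9):
    # 1 (real token) -> 0.0, 0 (padding) -> large negative, added to scores
    return [0.0 if m == 1 else large_negative for m in attention_mask]


def run_encoder_stack(hidden_states, blocks, attention_mask):
    additive_mask = extended_attention_mask(attention_mask)
    position_bias = None  # the first block computes it, later blocks reuse it
    for block in blocks:
        hidden_states, position_bias = block(hidden_states, additive_mask, position_bias)
    return hidden_states


print(extended_attention_mask([1, 1, 0]))
```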
- class botiverse.models.T5Model.T5Model.CrossAttentionModule(dropout_rate=0.1)
Bases: Module
A class used for the cross-attention module of the transformer model.
Initializes the CrossAttentionModule class with the given parameters.
- Parameters:
dropout_rate (float, optional) – Dropout rate.
- Returns:
None
- forward(hidden_states, key_value_states, encoder_attention_mask=None)
Applies cross-attention where the query comes from the hidden states and the key and value come from key_value_states.
- Parameters:
hidden_states (Tensor) – Input tensor to be used for the query.
key_value_states (Tensor) – Input tensor to be used for the key and value.
encoder_attention_mask (Tensor, optional) – Attention mask for the cross-attention mechanism.
- Returns:
The hidden states and position bias.
- Return type:
Tuple[Tensor, Tensor]
- training: bool
- class botiverse.models.T5Model.T5Model.DecoderBlock(has_positional_encoding=False)
Bases: Module
A class used for the decoder block of the transformer model.
Initializes the DecoderBlock class with the given parameters.
- Parameters:
has_positional_encoding (bool, optional) – If positional encoding is applied.
- Returns:
None
- forward(hidden_states, attention_mask=None, position_bias=None, encoder_hidden_states=None, encoder_attention_mask=None)
Performs the forward pass of the decoder block.
- Parameters:
hidden_states (Tensor) – The hidden states from the previous decoder block (or the input embeddings if this is the first decoder block).
attention_mask (Tensor, optional) – The binary mask of the decoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
position_bias (Tensor, optional) – The positional bias for the self-attention mechanism.
encoder_hidden_states (Tensor, optional) – The output hidden states from the encoder module.
encoder_attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating where the input has been padded.
- Returns:
The hidden states and position bias after the forward pass of the decoder block.
- Return type:
Tuple[Tensor, Tensor]
- training: bool
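A decoder block adds a cross-attention sublayer between self-attention and the feed-forward sublayer: queries come from the decoder, keys and values from encoder_hidden_states. The sketch assumes the T5 order (self-attention, cross-attention, feed-forward) and that the cross-attention output bias is not carried forward; self_attention, cross_attention and feed_forward are hypothetical callables standing in for the submodules.

```python
def decoder_block(hidden_states, self_attention, cross_attention, feed_forward,
                  attention_mask=None, position_bias=None,
                  encoder_hidden_states=None, encoder_attention_mask=None):
    # 1. Causal self-attention over the decoder sequence (uses the position bias)
    hidden_states, position_bias = self_attention(hidden_states, attention_mask, position_bias)
    # 2. Cross-attention: queries from the decoder, keys/values from the encoder
    if encoder_hidden_states is not None:
        hidden_states, _ = cross_attention(hidden_states, encoder_hidden_states,
                                           encoder_attention_mask)
    # 3. Position-wise feed-forward
    hidden_states = feed_forward(hidden_states)
    return hidden_states, position_bias
```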
- class botiverse.models.T5Model.T5Model.DecoderModule(embed_tokens, num_layers=12, dropout_rate=0.1)
Bases: Module
A class used to implement the decoder part of the transformer.
Initializes the DecoderModule class with the given parameters.
- Parameters:
embed_tokens (nn.Embedding) – The embedding layer for the decoder input tokens.
num_layers (int, optional) – The number of decoder layers in the model.
dropout_rate (float, optional) – The dropout rate.
- Returns:
None
- forward(input_ids=None, attention_mask=None, encoder_hidden_states=None, encoder_attention_mask=None)
Performs the forward pass of the decoder.
- Parameters:
input_ids (Tensor, optional) – The indices of the input sequence tokens in the vocabulary.
attention_mask (Tensor, optional) – The binary mask of the decoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
encoder_hidden_states (Tensor, optional) – The output hidden states of the encoder module.
encoder_attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
- Returns:
Decoded output hidden states.
- Return type:
Tensor
- training: bool
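The decoder's self-attention must not look at future tokens. A common approach, used in the original T5 and assumed here, is to combine a causal (lower-triangular) mask with the padding mask given by attention_mask and convert the result into additive values. decoder_self_attention_mask is a hypothetical helper; how DecoderModule builds its mask internally is not documented on this page.

```python
def decoder_self_attention_mask(attention_mask, large_negative=-1e9):
    """Combine a causal mask with a decoder padding mask.

    attention_mask: one entry per decoder position, 1 (token) or 0 (padding).
    Returns mask[i][j]: 0.0 where query i may attend to key j, else large_negative.
    """
    n = len(attention_mask)
    return [
        [0.0 if (j <= i and attention_mask[j] == 1) else large_negative
         for j in range(n)]
        for i in range(n)
    ]


for row in decoder_self_attention_mask([1, 1, 0]):
    print(row)
```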
- class botiverse.models.T5Model.T5Model.T5Model(vocab_size=32128, d_model=768)
Bases: Module
A class to represent the T5 transformer model.
Initializes the T5 Model with the given parameters.
- Parameters:
vocab_size (int, optional) – The size of the vocabulary.
d_model (int, optional) – The dimensionality of the input embedding.
- Returns:
None
- forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None)
Performs the forward pass of the T5 transformer model.
- Parameters:
input_ids (Tensor, optional) – The IDs of the input tokens.
attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
decoder_input_ids (Tensor, optional) – The IDs of the decoder input tokens.
decoder_attention_mask (Tensor, optional) – The binary mask of the decoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
- Returns:
The token probabilities of the output sequence.
- Return type:
Tensor
- generate(input_ids=None, attention_mask=None, max_length=10, temperature=1.0)
Generates an output sequence given input_ids and attention_mask.
- Parameters:
input_ids (Tensor, optional) – The IDs of the input tokens.
attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
max_length (int, optional) – The maximum length of the sequence to be generated.
temperature (float, optional) – The temperature of the softmax function; the higher its value, the flatter the probability distribution of the next token.
- Returns:
The IDs of the generated tokens.
- Return type:
Tensor
- training: bool
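Autoregressive generation with temperature can be sketched as a loop: score the next token given the tokens decoded so far, divide the logits by temperature, apply softmax, pick a token, and stop at end-of-sequence or after max_length steps. This is an illustration only. next_token_logits is a hypothetical callable standing in for a forward pass; whether generate samples or picks greedily is not documented here (the sketch samples); and the start and end token IDs (0 for the decoder start/pad token, 1 for end-of-sequence, as in the standard T5 vocabulary) are assumptions.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    # Higher temperature -> flatter distribution; lower -> sharper
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_token_logits, max_length=10, temperature=1.0,
             start_token_id=0, eos_token_id=1, rng=None):
    rng = rng or random.Random()
    decoded = [start_token_id]  # assumed T5 decoder start token
    for _ in range(max_length):
        probs = softmax_with_temperature(next_token_logits(decoded), temperature)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        decoded.append(token)
        if token == eos_token_id:
            break
    return decoded[1:]  # drop the start token


print(softmax_with_temperature([2.0, 1.0, 0.0], 1.0))
print(softmax_with_temperature([2.0, 1.0, 0.0], 10.0))
```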