botiverse.models.T5Model package#

Submodules#

botiverse.models.T5Model.T5Model module#

class botiverse.models.T5Model.T5Model.AttentionModule(is_decoder=False, num_positional_encoding_buckets=32, positional_encoding_max_distance=128, d_model=768, num_heads=12, dropout_rate=0.1, has_positional_encoding=False)[source]#

Bases: Module

A class used for implementing the general transformer attention mechanism.

Constructs an AttentionModule instance with specific hyperparameters.

Parameters:

is_decoder (bool, optional) – Indicates if we are using a decoder.
num_positional_encoding_buckets (int, optional) – Number of positional encoding buckets.
positional_encoding_max_distance (int, optional) – Max distance for positional encoding.
d_model (int, optional) – Indicates the model embeddings dimension.
num_heads (int, optional) – States the number of attention heads.
dropout_rate (float, optional) – Dropout rate.
has_positional_encoding (bool, optional) – If positional encoding is applied.

Returns:

None

relative_positional_encoding(relative_position, bidirectional=True, num_buckets=32, max_distance=128)[source]#

Provides the buckets given the relative positions.

Parameters:

relative_position (Tensor) – Tensor of relative positions.
bidirectional (bool, optional) – Whether the attention is bidirectional. It is False in the decoder, where a token can attend only to the tokens behind it.
num_buckets (int, optional) – Number of buckets for positional encoding.
max_distance (int, optional) – Maximum distance for positional encoding.

Returns:

Relative buckets.

Return type:

Tensor
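The bucketing step can be illustrated with a scalar, framework-free sketch. It assumes the method follows the logarithmic bucketing scheme of the reference T5 implementation; the function name `relative_position_bucket` is hypothetical and the actual method operates on tensors, so treat this as an approximation of the idea rather than the library's code.

```python
import math


def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map one signed offset (key position minus query position) to a bucket id.

    Assumption: this mirrors the reference T5 bucketing scheme.
    """
    bucket = 0
    if bidirectional:
        # Half of the buckets serve keys after the query, half keys before it.
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        distance = abs(relative_position)
    else:
        # Unidirectional case: keys after the query all collapse to distance 0.
        distance = max(-relative_position, 0)

    # Small distances get one bucket each.
    max_exact = num_buckets // 2
    if distance < max_exact:
        return bucket + distance

    # Larger distances share logarithmically sized buckets up to max_distance.
    log_ratio = math.log(distance / max_exact) / math.log(max_distance / max_exact)
    large = max_exact + int(log_ratio * (num_buckets - max_exact))
    return bucket + min(large, num_buckets - 1)
```

Nearby offsets keep distinct buckets, while every offset beyond `max_distance` shares the last bucket on its side.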

compute_bias(query_length, key_length)[source]#

Computes the relative positional bias between the queries and the keys.

Parameters:

query_length (int) – Length of the query sequence.
key_length (int) – Length of the key sequence.

Returns:

Positional embeddings.

Return type:

Tensor
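A rough sketch of how such a bias could be assembled is given here. It assumes the bias is looked up per head from a learned table indexed by the bucket of each query-key offset; `compute_bias_sketch`, `bucket_fn`, and `bias_table` are hypothetical names used only for illustration, and nested lists stand in for tensors.

```python
def compute_bias_sketch(query_length, key_length, bucket_fn, bias_table):
    """Build a [num_heads][query_length][key_length] bias from bucketed offsets.

    bucket_fn maps a signed offset (key - query) to a bucket id.
    bias_table[bucket][head] is a stand-in for a learned embedding table.
    """
    num_heads = len(bias_table[0])
    bias = [[[0.0] * key_length for _ in range(query_length)]
            for _ in range(num_heads)]
    for q in range(query_length):
        for k in range(key_length):
            bucket = bucket_fn(k - q)
            for h in range(num_heads):
                bias[h][q][k] = bias_table[bucket][h]
    return bias


# Toy usage: two buckets (offset <= 0 and offset > 0) and two heads.
toy_table = [[0.1, 0.2], [0.3, 0.4]]
toy_bias = compute_bias_sketch(2, 3, lambda offset: 0 if offset <= 0 else 1, toy_table)
```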

forward(hidden_states, mask=None, key_value_states=None, position_bias=None)[source]#

The forward pass of the attention layer.

Parameters:

hidden_states (Tensor) – Tensor of the Query.
mask (Tensor, optional) – Mask to be applied on values.
key_value_states (Tensor, optional) – Tensor of the Key and Value, the default is the same as hidden_states.
position_bias (Tensor, optional) – Positional bias to be added.

Returns:

Returns the attention output and positional bias.

Return type:

Tuple[Tensor, Tensor]
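The computation behind a single attention head can be sketched without any framework. This is a simplified illustration, not the module's code: it assumes the mask and the positional bias are both applied to the attention scores before the softmax, it omits score scaling, dropout, and the learned projections, and the names `softmax` and `attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, bias=None, mask=None):
    """Single-head dot-product attention on nested lists.

    bias[i][j] is added to the score of query i and key j.
    mask[j] == 0 hides key j from every query.
    """
    outputs = []
    for i, query in enumerate(queries):
        scores = [sum(a * b for a, b in zip(query, key)) for key in keys]
        if bias is not None:
            scores = [s + bias[i][j] for j, s in enumerate(scores)]
        if mask is not None:
            scores = [s if mask[j] else -1e9 for j, s in enumerate(scores)]
        weights = softmax(scores)
        width = len(values[0])
        outputs.append([sum(weights[j] * values[j][d] for j in range(len(values)))
                        for d in range(width)])
    return outputs
```

When `key_value_states` is supplied, the keys and values come from that tensor instead of `hidden_states`; in the sketch this corresponds to passing different lists for `keys` and `values` than for `queries`.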

training: bool#

class botiverse.models.T5Model.T5Model.NewGELUActivation(*args, **kwargs)[source]#

Bases: Module

Simple interface of the Gaussian Error Linear Units (GELU) activation function.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
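For reference, a scalar sketch of the tanh approximation of GELU follows. It assumes the class name follows the common "new GELU" convention, which refers to this approximation; the source does not confirm which variant the class computes, and the function name `gelu_tanh` is hypothetical.

```python
import math


def gelu_tanh(x):
    """Tanh approximation of GELU for a single float (assumed variant)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))
```

The function is close to the identity for large positive inputs and close to zero for large negative inputs.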

training: bool#

class botiverse.models.T5Model.T5Model.DenseGatedActDenseModule(d_model=768, d_ff=2048, dropout_rate=0.1)[source]#

Bases: Module

A class used to implement a dense, gated activation function.

Initializes the DenseGatedActDense Module class with the given parameters, which is a gated dense layer followed by a dense layer.

Parameters:

d_model (int, optional) – Input dimension to the module (and also the model embedding dimension).
d_ff (int, optional) – Hidden layer dimension.
dropout_rate (float, optional) – Dropout rate.

Returns:

None

forward(hidden_states)[source]#

Performs the forward pass of the dense, gated activation function.

Parameters:

hidden_states (Tensor) – Input tensor to the forward method.

Returns:

Output tensor after applying dense, gated activation function.

Return type:

Tensor
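A gated dense layer followed by a dense layer can be sketched as below. The structure is an assumption based on the description above: an activated projection is multiplied elementwise by a second linear projection, and the product goes through an output projection. The weight names are hypothetical, and dropout is omitted.

```python
def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def gated_dense(x, w_gate, w_linear, w_out, activation):
    """Sketch of a gated feed-forward layer for one input vector.

    w_gate and w_linear map d_model -> d_ff; w_out maps d_ff -> d_model.
    """
    gate = [activation(v) for v in matvec(w_gate, x)]
    linear = matvec(w_linear, x)
    hidden = [g * l for g, l in zip(gate, linear)]
    return matvec(w_out, hidden)


# Toy usage with a ReLU stand-in for the activation.
toy_output = gated_dense(
    [1.0, 2.0],
    w_gate=[[1.0, 0.0], [0.0, 1.0]],
    w_linear=[[2.0, 0.0], [0.0, 2.0]],
    w_out=[[1.0, 1.0]],
    activation=lambda v: max(v, 0.0),
)
```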

training: bool#

class botiverse.models.T5Model.T5Model.LayerNormModule(layer_size=768, eps=1e-06)[source]#

Bases: Module

A class used to apply the Layer Normalization operation.

Initializes the Layer Normalization Module class with the given parameters.

Parameters:

layer_size (int, optional) – Dimensions of the layers to be normalized.
eps (float, optional) – A term added to improve numerical stability.

Returns:

None

forward(hidden_states)[source]#

Applies layer normalization to the hidden states.

training: bool#
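As an illustration of the normalization step, the sketch below uses the root-mean-square variant that T5-style models commonly use, with a learned scale and no mean subtraction. Whether LayerNormModule uses this variant is an assumption not confirmed by the source, and the name `rms_layer_norm` is hypothetical.

```python
import math


def rms_layer_norm(hidden, weight, eps=1e-6):
    """Scale a vector by the inverse of its root mean square (assumed variant)."""
    mean_square = sum(v * v for v in hidden) / len(hidden)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * scale for w, v in zip(weight, hidden)]
```

The `eps` term keeps the division stable when the input vector is close to zero.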

class botiverse.models.T5Model.T5Model.FFModule(dropout_rate=0.1)[source]#

Bases: Module

A class used to execute a Feed-Forward Neural Network module.

Initializes the FFModule class with the given parameters (the feed-forward block in the Transformer).

Parameters:

dropout_rate (float, optional) – Dropout rate.

Returns:

None

forward(hidden_states)[source]#

Perform the forward pass of the Feed-forward Neural Network module.

Parameters:

hidden_states (Tensor) – Input tensor to the Feed Forward Network (FFN).

Returns:

Output tensor after applying the FFN.

Return type:

Tensor

training: bool#

class botiverse.models.T5Model.T5Model.SelfAttentionModule(is_decoder, dropout_rate=0.1, has_positional_encoding=False)[source]#

Bases: Module

A class used to implement a self-attention mechanism.

Initializes the SelfAttentionModule class with the given parameters.

Parameters:

is_decoder (bool) – Indicates if we are using a decoder (the decoder and encoder have different ways to handle the positional encoding).
dropout_rate (float, optional) – Dropout rate.
has_positional_encoding (bool, optional) – If positional encoding is applied.

Returns:

None

forward(hidden_states, attention_mask=None, position_bias=None)[source]#

Applies the self-attention to the hidden states.

Parameters:

hidden_states (Tensor) – Tensor of the Query, Key and Value (all have the same input, as this is self-attention).
attention_mask (Tensor, optional) – Attention mask for the self-attention mechanism.
position_bias (Tensor, optional) – Position bias for self-attention.

Returns:

Returns the hidden states and position bias.

Return type:

Tuple[Tensor, Tensor]

training: bool#

class botiverse.models.T5Model.T5Model.EncoderBlock(has_positional_encoding=False)[source]#

Bases: Module

A class used for the encoder block of the transformer model.

Initializes the EncoderBlock class with the given parameters.

Parameters:

has_positional_encoding (bool, optional) – If positional encoding is applied.

Returns:

None

forward(hidden_states, attention_mask=None, position_bias=None)[source]#

Encoder block forward pass.

Parameters:

hidden_states (Tensor) – Input tensor to the Encoder block.
attention_mask (Tensor, optional) – Attention mask for the self-attention mechanism.
position_bias (Tensor, optional) – Position bias for self-attention.

Returns:

Returns the hidden states and position bias.

Return type:

Tuple[Tensor, Tensor]

training: bool#

class botiverse.models.T5Model.T5Model.EncoderModule(embed_tokens, num_layers=12, dropout_rate=0.1)[source]#

Bases: Module

A class used for the encoder of the transformer model.

Initializes the EncoderModule class with the given parameters.

Parameters:

embed_tokens (nn.Embedding) – The embeddings of the input tokens.
num_layers (int, optional) – The number of encoder layers in the model.
dropout_rate (float, optional) – The dropout rate.

Returns:

None

forward(input_ids=None, attention_mask=None)[source]#

Performs the forward pass of the encoder module.

Parameters:

input_ids (Tensor, optional) – The indices of the input sequence tokens.
attention_mask (Tensor, optional) – The binary mask indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).

Returns:

The encoded hidden states.

Return type:

Tensor

training: bool#

class botiverse.models.T5Model.T5Model.CrossAttentionModule(dropout_rate=0.1)[source]#

Bases: Module

A class used for the cross-attention module of the transformer model.

Initializes the CrossAttentionModule class with the given parameters.

Parameters:

dropout_rate (float, optional) – Dropout rate.

Returns:

None

forward(hidden_states, key_value_states, encoder_attention_mask=None)[source]#

Applies cross-attention where the query comes from the hidden states and the key and value come from key_value_states.

Parameters:

hidden_states (Tensor) – Input tensor to be used for the query.
key_value_states (Tensor) – Input tensor to be used for the key and value.
encoder_attention_mask (Tensor, optional) – Attention mask for the cross-attention mechanism.

Returns:

Returns the hidden states and position bias.

Return type:

Tuple[Tensor, Tensor]

training: bool#

class botiverse.models.T5Model.T5Model.DecoderBlock(has_positional_encoding=False)[source]#

Bases: Module

A class used for the decoder block of the transformer model.

Initializes the DecoderBlock class with the given parameters.

Parameters:

has_positional_encoding (bool, optional) – If positional encoding is applied.

Returns:

None

forward(hidden_states, attention_mask=None, position_bias=None, encoder_hidden_states=None, encoder_attention_mask=None)[source]#

Performs the forward pass of the decoder block.

Parameters:

hidden_states (Tensor) – The hidden states from the previous decoder block (or the input embeddings if this is the first decoder block).
attention_mask (Tensor, optional) – The binary mask of the decoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
position_bias (Tensor, optional) – The positional bias for the self-attention mechanism.
encoder_hidden_states (Tensor, optional) – The output hidden states from the encoder module.
encoder_attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating where the input has been padded.

Returns:

The hidden states and position bias after the forward pass of the decoder block.

Return type:

Tuple[Tensor, Tensor]

training: bool#

class botiverse.models.T5Model.T5Model.DecoderModule(embed_tokens, num_layers=12, dropout_rate=0.1)[source]#

Bases: Module

A class used to implement the decoder part of the transformer.

Initializes the DecoderModule class with the given parameters.

Parameters:

embed_tokens (nn.Embedding) – The embeddings of the decoder input sequence tokens.
num_layers (int, optional) – The number of decoder layers in the model.
dropout_rate (float, optional) – The dropout rate.

Returns:

None

forward(input_ids=None, attention_mask=None, encoder_hidden_states=None, encoder_attention_mask=None)[source]#

Performs the forward pass of the decoder.

Parameters:

input_ids (Tensor, optional) – The indices of the input sequence tokens in the vocabulary.
attention_mask (Tensor, optional) – The binary mask of the decoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
encoder_hidden_states (Tensor, optional) – The output of the encoder module.
encoder_attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).

Returns:

Decoded output hidden states.

Return type:

Tensor
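The decoder's self-attention typically combines the padding mask described above with a causal constraint, so that each position sees only itself and earlier non-padded positions. The sketch below shows one way to build such a combined mask; it is an assumption about how the masks interact, not the module's code, and `decoder_self_attention_mask` is a hypothetical name.

```python
def decoder_self_attention_mask(padding_mask):
    """Combine a causal constraint with a padding mask.

    padding_mask[k] is 1 for a real token and 0 for padding.
    Entry [q][k] of the result is 1 when query q may attend to key k.
    """
    length = len(padding_mask)
    return [[1 if (k <= q and padding_mask[k] == 1) else 0
             for k in range(length)]
            for q in range(length)]
```

For a padding mask of `[1, 1, 0]`, the first row allows only position 0, and the padded last column is hidden from every query.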

training: bool#

class botiverse.models.T5Model.T5Model.T5Model(vocab_size=32128, d_model=768)[source]#

Bases: Module

A class to represent the T5 transformer model.

Initializes the T5 Model with the given parameters.

Parameters:

vocab_size (int, optional) – The size of the vocabulary.
d_model (int, optional) – The dimensionality of the input embedding.

Returns:

None

forward(input_ids=None, attention_mask=None, decoder_input_ids=None, decoder_attention_mask=None)[source]#

Performs the forward pass of the T5 transformer model.

Parameters:

input_ids (Tensor, optional) – The IDs of the input tokens.
attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
decoder_input_ids (Tensor, optional) – The IDs of the decoder input tokens.
decoder_attention_mask (Tensor, optional) – The binary mask of the decoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).

Returns:

The token probabilities of the output sequence.

Return type:

Tensor

generate(input_ids=None, attention_mask=None, max_length=10, temperature=1.0)[source]#

Generates output sequence given input_ids and attention_mask.

Parameters:

input_ids (Tensor, optional) – The IDs of the input tokens.
attention_mask (Tensor, optional) – The binary mask of the encoder sequence, indicating the positions where the input sequence is padded (1 for not padded, 0 for padded).
max_length (int, optional) – The maximum length of the sequence to be generated.
temperature (float, optional) – The temperature of the softmax function; the higher its value, the flatter the probability distribution of the next token.

Returns:

The IDs of the generated tokens.

Return type:

Tensor
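The effect of the temperature parameter can be seen in a small sketch. It assumes the logits are divided by the temperature before the softmax, which is the usual convention; the source does not show how generate applies it, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# The same logits at two temperatures: the higher one gives a flatter result.
sharp = softmax_with_temperature([2.0, 1.0, 0.0], temperature=1.0)
flat = softmax_with_temperature([2.0, 1.0, 0.0], temperature=10.0)
```

A temperature below 1.0 has the opposite effect and concentrates the probability on the highest-scoring token.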

training: bool#
