Module
Module inherits torch.nn.Module and is the base class for all Module objects in this toolkit.
The neural network part of each Model object in this toolkit is built from multiple Module objects arranged in a nested structure.
Below is the nested Module tree of an encoder-decoder ASR model:
ASR (Model)
    ---> ASREncoder (Module)
        ---> Speech2MelSpec (Module)
            ---> Speech2LinearSpec (Module)
            ---> LinearSpec2MelSpec (Module)
        ---> Conv2dPrenet (Module)
            ---> LinearPrenet (Module)
        ---> TransformerEncoder (Module)
            ---> PositionalEncoding (Module)
            ---> MultiHeadedAttention (Module)
            ---> PositionwiseFeedForward (Module)
    ---> ASRDecoder (Module)
        ---> EmbedPrenet (Module)
        ---> TransformerDecoder (Module)
            ---> PositionalEncoding (Module)
            ---> MultiHeadedAttention (Module)
            ---> PositionwiseFeedForward (Module)
        ---> TokenPostnet (Module)
Each Module object exposes two interface functions: module_init() for module initialization and forward() for output calculation.
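For instance, a minimal custom Module could look like the sketch below (the class name, its hidden_size argument, and the single linear layer are invented purely for illustration; only module_init() and forward() are overridden):

```python
import torch
from speechain.module.abs import Module

class ToyLinearModule(Module):
    # Hypothetical Module subclass, written only for illustration.
    def module_init(self, hidden_size: int = 256):
        # self.input_size is filled in by Module.__init__() before module_init() is called
        self.linear = torch.nn.Linear(self.input_size, hidden_size)
        # tell the following Module what dimension it will receive
        self.output_size = hidden_size

    def forward(self, feat: torch.Tensor):
        return torch.relu(self.linear(feat))
```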
Table of Contents
- Module Library
- API Document
Module Library
/speechain
    /module
        /abs.py                # Abstract class of Module. Base of all Module implementations.
        /frontend              # Acoustic feature extraction frontend modules
            /speech2linear.py  # Module implementation of the speech-to-linear frontend. Transforms input speech waveforms into linear spectrograms.
            /linear2mel.py     # Module implementation of the linear-to-mel frontend. Transforms input linear spectrograms into log-mel spectrograms.
            /speech2mel.py     # Module implementation of the speech-to-mel frontend. Transforms input speech waveforms into log-mel spectrograms.
            /delta_feat.py     # Module implementation of the delta frontend. Mainly used for ASR training when taking the first and second derivatives of the log-mel spectrogram.
        /norm                  # Normalization modules
            /feat_norm.py      # Module implementation of per-channel feature normalization.
        /augment               # Data augmentation modules
            /specaug.py        # Module implementation of SpecAugment. Mainly used for ASR training.
        /encoder               # Model encoder modules
            /asr.py            # Module implementation of ASR encoders. Used for ASR model construction.
            /tts.py            # Module implementation of TTS encoders. Used for TTS model construction.
        /decoder               # Model decoder modules
            /asr.py            # Module implementation of ASR autoregressive decoders. Used for autoregressive ASR model construction.
            /tts.py            # Module implementation of TTS autoregressive decoders. Used for autoregressive TTS model construction.
        /prenet                # Model prenet modules in front of encoders and decoders
            /conv1d.py         # Module implementation of the 1D convolutional prenet.
            /conv2d.py         # Module implementation of the 2D convolutional prenet.
            /embed.py          # Module implementation of the token embedding prenet.
            /linear.py         # Module implementation of the stacked linear prenet.
            /spk_embed.py      # Module implementation of the speaker embedding prenet.
        /postnet               # Model postnet modules behind encoders and decoders
            /conv1d.py         # Module implementation of the 1D convolutional postnet.
            /token.py          # Module implementation of the token prediction postnet.
        /transformer           # Transformer-related modules
            /encoder.py        # Module implementation of Transformer encoder layers. Used for encoder construction of ASR and TTS models.
            /decoder.py        # Module implementation of Transformer autoregressive decoder layers. Used for decoder construction of autoregressive ASR and TTS models.
            /pos_enc.py        # Module implementation of positional encoding layers.
            /attention.py      # Module implementation of multi-head attention layers.
            /feed_forward.py   # Module implementation of point-wise feed-forward layers.
👆Back to the table of contents
API Document
Non-overridable backbone functions:
1. speechain.module.abs.Module.__init__
Overridable interface functions:
1. speechain.module.abs.Module.module_init
2. speechain.module.abs.Module.forward
3. speechain.module.abs.Module.recover
4. speechain.module.abs.Module.reset_parameters
5. speechain.module.abs.Module.get_recordable_para
speechain.module.abs.Module.__init__(self, input_size, distributed, **module_conf)
- Description:
  This initialization function is shared by all Module subclasses. There are two built-in member variables: input_size and output_size. input_size is the last dimension of the input tensor, while output_size is the last dimension of the output tensor.
  These two member variables serve as the socket and plug used to communicate with the front and back Module objects in a Model object. You can use self.input_size in your module_init() implementation to initialize your module and assign the output data dimension to self.output_size.
  Note: Using these two member variables is not mandatory, but it is a convenient way to initialize your module.
- Arguments:
  - input_size: int = None
    The last dimension of the tensor coming from the front Module object. If not given, this argument is None.
  - distributed: bool = False
    Whether the Model object that this Module object belongs to is distributed across multiple GPUs.
  - **module_conf:
    The arguments used by module_init() for your customized Module initialization.
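For instance, a parent Module could use these two members to wire its sub-modules together. The sketch below reuses the hypothetical ToyLinearModule from the introduction; all names and sizes are invented for illustration:

```python
import torch
from speechain.module.abs import Module

class ToyEncoder(Module):
    # Hypothetical parent Module; ToyLinearModule is the toy class defined
    # in the sketch at the top of this page.
    def module_init(self, hidden_size: int = 256, proj_size: int = 80):
        # the sub-module's socket is this Module's own input dimension
        self.prenet = ToyLinearModule(input_size=self.input_size,
                                      hidden_size=hidden_size)
        # the sub-module's plug (output_size) feeds the next member
        self.proj = torch.nn.Linear(self.prenet.output_size, proj_size)
        # the last member decides this Module's output dimension
        self.output_size = proj_size

    def forward(self, feat: torch.Tensor):
        return self.proj(self.prenet(feat))
```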
speechain.module.abs.Module.module_init(self, **module_conf)
- Description:
  Abstract interface function for the customized initialization of each Module subclass. This interface function must be overridden by your implementation.
- Arguments:
  - **module_conf:
    The arguments used for customized Module initialization. For more details, please refer to the docstring of your target Module subclass.
speechain.module.abs.Module.forward(self, **kwargs)
- Description:
  This abstract interface function is the customized implementation of torch.nn.Module.forward() used during model forward calculation. This interface function must be overridden by your implementation.
- Arguments:
  - **kwargs:
    The input arguments for module forward calculation. For more details, please refer to the docstring of forward() of your target Module subclass.
- Return:
  Module forward calculation results. For more details, please refer to the docstring of forward() of your target Module subclass.
speechain.module.abs.Module.recover(self, **kwargs)
- Description:
  This abstract interface function recovers the module forward calculation results back to the input data. It can be considered the reverse process of forward(). This interface function does not have to be overridden.
- Arguments:
  - **kwargs:
    The forward calculation results to be recovered. For more details, please refer to the docstring of recover() of your target Module subclass.
- Return:
  The recovered data, or closely recovered data (sometimes forward() is not fully invertible). For more details, please refer to the docstring of recover() of your target Module subclass.
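As a toy illustration of a forward()/recover() pair (the module below and its scale argument are invented for this example, not taken from the toolkit):

```python
import torch
from speechain.module.abs import Module

class ToyScale(Module):
    # Hypothetical Module whose recover() is the exact inverse of forward().
    def module_init(self, scale: float = 2.0):
        self.scale = scale
        # the feature dimension is unchanged by this module
        self.output_size = self.input_size

    def forward(self, feat: torch.Tensor):
        return feat * self.scale

    def recover(self, feat: torch.Tensor):
        # reverse process of forward(); fully recoverable in this toy case
        return feat / self.scale
```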
speechain.module.abs.Module.reset_parameters(self)
- Description:
  This abstract interface function initializes the customized parameters of the Module subclass, if any. Some Module subclasses have customized parameters with specific initialization functions. If your Module implementation has customized parameters that you want to initialize yourself, put the initialization logic in this interface function. This interface function does not have to be overridden.
  Note: Don't forget to add self.default_init_modules.append(YourModule) in model_init() of your Model.
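A sketch of how this could look, assuming a hypothetical module whose convolution weights should receive Xavier initialization (the class, the sizes, and the Xavier choice are all illustrative):

```python
import torch
from speechain.module.abs import Module

class ToyConvModule(Module):
    # Hypothetical Module with its own parameter initialization scheme.
    def module_init(self, out_channels: int = 64):
        self.conv = torch.nn.Conv1d(self.input_size, out_channels, kernel_size=3)
        self.output_size = out_channels

    def forward(self, feat: torch.Tensor):
        # Conv1d expects (batch, channel, time), so transpose around the convolution
        return self.conv(feat.transpose(1, 2)).transpose(1, 2)

    def reset_parameters(self):
        # customized initialization for this module's parameters
        torch.nn.init.xavier_uniform_(self.conv.weight)
        torch.nn.init.zeros_(self.conv.bias)

# and, inside model_init() of your Model (as the note above says):
#     self.default_init_modules.append(ToyConvModule)
```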
speechain.module.abs.Module.get_recordable_para(self)
- Description:
  This function returns the parameters of the module that you want to record as part of the step information. If you want to record the values of your module's customized parameters:
  - When it is a leaf (no Module members) in the nested Module tree of the model, please override this function and return the parameter values in a Dict. For an example, you can refer to ${SPEECHAIN_ROOT}/speechain/module/transformer/pos_enc.py.
  - When it is a non-leaf (with Module members) in the nested Module tree of the model, please follow the sketch below.
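A sketch of that pattern (the class and its temperature member are invented for illustration; the loop mirrors the default non-leaf behavior described in the Return section below):

```python
from speechain.module.abs import Module

class ToyParentModule(Module):
    # Hypothetical non-leaf Module; 'temperature' stands in for a real
    # recordable parameter of your own module.
    def get_recordable_para(self):
        output = {}
        # gather the recordable parameters of all member Module objects
        for name, member in self.named_children():
            if isinstance(member, Module):
                output[name] = member.get_recordable_para()
        # attach this module's own recordable value
        output['temperature'] = self.temperature
        return output
```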
- Return: Dict or None
For the leaf module, the default implementation returns None;
For the non-leaf module, the default implementation returns a Dict containing names and recordable parameters of its member modules.