# TTS
This folder contains recipes for training a Text-To-Speech Synthesis (TTS) model.
👆Back to the recipe README.md
## Table of Contents

- Available Backbones
- Preparing Durations for FastSpeech2
- Training a TTS model
## Available Backbones
Below is a table of available backbones:
| Dataset | Subset | Configuration | Audio Samples Link |
|---|---|---|---|
| libritts | train-clean-100 | | |
| libritts | train-clean-460 | | |
| libritts | train-960 | | |
| ljspeech | train | 22.05khz_mfa_fastspeech2 | |
| ljspeech | train | 22.05khz_mfa_fastspeech2_nopunc | |
| vctk | | | |
👆Back to the table of contents
## Preparing Durations for FastSpeech2

To train a FastSpeech2 model, you need to acquire additional duration data for your target dataset.
Follow these steps:
1. Create a virtual environment for MFA: `conda create -n speechain_mfa -c conda-forge montreal-forced-aligner gdown`.
2. Activate the environment: `conda activate speechain_mfa`.
3. Downsample your target TTS dataset to 16 kHz. For details, please see how to dump a dataset on your machine.
4. By default, the MFA package stores all of its temporary files in your user directory. If you lack sufficient space there, add `export MFA_ROOT_DIR={your-target-directory}` to `~/.bashrc` and run `source ~/.bashrc`.
5. Navigate to `${SPEECHAIN_ROOT}/datasets` and run `bash mfa_preparation.sh -h` for help. Then, rerun `bash mfa_preparation.sh` with the appropriate arguments to acquire the duration data.
**Note:** MFA cannot calculate durations for multiple datasets concurrently on a single machine (or a single node of a cluster). Please process each dataset one at a time.
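Steps 4 and 5 above can be sketched as a short shell snippet. The scratch directory `/tmp/mfa_scratch` below is a hypothetical placeholder; substitute whatever roomy location you have available:

```shell
# Step 4: redirect MFA's temporary files away from the (possibly small) user directory.
# /tmp/mfa_scratch is a hypothetical placeholder -- pick your own directory.
export MFA_ROOT_DIR=/tmp/mfa_scratch
mkdir -p "$MFA_ROOT_DIR"
echo "MFA temporary files will go to: $MFA_ROOT_DIR"

# Step 5 would then be run from the datasets folder (shown as comments here,
# since the right arguments depend on your dataset):
#   cd ${SPEECHAIN_ROOT}/datasets
#   bash mfa_preparation.sh -h    # list the available arguments first
```

To make the `MFA_ROOT_DIR` setting persistent, append the `export` line to `~/.bashrc` as described in step 4.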
👆Back to the table of contents
## Training a TTS model
To train a TTS model, follow the ASR model training instructions located in `recipes/asr`.
Make sure to replace the folder names and configuration file names from `recipes/asr` with their corresponding names in `recipes/tts`.
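As a minimal illustration of that substitution, assuming the same directory layout on both sides (the concrete dataset and configuration names in your setup will differ):

```shell
# The only change versus an ASR run is the recipe path prefix:
ASR_PATH="recipes/asr"
TTS_PATH="${ASR_PATH/asr/tts}"   # swap the task folder name
echo "ASR recipes live in ${ASR_PATH}; the TTS counterparts live in ${TTS_PATH}"
```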