TTS
This folder contains recipes for training a Text-To-Speech Synthesis (TTS) model.
👆Back to the recipe README.md
Table of Contents
- Available Backbones
- Preparing Durations for FastSpeech2
- Training a TTS model
Available Backbones
Below is a table of available backbones:
Dataset | Subset | Configuration | Audio Samples Link |
---|---|---|---|
libritts | train-clean-100 | | |
libritts | train-clean-460 | | |
libritts | train-960 | | |
ljspeech | train | 22.05khz_mfa_fastspeech2 | |
ljspeech | train | 22.05khz_mfa_fastspeech2_nopunc | |
vctk | | | |
👆Back to the table of contents
Preparing Durations for FastSpeech2
To train a FastSpeech2 model, you first need to obtain phoneme duration data for your target dataset.
Follow these steps:
1. Create a virtual environment for MFA: `conda create -n speechain_mfa -c conda-forge montreal-forced-aligner gdown`.
2. Activate the `speechain_mfa` environment: `conda activate speechain_mfa`.
3. Downsample your target TTS dataset to 16 kHz. For details, please see how to dump a dataset on your machine.
4. By default, MFA stores all of its temporary files in your user directory. If you lack sufficient space there, add `export MFA_ROOT_DIR={your-target-directory}` to `~/.bashrc` and run `source ~/.bashrc`.
5. Navigate to `${SPEECHAIN_ROOT}/datasets` and run `bash mfa_preparation.sh -h` for help. Then run `bash mfa_preparation.sh` with the appropriate arguments to acquire the duration data.
Note: MFA cannot compute durations for multiple datasets concurrently on a single machine (or a single node of a cluster). Please process datasets one at a time.
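The temporary-directory redirection in step 4 can be sketched as a small shell snippet; the directory path below is an example, not a value fixed by the recipe:

```shell
# Point MFA's temporary files at a partition with enough free space
# (example path; adjust to your machine).
export MFA_ROOT_DIR="${TMPDIR:-/tmp}/mfa_tmp"
mkdir -p "$MFA_ROOT_DIR"

# Persist the setting for future shells, as step 4 suggests,
# without adding a duplicate line on repeated runs.
grep -q 'MFA_ROOT_DIR' ~/.bashrc 2>/dev/null || \
  echo "export MFA_ROOT_DIR=$MFA_ROOT_DIR" >> ~/.bashrc
```

After opening a new shell (or running `source ~/.bashrc`), MFA will pick up `MFA_ROOT_DIR` from the environment.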
👆Back to the table of contents
Training a TTS model
To train a TTS model, follow the ASR model training instructions located in `recipes/asr`. Make sure to replace the folder names and configuration file names from `recipes/asr` with their corresponding names in `recipes/tts`.