Overview
A Machine Speech Chain Toolkit for ASR, TTS, and Both
SpeeChain is an open-source PyTorch-based speech and language processing toolkit produced by the AHC lab at Nara Institute of Science and Technology (NAIST). It is designed to simplify research pipelines for the machine speech chain, i.e., the joint modeling of automatic speech recognition (ASR) and text-to-speech synthesis (TTS).
SpeeChain is currently in beta. Contributions to this toolkit are warmly welcomed, anywhere and anytime!
If you find our toolkit helpful for your research, we sincerely hope you can give us a star ⭐! Whenever you encounter a problem while using our toolkit, please don't hesitate to open an issue!
Table of Contents
Machine Speech Chain
- Offline TTS→ASR Chain
⬆ Back to the table of contents
Toolkit Characteristics
- Data Processing:
- On-the-fly Log-Mel Spectrogram Extraction
- On-the-fly SpecAugment
- On-the-fly Feature Normalization
- Model Training:
- Multi-GPU Model Distribution based on torch.nn.parallel.DistributedDataParallel
- Real-time status reporting via online TensorBoard and offline Matplotlib
- Real-time learning dynamics visualization (attention visualization, spectrogram visualization)
- Data Loading:
- On-the-fly mixture of multiple datasets in a single dataloader.
- On-the-fly data selection for each dataloader to filter out undesired data samples.
- Multi-dataloader batch generation to form training batches from multiple datasets.
- Optimization:
- Model training can be done by multiple optimizers. Each optimizer is responsible for a specific part of the model parameters.
- Gradient accumulation for mimicking the large-batch gradients by the ones on several small batches.
- Easy-to-set finetuning factor to scale down the learning rates without any modification of the scheduler configuration.
- Model Evaluation:
- Multi-level .md evaluation reports (overall-level, group-level, and sample-level) without any layout misplacement.
- Histogram visualization for the distribution of evaluation metrics.
- Top N bad case analysis for better model diagnosis.
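The gradient-accumulation feature above rests on a simple identity: for equally sized small batches, averaging their gradients reproduces the gradient of the corresponding large batch. The following NumPy sketch (illustrative only, not SpeeChain code; all names are invented) demonstrates this for a mean-squared-error loss with a scalar weight:

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of mean((w*x - y)**2) with respect to the scalar weight w."""
    return np.mean(2.0 * x * (w * x - y))

rng = np.random.default_rng(0)
x = rng.normal(size=8)
y = rng.normal(size=8)
w = 0.5

# Gradient computed on the full "large" batch of 8 samples.
full_grad = mse_grad(w, x, y)

# Accumulate gradients over 4 small batches of 2 samples, then average.
acc_grad = sum(mse_grad(w, x[i:i + 2], y[i:i + 2]) for i in range(0, 8, 2)) / 4

print(full_grad, acc_grad)  # equal up to floating-point error
```

In a training loop this is typically realized by calling `backward()` on each small batch and stepping the optimizer only once every few batches, which mimics a larger effective batch size without the extra memory cost.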
⬆ Back to the table of contents
Directory Structure
```
├── config                    # configuration for feature extraction
│   └── feat
├── CONTRIBUTING.md           # conventions for contributors
├── create_env.sh             # bash script to create the environment
├── datasets                  # dataset folder: put data here, make a softlink, or set the path in a config file
│   ├── data_dumping.sh
│   ├── librispeech
│   ├── libritts
│   ├── ljspeech
│   ├── meta_generator.py
│   ├── meta_post_processor.py
│   ├── mfa_preparation.sh
│   └── vctk
├── docs                      # folder to build the docs
├── environment.yaml
├── LICENSE
├── recipes                   # experiment folder; you will mostly work here
│   ├── asr
│   ├── lm
│   ├── offline_tts2asr
│   ├── run.sh
│   └── tts
├── requirements.txt
├── run.sh
├── scripts
│   └── gen_ref_pages.py
├── setup.py
└── speechain                 # SpeeChain toolkit source code
    ├── criterion
    ├── dataset
    ├── infer_func
    ├── iterator
    ├── model
    ├── module
    ├── monitor.py
    ├── optim_sche
    ├── pyscripts
    ├── runner.py
    ├── snapshooter.py
    ├── tokenizer
    └── utilbox
```
Quick Start
Try the minilibrispeech train-clean-5 recipe for ASR.
Workflow
We recommend installing Anaconda on your machine before using our toolkit. After installing Anaconda, please follow the steps below to deploy our toolkit:

1. Find a path with enough disk space (e.g., at least 500 GB if you want to use the LibriSpeech or LibriTTS datasets).
2. Clone our toolkit: `git clone https://github.com/bagustris/SpeeChain.git`
3. Go to the root path of the toolkit: `cd SpeeChain`
4. Run `source envir_preparation.sh` to build the environment for the SpeeChain toolkit. After execution, a virtual environment named `speechain` will be created and two environment variables, `SPEECHAIN_ROOT` and `SPEECHAIN_PYTHON`, will be initialized in your `~/.bashrc`.
   Note: the script must be executed in the `SpeeChain` root path and with the command `source`, not `./envir_preparation.sh`.
5. Run `conda activate speechain` in your terminal to check the installation of the Conda environment. If the `speechain` environment is not activated successfully, run `conda env create -f environment.yaml`, `conda activate speechain`, and `pip install -e ./` to install it manually.
6. Run `echo ${SPEECHAIN_ROOT}` and `echo ${SPEECHAIN_PYTHON}` in your terminal to check the environment variables. If either one is empty, add it to your `~/.bashrc` manually with `export SPEECHAIN_ROOT=xxx` or `export SPEECHAIN_PYTHON=xxx`, then activate it with `source ~/.bashrc`.
   - `SPEECHAIN_ROOT` should be the absolute path of the `SpeeChain` folder you have just cloned (i.e., `/xxx/SpeeChain`, where `/xxx/` is the parent directory).
   - `SPEECHAIN_PYTHON` should be the absolute path of the Python interpreter in the `speechain` environment (i.e., `/xxx/anaconda3/envs/speechain/bin/python3.X`, where `/xxx/` is where your `anaconda3` is located and `X` depends on `environment.yaml`).
7. Read the handbook and start your journey in SpeeChain!
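As a quick sanity check for the environment-variable step above, a small helper like the following can report which of the two variables is unset or empty. This is a hypothetical convenience script, not part of the toolkit; the function name `missing_speechain_vars` is invented for illustration:

```python
import os

def missing_speechain_vars(env=None):
    """Return the names of the SpeeChain environment variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ("SPEECHAIN_ROOT", "SPEECHAIN_PYTHON")
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_speechain_vars()
    if missing:
        print("Please export these variables in ~/.bashrc:", ", ".join(missing))
    else:
        print("SpeeChain environment variables look good.")
```

Running it in a shell where `~/.bashrc` has been sourced should report no missing variables; otherwise it names the ones to export.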