Overview
A Machine Speech Chain Toolkit for ASR, TTS, and Both
SpeeChain is an open-source PyTorch-based speech and language processing toolkit developed by the AHC lab at the Nara Institute of Science and Technology (NAIST). It is designed to simplify the research pipeline for the machine speech chain, i.e., a joint model of automatic speech recognition (ASR) and text-to-speech synthesis (TTS).
SpeeChain is currently in beta. Contributions to this toolkit are warmly welcomed, anywhere and anytime!
If you find our toolkit helpful for your research, we sincerely hope that you can give us a star ⭐! If you encounter any problems while using our toolkit, please don't hesitate to open an issue!
Table of Contents
Machine Speech Chain
- Offline TTSβASR Chain
⬆ Back to the table of contents
Toolkit Characteristics
- Data Processing:
- On-the-fly Log-Mel Spectrogram Extraction
- On-the-fly SpecAugment
- On-the-fly Feature Normalization
- Model Training:
- Multi-GPU Model Distribution based on torch.nn.parallel.DistributedDataParallel
- Real-time training status reporting via online TensorBoard and offline Matplotlib
- Real-time visualization of learning dynamics (e.g., attention maps and spectrograms)
- Data Loading:
- On-the-fly mixture of multiple datasets in a single dataloader.
- On-the-fly data selection for each dataloader to filter out undesired data samples.
- Multi-dataloader batch generation to form training batches from multiple datasets.
- Optimization:
- Model training can be driven by multiple optimizers, each responsible for a specific part of the model parameters.
- Gradient accumulation to mimic large-batch gradients with gradients from several small batches.
- An easy-to-set finetuning factor that scales down learning rates without modifying the scheduler configuration.
- Model Evaluation:
- Multi-level .md evaluation reports (overall-level, group-level, and sample-level) without any layout misplacement.
- Histogram visualization for the distribution of evaluation metrics.
- Top N bad case analysis for better model diagnosis.
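The on-the-fly SpecAugment listed under Data Processing can be sketched in plain PyTorch. The snippet below is a simplified illustration (one random frequency mask and one random time mask, zero-filled), not SpeeChain's actual implementation:

```python
import torch


def spec_augment(spec: torch.Tensor, freq_mask: int = 10, time_mask: int = 20) -> torch.Tensor:
    """Apply one random frequency mask and one random time mask to a
    (n_mels, n_frames) spectrogram. Simplified sketch, not SpeeChain's code."""
    n_mels, n_frames = spec.shape
    spec = spec.clone()  # keep the original untouched for on-the-fly use

    # Frequency mask: zero out f consecutive mel bins starting at f0.
    f = int(torch.randint(0, freq_mask + 1, (1,)))
    f0 = int(torch.randint(0, n_mels - f + 1, (1,)))
    spec[f0:f0 + f, :] = 0.0

    # Time mask: zero out t consecutive frames starting at t0.
    t = int(torch.randint(0, time_mask + 1, (1,)))
    t0 = int(torch.randint(0, n_frames - t + 1, (1,)))
    spec[:, t0:t0 + t] = 0.0
    return spec
```

Because the masking is drawn fresh on every call, each epoch sees a differently augmented view of the same utterance.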
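Gradient accumulation, listed under Optimization, can be sketched as follows. `train_with_accumulation` is a hypothetical helper written for this illustration, not SpeeChain's actual API:

```python
import torch


def train_with_accumulation(model, optimizer, batches, accum_steps: int):
    """Mimic large-batch gradients by summing scaled gradients over
    several small batches before each optimizer step."""
    for step, (x, y) in enumerate(batches, start=1):
        loss = torch.nn.functional.mse_loss(model(x), y)
        # Dividing by accum_steps makes the summed gradient equal the
        # average gradient over one accum_steps-times-larger batch.
        (loss / accum_steps).backward()
        if step % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

With `accum_steps = 4` and small batches of size 2, the resulting parameter update matches a single optimizer step on one batch of size 8, while only ever holding a size-2 batch in GPU memory.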
⬆ Back to the table of contents
Directory Structure
├── config                 # configuration for feature extraction
│   └── feat
├── CONTRIBUTING.md        # conventions for contributors
├── create_env.sh          # bash script to create the environment
├── datasets               # dataset folder: put data here, make a softlink, or set the path in a config file
│   ├── data_dumping.sh
│   ├── librispeech
│   ├── libritts
│   ├── ljspeech
│   ├── meta_generator.py
│   ├── meta_post_processor.py
│   ├── mfa_preparation.sh
│   └── vctk
├── docs                   # folder to build the documentation
├── environment.yaml
├── LICENSE
├── recipes                # experiment recipes; you will work here
│   ├── asr
│   ├── lm
│   ├── offline_tts2asr
│   ├── run.sh
│   └── tts
├── requirements.txt
├── run.sh
├── scripts
│   └── gen_ref_pages.py
├── setup.py
└── speechain              # source code of the SpeeChain toolkit
    ├── criterion
    ├── dataset
    ├── infer_func
    ├── iterator
    ├── model
    ├── module
    ├── monitor.py
    ├── optim_sche
    ├── pyscripts
    ├── runner.py
    ├── snapshooter.py
    ├── tokenizer
    └── utilbox
Quick Start
Try the minilibrispeech recipe (train-clean-5) in ASR.
Workflow
We recommend installing Anaconda on your machine before using our toolkit. After installing Anaconda, follow the steps below to deploy our toolkit:
- Find a path with enough disk space (e.g., at least 500 GB if you want to use the LibriSpeech or LibriTTS datasets).
- Clone our toolkit by `git clone https://github.com/bagustris/SpeeChain.git`.
- Go to the root path of our toolkit by `cd SpeeChain`.
- Run `source envir_preparation.sh` to build the environment for the SpeeChain toolkit. After execution, a virtual environment named `speechain` will be created, and two environment variables, `SPEECHAIN_ROOT` and `SPEECHAIN_PYTHON`, will be initialized in your `~/.bashrc`.
  Note: the script must be executed in the root path `SpeeChain` and by the command `source`, not `./envir_preparation.sh`.
- Run `conda activate speechain` in your terminal to verify the installation of the Conda environment. If the environment `speechain` is not successfully activated, run `conda env create -f environment.yaml`, `conda activate speechain`, and `pip install -e ./` to install it manually.
- Run `echo ${SPEECHAIN_ROOT}` and `echo ${SPEECHAIN_PYTHON}` in your terminal to verify the environment variables. If either one is empty, manually add them to your `~/.bashrc` by `export SPEECHAIN_ROOT=xxx` or `export SPEECHAIN_PYTHON=xxx`, then activate them by `source ~/.bashrc`.
  - `SPEECHAIN_ROOT` should be the absolute path of the `SpeeChain` folder you have just cloned (i.e., `/xxx/SpeeChain`, where `/xxx/` is the parent directory);
  - `SPEECHAIN_PYTHON` should be the absolute path of the Python interpreter in the `speechain` environment (i.e., `/xxx/anaconda3/envs/speechain/bin/python3.X`, where `/xxx/` is where your `anaconda3` is placed and `X` depends on `environment.yaml`).
- Read the handbook and start your journey in SpeeChain!
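As a quick sanity check, the two environment variables can also be verified from Python. `check_speechain_env` below is a hypothetical helper written for this illustration, not part of the toolkit:

```python
import os


def check_speechain_env(env=os.environ) -> dict:
    """Return the SpeeChain paths from the environment, raising if either
    variable is unset or is not an absolute path."""
    paths = {}
    for var in ("SPEECHAIN_ROOT", "SPEECHAIN_PYTHON"):
        value = env.get(var)
        if not value or not os.path.isabs(value):
            raise RuntimeError(f"{var} is missing or not an absolute path: {value!r}")
        paths[var] = value
    return paths
```

Running it inside the activated `speechain` environment should print nothing and return both paths; an empty or relative value means `~/.bashrc` was not set up (or not re-sourced) correctly.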