WaveFlow with LJSpeech

Dataset

We experiment with the LJSpeech dataset. Download and unzip LJSpeech.

wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
tar xjvf LJSpeech-1.1.tar.bz2
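
Before preprocessing, it can help to confirm that the archive was extracted completely. The helper below is a hypothetical sketch, not part of this recipe. It assumes the standard LJSpeech-1.1 layout (a pipe-delimited metadata.csv next to a wavs/ folder) and prints the number of .wav clips it finds. The full release contains 13,100 clips.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for an extracted LJSpeech-1.1 directory.
# Assumes the standard layout: metadata.csv plus a wavs/ folder of clips.

check_ljspeech() {
    local root="$1"
    # metadata.csv: one utterance per line, id|transcript|normalized transcript
    if [ ! -f "${root}/metadata.csv" ]; then
        echo "missing ${root}/metadata.csv" >&2
        return 1
    fi
    if [ ! -d "${root}/wavs" ]; then
        echo "missing ${root}/wavs/" >&2
        return 1
    fi
    # Count the clips (tr strips the padding some wc implementations add).
    local n_wavs
    n_wavs=$(find "${root}/wavs" -maxdepth 1 -type f -name '*.wav' | wc -l | tr -d ' ')
    echo "${n_wavs}"
}

# Example, using the dataset path assumed in "Get Started":
# check_ljspeech ~/datasets/LJSpeech-1.1
```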

Get Started

Assume the path to the dataset is ~/datasets/LJSpeech-1.1 and the path to the Tacotron2-generated mels is ../tts0/output/test. Run the command below to:

source path.sh.
preprocess the dataset.
train the model.
synthesize wavs from mels.

./run.sh

You can choose a range of stages to run, or set --stage equal to --stop-stage to run only one stage. For example, the following command only preprocesses the dataset:

./run.sh --stage 0 --stop-stage 0
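
The stage selection can be pictured with the gating pattern below. This is a minimal sketch, not the actual run.sh. It assumes the stages are numbered 0 (preprocess), 1 (train) and 2 (synthesize) in the order listed under "Get Started", and it prints the stage names instead of running anything.

```bash
#!/usr/bin/env bash
# Sketch of stage gating: stage N runs only if stage <= N <= stop_stage.
# The 0/1/2 numbering is an assumption based on the order in "Get Started".

in_range() {
    local stage="$1" stop_stage="$2" n="$3"
    [ "${stage}" -le "${n}" ] && [ "${stop_stage}" -ge "${n}" ]
}

# Dry run: print the stages a given --stage/--stop-stage pair would execute.
plan_stages() {
    local stage="$1" stop_stage="$2"
    if in_range "${stage}" "${stop_stage}" 0; then
        echo "preprocess"
    fi
    if in_range "${stage}" "${stop_stage}" 1; then
        echo "train"
    fi
    if in_range "${stage}" "${stop_stage}" 2; then
        echo "synthesize"
    fi
}

# plan_stages 0 0   -> preprocess only
# plan_stages 1 2   -> train, then synthesize
```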

Data Preprocessing

./local/preprocess.sh ${preprocess_path}

Model Training

./local/train.sh calls ${BIN_DIR}/train.py.

CUDA_VISIBLE_DEVICES=${gpus} ./local/train.sh ${preprocess_path} ${train_output_path}
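
Because train.py is located through ${BIN_DIR}, that variable has to be set (by sourcing path.sh) before training starts. If it is empty, ${BIN_DIR}/train.py silently becomes /train.py. The guard below is a hypothetical helper, not part of the recipe, that fails early with a clear message instead:

```bash
#!/usr/bin/env bash
# Hypothetical guard: make sure ${BIN_DIR} is set and contains the script.

require_script() {
    local name="$1"
    if [ -z "${BIN_DIR:-}" ]; then
        echo "BIN_DIR is not set; run 'source path.sh' first" >&2
        return 1
    fi
    if [ ! -f "${BIN_DIR}/${name}" ]; then
        echo "${BIN_DIR}/${name} not found" >&2
        return 1
    fi
    # Print the resolved script path for the caller to use.
    echo "${BIN_DIR}/${name}"
}

# Usage inside a training wrapper:
# source path.sh
# train_py=$(require_script train.py) || exit 1
```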

The training script takes the following command line arguments.

--data is the path of the training dataset.
--output is the path of the output directory.
--ngpu is the number of GPUs to use; if ngpu == 0, the CPU is used.
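
The contents of local/train.sh are not shown here. As a hypothetical sketch only, a wrapper like it could map its two positional arguments onto these flags as follows. The function name and the default of one GPU are assumptions, and the command is printed rather than executed so the mapping can be inspected.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build (but do not run) a train.py command from the
# positional arguments used by local/train.sh. ngpu defaults to 1 here,
# which is an assumption, not a documented default.

build_train_cmd() {
    local preprocess_path="$1" train_output_path="$2" ngpu="${3:-1}"
    if [ -z "${preprocess_path}" ] || [ -z "${train_output_path}" ]; then
        echo "usage: build_train_cmd <preprocess_path> <train_output_path> [ngpu]" >&2
        return 1
    fi
    echo "python ${BIN_DIR:-<BIN_DIR>}/train.py --data=${preprocess_path} --output=${train_output_path} --ngpu=${ngpu}"
}

# build_train_cmd preprocessed_ljspeech output 1
```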

For distributed training, set a larger --ngpu (e.g. 4). Note that distributed training on CPU is not supported yet.
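
For distributed training, --ngpu should match the number of devices you expose through CUDA_VISIBLE_DEVICES=${gpus}. The helper below is a hypothetical sketch, not part of the recipe, that counts the entries in a comma-separated device list. An empty list yields 0 (CPU). It cannot handle the case where CUDA_VISIBLE_DEVICES is unset entirely, because CUDA then exposes every GPU on the machine.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count devices in a comma-separated list such as the
# value assigned to ${gpus}. "0,1,2,3" -> 4, "0" -> 1, "" -> 0.

count_visible_gpus() {
    local devices="$1"
    if [ -z "${devices}" ]; then
        echo 0
        return
    fi
    local IFS=','
    local -a ids
    read -r -a ids <<< "${devices}"
    echo "${#ids[@]}"
}

# gpus=0,1,2,3
# echo "--ngpu=$(count_visible_gpus "${gpus}")"   # prints --ngpu=4
```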

Synthesizing

./local/synthesize.sh calls ${BIN_DIR}/synthesize.py, which synthesizes waveforms from mel spectrograms.

CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize.sh ${input_mel_path} ${train_output_path} ${ckpt_name}
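
Synthesis consumes the Tacotron2 output, so it can be worth confirming that the mel directory is populated before launching it. The check below is a hypothetical sketch that assumes only that --input is a flat directory of .npy mel spectrograms. It prints how many it finds and fails if there are none.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for ${input_mel_path}: count the .npy files
# synthesize.py would read, and fail if the directory is missing or empty.

count_mels() {
    local mel_dir="$1"
    if [ ! -d "${mel_dir}" ]; then
        echo "no such directory: ${mel_dir}" >&2
        return 1
    fi
    local n
    n=$(find "${mel_dir}" -maxdepth 1 -type f -name '*.npy' | wc -l | tr -d ' ')
    if [ "${n}" -eq 0 ]; then
        echo "no .npy mel spectrograms in ${mel_dir}" >&2
        return 1
    fi
    echo "${n}"
}

# count_mels ../tts0/output/test || exit 1
```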

The synthesis script takes the following arguments.

--input is assumed to be a directory containing mel spectrograms (log magnitude) in .npy format.
--output is the directory where the synthesized .wav files are saved; each file has the same name as the mel spectrogram it was generated from.
--checkpoint_path is the path of the parameter file (.pdparams) to load. Note that the .pdparams extension is not included in this path.
--ngpu is the number of GPUs to use; if ngpu == 0, the CPU is used.
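
The naming rule for --output (one .wav per input mel, with the same base name) makes it easy to check whether synthesis finished. The sketch below uses hypothetical function names. It lists the .wav files expected for a mel directory and reports any that are missing from the output directory.

```bash
#!/usr/bin/env bash
# Hypothetical helpers built on the documented naming rule:
#   <input>/X.npy  ->  <output>/X.wav

expected_wavs() {
    local mel_dir="$1" f stem
    for f in "${mel_dir}"/*.npy; do
        [ -e "${f}" ] || continue      # glob matched nothing
        stem=$(basename "${f}" .npy)
        echo "${stem}.wav"
    done
}

# Print each expected .wav that is not present in the output directory.
missing_wavs() {
    local mel_dir="$1" wav_dir="$2" w
    while IFS= read -r w; do
        [ -f "${wav_dir}/${w}" ] || echo "${w}"
    done < <(expected_wavs "${mel_dir}")
}

# missing_wavs ../tts0/output/test "${train_output_path}/<your --output dir>"
```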

Pretrained Model

A pretrained model with 128 residual channels can be downloaded here: waveflow_ljspeech_ckpt_0.3.zip.
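
Because --checkpoint_path expects the parameter file path without its .pdparams extension, a downloaded checkpoint is referenced by its prefix. The sketch below is hypothetical. It assumes only that the unzipped archive contains one or more .pdparams files somewhere inside it, since the archive layout is not documented here, and it prints their prefixes. The extraction directory name "pretrained" is also an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list checkpoint prefixes (paths minus ".pdparams")
# that can be passed to --checkpoint_path.

ckpt_prefixes() {
    local root="$1" p
    find "${root}" -type f -name '*.pdparams' | sort | while IFS= read -r p; do
        echo "${p%.pdparams}"
    done
}

# unzip waveflow_ljspeech_ckpt_0.3.zip -d pretrained
# ckpt_prefixes pretrained
```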
