WaveRNN-Pytorch

This repository contains Fatcord's Alternative WaveRNN (Faster training), which contains a fast-training, small GPU memory implementation of WaveRNN vocoder.

Model Pruning and Real Time CPU Inference

See geneing's awesome fork that has model pruning, export to C++ and real time inference on CPU: https://github.com/geneing/WaveRNN-Pytorch.

Highlights

support raw audio wav modelling (via a single Beta Distribution)
relatively fast synthesis speed without much optimization yet (around 2000 samples/sec on GTX 1060 Ti, 16 GB ram, i5 processor)
support Fatcord's original quantized (9-bit) wav modelling

Audio Samples

Obama & Bernie Sanders See this repo in action!
10-bit audio on held-out testing data from LJSpeech. This model sounds and trains pretty close to 9 bit. We want the higher bit the better.
9-bit audio on held-out testing data from LJSpeech. This model trains the fastest (this is around 130 epochs)
Single beta distribution on held-out testing data from LjSpeech. This is trained with the single Beta distribution.

Pretrained Checkpoints

Single Beta Distribution trained for 112k. Make sure to change hparams.input_type to raw.
9-bit quantized audio trained for 11k, or around 130 epochs, can be trained further. Make sure to change hparams.input_type to bits.
10-bit quantized audio. To ensure your model is built properly, download the hparams.py here, either replace this with your local hparams.py file or note and update any changes.

Requirements

Python 3
CUDA >=8.0
PyTorch >= v0.4.1

Installation

Ensure above requirements are met.

git clone https://github.com/G-Wang/WaveRNN-Pytorch.git
cd WaveRNN-Pytorch
pip install -r requirements.txt

Usage

1. Adjusting Hyperparameters

Before running scripts, one can adjust hyperparameters in hparams.py.

Some hyperparameters that you might want to adjust:

fix_learning_rate The model is robust enough to learn well with a fix learning rate of 1e-4, I suggest you try this setting for fastest training, you can decrease this down to 5e-6 for final step refinement. Set this to None to train with learning rate schedule instead
input_type (best performing ones are currently bits and raw, see hparams.py for more details)
batch_size
save_every_step (checkpoint saving frequency)
evaluate_every_step (evaluation frequency)
seq_len_factor (sequence length of training audio, the longer the more GPU it takes)

2. Preprocessing

This function processes raw wav files into corresponding mel-spectrogram and wav files according to the audio processing hyperparameters.

Example usage:

python preprocess.py /path/to/my/wav/files

This will process all the .wav files in the folder /path/to/my/wav/files and save them in the default local directory called data_dir.

Can include --output_dir to specify a specific directory to store the processed outputs.

3. Training

Start training process. checkpoints are by default stored in the local directory checkpoints. The script will automatically save a checkpoint when terminated by crtl + c.

Example 1: starting a new model for training

python train.py data_dir

data_dir is the directory containing the processed files.

Example 2: Restoring training from checkpoint

python train.py data_dir --checkpoint=checkpoints/checkpoint0010000.pth

Evaluation .wav files and plots are saved in checkpoints/eval.

WIP

optimize learning rate schedule
optimize training hyperparameters (seq_len and batch_size)
batch generation for synthesis speedup
model pruning

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
inputs		inputs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
audio.py		audio.py
dataset.py		dataset.py
distributions.py		distributions.py
hparams.py		hparams.py
loss_function.py		loss_function.py
lrschedule.py		lrschedule.py
model.py		model.py
preprocess.py		preprocess.py
requirements.txt		requirements.txt
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

WaveRNN-Pytorch

Model Pruning and Real Time CPU Inference

Highlights

Audio Samples

Pretrained Checkpoints

Requirements

Installation

Usage

1. Adjusting Hyperparameters

2. Preprocessing

3. Training

WIP

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

G-Wang/WaveRNN-Pytorch

Folders and files

Latest commit

History

Repository files navigation

WaveRNN-Pytorch

Model Pruning and Real Time CPU Inference

Highlights

Audio Samples

Pretrained Checkpoints

Requirements

Installation

Usage

1. Adjusting Hyperparameters

2. Preprocessing

3. Training

WIP

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages