An adaptation of Reformer: The Efficient Transformer for text-to-speech task.
This project contains:
- preprocessing code for creating a Trump Speech Dataset based on transcripts from rev.com
- implementation of Reformer TTS: an adaptation of Reformer: The Efficient Transformer for text-to-speech task, based on Neural Speech Synthesis with Transformer Network
- implementation of Squeezewave: Extremely Lightweight Vocoders For On-Device Speech Synthesis in modern PyTorch, without dependencies on Tacotron2, WaveNet or WaveGlow
- Pytorch Lightning wrappers for easy training of both models with easy-to-use configuration management
- CLI for running training, inference and data preprocessing
We aimed to create a significantly more efficient version of state-of-the-art text-to-speech model, by replacing its transformer architecture with optimizations proposed in the more recent reformer paper. We’ll use it to generate a believable deepfake of Donald Trump based on a custom dataset of his speeches, created specifically for this purpose.
Unfortunately, we weren't able to produce results matching the ones from Transformer TTS paper, after experimenting with more than 100 hyperparameter combinations over 2 months. We believe that the model size is a significant factor here, and to train transformers for TTS one really needs to reduce overfitting to allow long, steady training process (~1 week of training on RTX 2080Ti).
Also, having access to original implementation of Transformer TTS would greatly help.
While the reformer didn't match our expectations, the SqueezeWave implementation matches performance of the original one without FP16 support.
We also include CLI for running training and inference (see usage section), and all data necessary for reproduction of experiments (see development section).
The project is under a significant refactor, this version is left here to allow compatiblility with our previous expeirments and will be moved in the near future.
This project is a normal python package, and can be installed using pip
,
as long as you have Python 3.8 or greater.
Go to releases page to find the installation instruction for latest release.
After installation, you can see available commands by running:
python -m reformer_tts.cli --help
All commands are executed using cli, for example:
python -m reformer_tts.cli train-vocoder
Most parameters (in particular, all training hyperparameters) are specified via
--config
argument to cli
(that goes before the command you want to run), eg:
python -m reformer_tts.cli -c /path/to/your/config.yml train-vocoder
Default values can be found in reformer_tts.config.Config
(and its fields).
Thanks to conda-forge community, we can install all packages (including necessary
binaries, like ffmpeg
) using one command.
conda env create -f environment.yml
- Check your environment and ensure you have
Python>=3.8
:
which python
python --version
- Install python dependencies (also installs our package in editable mode):
pip install -r requirements.txt
-
Ensure you have
ffmpeg>=3.4,<4.0
installed (installation instructions) -
For training, ensure you have CUDA and GPU drivers installed (for details, see instructions on PyTorch website)
- In order for dvc to have write access to the remote, configure your gcp account (using credentials from the generated json file):
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service-account-credentials.json
NOTE: if you only need read acces (for reproduction), you don't need to perform step 1
- Get all of the data - this step needs to be repeated:
- every time you start working after a break
- after every git pull
- after checking out another git branch
dvc pull
To do this you can run project tests:
python -m pytest --pyargs reformer_tts
All tests should work on CPU and GPU, and may take up to a minute to complete.
Remember to pass --pyargs reformer_tts
to pytest, otherwise it will search data directories for tests
- Use whatever package manager you want
- Use
Python>=3.8
- All python dependencies will be in
requirements.txt
as well as inenvironment.yml
- One central entrypoint for running tasks:
reformer_tts/cli.py
, runpython reformer_tts/cli.py --help
for detailed reference
Configuration is organized in dataclass structures:
- Each project submodule has its own configuration file, called
config.py
, where the parameters and default values are defined - for example, dataset config parameters are specified inreformer_tts.dataset.config
- The
reformer_tts.config.Config
class contains all submodules' config settings - Actual values of config parameters are loaded from configuration files in yaml format, best practice is to only override defaults in the yaml files
This way, the default values are set close to the place where they are used, any config value can be overridden wherever you want
To change runtime configuration
- automatically generate configuration with default values using command
python reformer_tts/cli.py save-config -o config/custom.yml
or manually copy one of the existing configuration files inconfig/
directory - remove defaults you don't wish to change from the generated config file
- change values you wish to change in the generated config file
- specify your config when running cli scripts using
-c
option, ie:python reformer_tts/cli.py -c config/custom.yml [COMMAND]
To add configuration for new module
- create
config.py
in your module - define a dataclass with all necessary config parameters in the new file:
- make sure your class does not re-define parameter values for other config files
(ie. we specified number of spectrogram channels only once - in the same place
for both
dataset
andsqueezewave
modules) - make sure your class has default values for all the parameters
- make sure your class does not re-define parameter values for other config files
(ie. we specified number of spectrogram channels only once - in the same place
for both
- add field for your dataclass in the
reformer_tts.config
main config class
We use DVC for defining data processing pipelines.
Remote is set up on Google Cloud Storage, for details run dvc config list
.
Setup for running jobs on entropy cluster
Nodes prepared for running:
- asusgpu3
- asusgpu4
- asusgpu1
- arnold
- sylvester
- Clone repo to your homedir
- Make sure dataset path is configured in
/scidatalg
- Setup command to call file from your homedir
- Commit your changes
- Run sbatch script
Before runing:
- chose node from already prepared or prepare new one using instructions below
- copy repository to your home dir
- make sure NEPTUNE API TOKEN is set in your environment
To run training:
- prepare training config and push it onto remote repository
- login to chosen node using interactive session
srun --qos=gsn --partition=common --nodelist=<name_of_chosen_node> --pty /bin/bash
- goto
/scidatalg/reformer-tts/reformer-tts/
make sure repository is pulled and on proper branch - log back to login node
- copy and modify
jobs/train_entropy.sbatch
- fill node name and training command - run
sbatch your/job/script/location.sbatch
Pro Tip watch -n 1 squeue -u your_username
to watch if your job is already running
Pro Tip2 You can watch the updates to the log by running tail -f file.log
or less --follow-name +F file.log
To pull from dvc use jobs/entropy_dvc_pull.sbatch
.
- copy this file
- fill node name
- adjust dvc command
- run job using sbatch
Since /scidatasm directory is not syncing while we want to train we have to setup training on each node separately by hand. To setup env on new node follow this instuctions:
Note: only nodes with /scidatalg are supported by this scripts. These nodes are: asusgpu4, asusgpu3, asusgpu2, asusgpu1, arnold, sylvester
- login to node using interactive session
srun --qos=gsn --partition=common --nodelist=<name_of_chosen_node> --pty /bin/bash
- copy google api credentials to
${HOME}/gcp-cred.json
(using your favourite editor) - copy the content of
scripts/setup_entropy_node.sh
to new file in home dir (again using editor) - run copied script