Releases: raccoonML/Real-Time-Voice-Cloning
RTVC upstream 2/3/22 (English)
Note: Release RTVC-7 is recommended over this one. It uses the same pretrained models.
Description
This is a collection of the files currently in the original CorentinJ/Real-Time-Voice-Cloning repo, which my rtvc_upstream branch mirrors. I provide it as a convenient way for users to set up the CorentinJ version of the RTVC code.
File information
- RTVC_Windows.zip contains all repo files, pretrained models and a standalone ffmpeg.exe to help load mp3 files. For the easiest setup experience on Windows, follow these instructions.
- pretrained_models.zip includes only the pretrained models. Place the files at these locations relative to the root of the repo. The files are also available for individual download:
File location                          Filesize
-----------------------------------------------
saved_models/default/encoder.pt        17 MB
saved_models/default/synthesizer.pt    370 MB
saved_models/default/vocoder.pt        53 MB
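After extracting pretrained_models.zip, a quick check like the following can confirm that the three files ended up where the table says. This is only a minimal sketch: the helper name `missing_models` is made up for illustration, and it assumes the script is run from the root of the repo.

```python
from pathlib import Path

# Paths copied from the table above, relative to the root of the repo.
EXPECTED_MODELS = [
    "saved_models/default/encoder.pt",
    "saved_models/default/synthesizer.pt",
    "saved_models/default/vocoder.pt",
]


def missing_models(repo_root):
    """Return the expected model paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_MODELS if not (root / rel).is_file()]


if __name__ == "__main__":
    # "." assumes the current directory is the repo root.
    missing = missing_models(".")
    if missing:
        print("Missing model files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All pretrained models are in place.")
```

An empty result only means the files exist; it says nothing about whether the downloads are intact.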
Model information
The pretrained model files come from Corentin's Google Drive link and are identical to those provided in RTVC-7. The sha1sum checksums of the files are:
d44d60cbb47362c3c99216576ddc9796aad69366 encoder.pt
733b0c983f8a1cdaba0144d248f085d158c1775f synthesizer.pt
2ec56c93f219da3229ee40950c979e689aaa58d8 vocoder.pt
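On Windows, where sha1sum may not be available, the same comparison can be done with Python's standard hashlib module. The sketch below uses the three checksums listed above; the function names and the `saved_models/default` directory passed at the bottom are assumptions for illustration, so adjust the directory to wherever the files were downloaded.

```python
import hashlib
from pathlib import Path

# Checksums copied from the list above.
EXPECTED_SHA1 = {
    "encoder.pt": "d44d60cbb47362c3c99216576ddc9796aad69366",
    "synthesizer.pt": "733b0c983f8a1cdaba0144d248f085d158c1775f",
    "vocoder.pt": "2ec56c93f219da3229ee40950c979e689aaa58d8",
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large models are not read into memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(model_dir):
    """Map each file name to True (match), False (mismatch) or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_SHA1.items():
        path = Path(model_dir) / name
        results[name] = (sha1_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    # Assumed location: the default model folder relative to the repo root.
    for name, ok in verify_models("saved_models/default").items():
        print(name + ": " + labels[ok])
```

A MISMATCH usually points to an incomplete or corrupted download, in which case downloading the file again is the simplest fix.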
Audio samples
https://raccoonML.github.io/bluefish_experiments/RTVC-7.html
RTVC Swedish-1 (tensorflow)
This Swedish pretrained model originates from @ViktorAlm. I have assembled the files necessary to run the models.
Setup instructions
- Install Python 3.7. It needs to be this version for tensorflow 1.15 to work. It is highly recommended that you follow these instructions. GUIDE: Installing Python 3.7.9 on Windows.
- Download and extract RTVC-Swedish.zip.
- Open a Windows command prompt and set up a Python virtual environment. GUIDE: Python virtual environments in Windows
cd C:\path\to\RTVC\files
python -m venv venv
venv\Scripts\activate.bat
- Install dependencies.
pip install --upgrade pip
pip install torch -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install webrtcvad-wheels
- Start the toolbox.
python demo_toolbox.py
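Since the steps above depend on Python 3.7, it can save time to confirm which interpreter the virtual environment actually uses before installing dependencies. The following is a small sketch, not part of the release; `is_supported` is a hypothetical helper name. Save it to a file and run it with `python` inside the activated venv.

```python
import sys

# Version required by the setup instructions above.
REQUIRED = (3, 7)


def is_supported(version_info=sys.version_info):
    """Return True when the major.minor version equals the required 3.7."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = "{}.{}".format(*sys.version_info[:2])
    if is_supported():
        print("Python " + found + " matches the required 3.7.")
    else:
        print("Python " + found + " found, but these instructions require 3.7.")
```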
File information
RTVC-Swedish.zip contains all repo files, pretrained models and a standalone ffmpeg.exe to help load mp3 files.
RTVC Spanish-1
This is a Spanish model. It is experimental and there are a number of issues with it.
Known issues
The model suffers from this issue first pointed out by bluefish in CorentinJ/Real-Time-Voice-Cloning#879 (comment):
If there are problems with the synthesizer generating extra sounds, the stop threshold can be lowered to help prevent this. A threshold of 0.00001 seems to work well.
The stop threshold is left at the default 0.5 because I was unable to find a satisfactory value. Too low and there is a premature end to generation. Too high and the model produces extra sounds before stopping.
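The trade-off described above can be illustrated with a toy example. It assumes that the synthesizer predicts a stop probability at each decoder step and ends generation at the first step where that probability exceeds the stop threshold; this rule and the numbers below are illustrative assumptions, not code or data from this release.

```python
def frames_generated(stop_probs, stop_threshold=0.5):
    """Return how many decoder steps run before generation stops.

    Assumed rule (for illustration only): stop at the first step whose
    predicted stop probability exceeds stop_threshold.
    """
    for step, p in enumerate(stop_probs, start=1):
        if p > stop_threshold:
            return step
    return len(stop_probs)  # threshold never exceeded: ran to the step limit


# Hypothetical stop probabilities for one utterance whose speech ends near step 6.
probs = [0.00002, 0.0001, 0.001, 0.01, 0.3, 0.7, 0.9]

for threshold in (0.005, 0.5, 0.95):
    print(threshold, frames_generated(probs, threshold))
```

In this toy sequence a very low threshold cuts generation short, while a very high one is never exceeded and the loop runs to its limit, which mirrors the premature endings and extra sounds described above.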
Model information
The source data for training this model is Multilingual LibriSpeech (MLS) Spanish. It was trained for 278k steps at a batch size of 26 using a reduction factor of r=5. The speaker encoder and vocoder are the same as in the RTVC-7 release (trained on English).
File information
- RTVC_Spanish.zip contains all repo files, pretrained models and a standalone ffmpeg.exe to help load mp3 files. For the easiest setup experience on Windows, follow these instructions.
- pretrained_spanish.zip includes only the pretrained models. Copy the files to these locations relative to the root folder of the repo.
File location                                         Filesize
--------------------------------------------------------------
encoder/saved_models/pretrained.pt                    17 MB
synthesizer/saved_models/pretrained/pretrained.pt     134 MB
vocoder/saved_models/pretrained/pretrained.pt         53 MB
RTVC-7 (English)
File information
- RTVC_Windows.zip contains all repo files, pretrained models and a standalone ffmpeg.exe to help load mp3 files. For the easiest setup experience on Windows, follow these instructions.
- pretrained_models.zip includes only the pretrained models. Copy the files to these locations relative to the root folder of the repo.
File location                                         Filesize
--------------------------------------------------------------
encoder/saved_models/pretrained.pt                    17 MB
synthesizer/saved_models/pretrained/pretrained.pt     370 MB
vocoder/saved_models/pretrained/pretrained.pt         53 MB
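The upstream 2/3/22 release uses the same three model files as RTVC-7 but expects them under saved_models/default/. For anyone who already has the RTVC-7 files on disk, a sketch along these lines could copy them into the newer layout instead of downloading them again. The mapping is derived from the two file tables on this page; `copy_models` and both root arguments are hypothetical names.

```python
import shutil
from pathlib import Path

# RTVC-7 location -> upstream 2/3/22 location, each relative to its repo root.
LAYOUT_MAP = {
    "encoder/saved_models/pretrained.pt": "saved_models/default/encoder.pt",
    "synthesizer/saved_models/pretrained/pretrained.pt": "saved_models/default/synthesizer.pt",
    "vocoder/saved_models/pretrained/pretrained.pt": "saved_models/default/vocoder.pt",
}


def copy_models(rtvc7_root, upstream_root):
    """Copy whichever RTVC-7 model files exist into the upstream layout.

    Returns the destination paths (relative to upstream_root) that were written.
    """
    copied = []
    for src_rel, dst_rel in LAYOUT_MAP.items():
        src = Path(rtvc7_root) / src_rel
        dst = Path(upstream_root) / dst_rel
        if not src.is_file():
            continue  # skip files that were never downloaded
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst_rel)
    return copied
```

For example, `copy_models(r"C:\path\to\RTVC-7", r"C:\path\to\upstream")` would return the list of files it placed; both paths here are placeholders. This applies only to the English models, since the Spanish release ships a different synthesizer.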
Audio samples
https://raccoonML.github.io/bluefish_experiments/RTVC-7.html