This repository contains the code necessary to train Koe AI's LLVC models and to reproduce the LLVC paper.
LLVC paper: https://koe.ai/papers/llvc.pdf
LLVC samples: https://koeai.github.io/llvc-demo/
Windows executable: https://koe.ai/recast/download/
Koe AI homepage: https://koe.ai/
- Create a Python environment, e.g. with conda: `conda create -n llvc python=3.11`
- Activate the new environment: `conda activate llvc`
- Install torch and torchaudio from https://pytorch.org/get-started/locally/
- Install requirements with `pip install -r requirements.txt`
- Download models with `python download_models.py` (the full setup sequence is sketched after this list)
- `eval.py` has requirements that conflict with `requirements.txt`, so before running this file, create a separate Python environment with Python 3.9 and run `pip install -r eval_requirements.txt`
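Putting the steps above together, a typical first-time setup might look like the following sketch; the bare `pip install torch torchaudio` line is only a placeholder for whichever install command the PyTorch site gives you for your platform.

```bash
# create and activate a fresh environment
conda create -n llvc python=3.11
conda activate llvc

# install torch and torchaudio first (replace this line with the command
# from https://pytorch.org/get-started/locally/ that matches your platform)
pip install torch torchaudio

# install the remaining dependencies and download the pretrained models
pip install -r requirements.txt
python download_models.py
```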
You should now be able to run `python infer.py` and convert all of the files in `test_wavs` with the pretrained llvc checkpoint, with the resulting files saved to `converted_out`.
`python infer.py -p my_checkpoint.pth -c my_config.json -f input_file -o my_out_dir` will convert a single audio file or folder of audio files using the given LLVC checkpoint and save the output to the folder `my_out_dir`. The `-s` argument simulates a streaming environment for conversion. The `-n` argument allows the user to specify the size of input audio chunks in streaming mode, trading increased latency for better RTF.
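As an illustration, a streaming-style conversion of a single file could be invoked as below; the checkpoint, config, and input file names are placeholders, and the `-n` value is purely illustrative rather than a recommended setting.

```bash
# simulate streaming (-s) and set the input chunk size (-n); larger chunks
# increase latency but improve the real-time factor (RTF)
python infer.py -p my_checkpoint.pth -c my_config.json \
    -f test_wavs/example.wav -o my_out_dir -s -n 512
```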
`compare_infer.py` allows you to reproduce our streaming no-f0 RVC and QuickVC conversions on input audio of your choice. By default, `window_ms` and `extra_convert_size` are set to the values used for no-f0 RVC conversion. See the linked paper for the QuickVC conversion parameters.
- Create a folder `experiments/my_run` containing a `config.json` (see `experiments/llvc/config.json` for an example)
- Edit the `config.json` to reflect the location of your dataset and desired architectural modifications
- Run `python train.py -d experiments/my_run`
- The run will be logged to TensorBoard in the directory `experiments/my_run/logs` (see the monitoring example below)
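To watch training progress, you can point TensorBoard at that log directory, for example:

```bash
tensorboard --logdir experiments/my_run/logs
```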
Datasets consist of a folder containing three subfolders: `dev`, `train` and `val`. Each of these folders contains audio files of the form `PREFIX_original.wav`, which are audio clips recorded by a variety of input speakers, and `PREFIX_converted.wav`, which are the original clips converted to a single target speaker. `val` contains clips from the same speakers as `train`, while `dev` contains clips from speakers not found in `train`.
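For illustration, a dataset folder might be laid out as follows (the directory name and clip prefixes are hypothetical):

```
my_dataset/
├── train/
│   ├── 0001_original.wav
│   ├── 0001_converted.wav
│   └── ...
├── val/          # clips from the same speakers as train
│   ├── 1001_original.wav
│   ├── 1001_converted.wav
│   └── ...
└── dev/          # clips from speakers not seen in train
    ├── 2001_original.wav
    ├── 2001_converted.wav
    └── ...
```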
To recreate the dataset that we use in our paper:
- Download `dev-clean.tar.gz` and `train-clean-360.tar.gz` from https://www.openslr.org/12 and extract them to `llvc/LibriSpeech` (for example, as sketched below)
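For example, assuming you run the commands from the repository root, the download and extraction might look like this (the URLs follow the openslr.org resource layout):

```bash
# download the LibriSpeech subsets
wget https://www.openslr.org/resources/12/dev-clean.tar.gz
wget https://www.openslr.org/resources/12/train-clean-360.tar.gz

# extracting creates a LibriSpeech/ folder containing dev-clean/ and train-clean-360/
tar -xzf dev-clean.tar.gz
tar -xzf train-clean-360.tar.gz
```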
```
python -m minimal_rvc._infer_folder \
    --train_set_path "LibriSpeech/train-clean-360" \
    --dev_set_path "LibriSpeech/dev-clean" \
    --out_path "f_8312_ls360" \
    --flatten \
    --model_path "llvc_models/models/rvc/f_8312_32k-325.pth" \
    --model_name "f_8312" \
    --target_sr 16000 \
    --f0_method "rmvpe" \
    --val_percent 0.02 \
    --random_seed 42 \
    --f0_up_key 12
```
- Download `test-clean.tar.gz` from https://www.openslr.org/12
- Use `infer.py` to convert the `test-clean` folder using the checkpoint that you want to evaluate
- Activate the eval environment and run `eval.py` on your converted audio and the directory of ground-truth audio files (sketched below)
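Putting these evaluation steps together, the workflow might look roughly like the sketch below; the output directory and eval environment name are arbitrary placeholders, and `eval.py`'s exact command-line arguments are not listed here, so consult the script for how to point it at the converted and ground-truth directories.

```bash
# convert LibriSpeech test-clean with the checkpoint under evaluation
python infer.py -p my_checkpoint.pth -c my_config.json \
    -f LibriSpeech/test-clean -o converted_test_clean

# switch to the separate Python 3.9 environment created for eval_requirements.txt,
# then run eval.py on converted_test_clean and the ground-truth test-clean directory
# (see the script itself for its argument names)
conda activate llvc_eval
```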
Many of the modules in `minimal_rvc/` are based on the following repositories:
- https://github.com/ddPn08/rvc-webui
- https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- https://github.com/teftef6220/Voice_Separation_and_Selection
If you find our work relevant to your research, please cite:
```
@misc{sadov2023lowlatency,
      title={Low-latency Real-time Voice Conversion on CPU},
      author={Konstantine Sadov and Matthew Hutter and Asara Near},
      year={2023},
      eprint={2311.00873},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```