Introduction

This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive. The newspaper texts were taken from Herald Glasgow, with permission from Herald & Times Group. Each speaker has a different set of the newspaper texts, selected based on a greedy algorithm that increases the contextual and phonetic coverage. The details of the text selection algorithms are described in the following paper: C. Veaux, J. Yamagishi and S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database."

The above information is from the CSTR VCTK website.

VITS

This recipe provides a VITS model trained on the VCTK dataset.

A pretrained model can be found here. Note that this model was trained on the Edinburgh DataShare VCTK dataset.

For a tutorial and more details, please refer to the VITS documentation.

The training command is given below:

export CUDA_VISIBLE_DEVICES="0,1,2,3"
./vits/train.py \
  --world-size 4 \
  --num-epochs 1000 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir vits/exp \
  --tokens data/tokens.txt \
  --max-duration 350
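
In the command above, `--world-size 4` lines up with the four device IDs listed in `CUDA_VISIBLE_DEVICES`. To keep the two in sync when the GPU list changes, the count can be derived from the variable. This is a minimal sketch, assuming a POSIX-compatible shell; the helper `count_visible_gpus` is hypothetical and not part of the recipe.

```shell
# Hypothetical helper (not part of the recipe): count the comma-separated
# device IDs in a CUDA_VISIBLE_DEVICES-style string.
count_visible_gpus() {
  # An empty string means no devices are listed.
  if [ -z "$1" ]; then
    echo 0
    return 0
  fi
  # Keep only the commas, then count them; N commas separate N + 1 IDs.
  _commas=$(printf '%s' "$1" | tr -cd ',')
  echo $(( ${#_commas} + 1 ))
}

export CUDA_VISIBLE_DEVICES="0,1,2,3"
world_size=$(count_visible_gpus "$CUDA_VISIBLE_DEVICES")

# The derived value can then be passed to the training script
# in place of the hard-coded number, e.g. --world-size "$world_size".
echo "world size: $world_size"
```

The helper only counts list entries; it does not check that the listed devices actually exist on the machine.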

To run inference, use:

./vits/infer.py \
  --epoch 1000 \
  --exp-dir vits/exp \
  --tokens data/tokens.txt \
  --max-duration 500
