Introduction

This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.
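As a quick sanity check on the figures above, the sketch below derives the average clip length implied by the stated totals. It uses only the two numbers quoted in this paragraph; the function name `mean_clip_seconds` is made up for illustration, and since the 24-hour total is approximate, the result is approximate as well.

```python
# Figures quoted in the dataset description above.
NUM_CLIPS = 13_100
TOTAL_HOURS = 24  # approximate, per the description


def mean_clip_seconds(num_clips: int, total_hours: float) -> float:
    """Average clip length in seconds implied by the dataset totals."""
    return total_hours * 3600 / num_clips


avg = mean_clip_seconds(NUM_CLIPS, TOTAL_HOURS)
print(f"average clip length: {avg:.1f} s")  # prints "average clip length: 6.6 s"
```

The result falls inside the stated 1 to 10 second range, so the three figures are mutually consistent.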

The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded in 2016-17 by the LibriVox project and is also in the public domain.

The above information is from the LJSpeech website.

VITS

This recipe provides a VITS model trained on the LJSpeech dataset.

A pretrained model can be found here.

For a tutorial and more details, please refer to the VITS documentation.

The training command is given below:

export CUDA_VISIBLE_DEVICES=0,1,2,3
./vits/train.py \
  --world-size 4 \
  --num-epochs 1000 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir vits/exp \
  --max-duration 500
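
In the command above, `--world-size 4` lines up with the four GPU IDs listed in `CUDA_VISIBLE_DEVICES`. The following is a minimal sketch of keeping the two in sync when the GPU count changes, assuming that `--world-size` is meant to equal the number of visible GPUs. The helper `build_train_command` is hypothetical and not part of the recipe. It only assembles the argument list from the flags shown above and does not launch training.

```python
import shlex
from typing import List


def build_train_command(
    devices: str,
    num_epochs: int = 1000,
    exp_dir: str = "vits/exp",
    max_duration: int = 500,
) -> List[str]:
    """Assemble the training command, deriving --world-size from the GPU list.

    ASSUMPTION: --world-size should match the number of GPU IDs in
    CUDA_VISIBLE_DEVICES. All flags are copied from the command above.
    """
    gpu_ids = [d.strip() for d in devices.split(",") if d.strip()]
    if not gpu_ids:
        raise ValueError("expected at least one GPU ID, e.g. '0' or '0,1'")
    return [
        "./vits/train.py",
        "--world-size", str(len(gpu_ids)),
        "--num-epochs", str(num_epochs),
        "--start-epoch", "1",
        "--use-fp16", "1",
        "--exp-dir", exp_dir,
        "--max-duration", str(max_duration),
    ]


cmd = build_train_command("0,1,2,3")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell after exporting the same `CUDA_VISIBLE_DEVICES` value.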

To run inference, use:

./vits/infer.py \
  --exp-dir vits/exp \
  --epoch 1000 \
  --tokens data/tokens.txt
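
Before running the command above, it can help to confirm that the files it depends on are present. The sketch below is one way to do that. It assumes the checkpoint for epoch N is saved as `epoch-N.pt` inside the experiment directory, which is not stated in this README, so adjust the file name if your setup differs. The helper `missing_inference_inputs` is hypothetical and not part of the recipe.

```python
from pathlib import Path
from typing import List


def missing_inference_inputs(exp_dir: str, epoch: int, tokens: str) -> List[str]:
    """Return the paths that the inference command needs but that do not exist.

    ASSUMPTION: the checkpoint is stored as <exp_dir>/epoch-<epoch>.pt.
    """
    candidates = [Path(exp_dir) / f"epoch-{epoch}.pt", Path(tokens)]
    return [str(p) for p in candidates if not p.is_file()]


# Values taken from the inference command above.
missing = missing_inference_inputs("vits/exp", 1000, "data/tokens.txt")
if missing:
    print("missing inputs:", ", ".join(missing))
else:
    print("all inference inputs found")
```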
