VALL-E

An unofficial JAX implementation of VALL-E, a neural codec language model that performs zero-shot text-to-speech (TTS) synthesis from only a 3-second audio sample of the target speaker.

TODO

Implement VALL-E AR and NAR models
PyTorch implementation and conversion script
Training script for VALL-E
Implement EnCodec for audio tokenization
Train single-speaker model (KSS)
Train multi-speaker model (AIHub)
Add demos and pretrained models

Citations

@article{wang2023neural,
  title={Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers},
  author={Wang, Chengyi and Chen, Sanyuan and Wu, Yu and Zhang, Ziqiang and Zhou, Long and Liu, Shujie and Chen, Zhuo and Liu, Yanqing and Wang, Huaming and Li, Jinyu and others},
  journal={arXiv preprint arXiv:2301.02111},
  year={2023}
}

@article{defossez2022highfi,
  title={High Fidelity Neural Audio Compression},
  author={Défossez, Alexandre and Copet, Jade and Synnaeve, Gabriel and Adi, Yossi},
  journal={arXiv preprint arXiv:2210.13438},
  year={2022}
}

hyunwoo3235/vall-e
