v0.0.14 #492
-
Regarding the removal of The basic requirement is a pronunciation lexicon that will be used for the task (languages such as English and German have very big ones that also happen to be open source). It is an option that is quite stable in my personal opinion (and observation), straightforward to train and much quicker to generate pronunciations for unknown words. You can also choose to train models on different n-gram orders and call them based on word lengths. The pipeline would be as follows:
The results to get are much better than |
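A minimal sketch of the idea in this comment, under loud assumptions: the lexicon is consulted first, and only unknown words go to a grapheme-to-phoneme model picked by word length. The names `pronounce` and `make_placeholder_model`, the length threshold, and the choice of which model handles which length are all hypothetical; the placeholder models stand in for trained n-gram models and do no real prediction.

```python
from typing import Callable, Dict, List

# A G2P model is treated here as any callable mapping a word to a phoneme list.
G2PModel = Callable[[str], List[str]]


def make_placeholder_model(tag: str) -> G2PModel:
    """Hypothetical stand-in for a trained n-gram model: tags each letter."""
    def predict(word: str) -> List[str]:
        return [f"{tag}:{ch}" for ch in word]
    return predict


def pronounce(
    word: str,
    lexicon: Dict[str, List[str]],
    short_model: G2PModel,
    long_model: G2PModel,
    max_short_len: int = 6,  # assumed threshold, not taken from the thread
) -> List[str]:
    """Use the lexicon when possible, else a model chosen by word length."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    model = short_model if len(key) <= max_short_len else long_model
    return model(key)


lexicon = {"hello": ["h", "ə", "l", "oʊ"]}
low_order = make_placeholder_model("low")    # e.g. a lower n-gram order
high_order = make_placeholder_model("high")  # e.g. a higher n-gram order

print(pronounce("Hello", lexicon, low_order, high_order))       # lexicon hit
print(pronounce("cat", lexicon, low_order, high_order))         # short word
print(pronounce("phonemizer", lexicon, low_order, high_order))  # long word
```

In a real setup the placeholders would be replaced by models trained on the pronunciation lexicon; only the dispatch logic is illustrated here.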
-
I have some past experience with TTS: I explored espeak, MaryTTS and Festival to build a Luxembourgish TTS voice. Last year I published a book in French about the history of speech synthesis. I am relatively new to machine learning. A few weeks ago, when Coqui-AI was launched, I revisited my old ideas about creating an lb-TTS-voice. At the same time I discovered the rhasspy/larynx project developed by Michael Hansen (alias synesthesiam). Larynx uses gruut to transform text into phonemes. I wonder why there is no close collaboration between coqui-tts and larynx/gruut. I think both Coqui-AI and Rhasspy are outstanding, and I would like to congratulate and thank the authors of both projects. Greetings from Luxembourg, Marco Barnig
-
Probably a stupid idea, but should we ask the maintainer of espeak whether he would be willing to adjust its license?
-
@kdavis-coqui Thanks for pointing out the license issues; I will probably have to add a disclaimer to the phonemizer repo. However, if you are interested, I could train a model on an open source dataset and make it available so you can test it out.
-
@kdavis-coqui and other contributors: thank you very much for your detailed explanations about the licenses.
-
@kdavis-coqui Do you think lexicons derived from Wiktionary would have any licensing issues?
-
I've created a pull request to re-enable phoneme-based TTS models using gruut 🙂
-
🐸 v0.0.14
🐞 Bug Fixes
💾 Code updates
- Every model is now tied to a Python class that defines its configuration scheme. This provides a better interface and makes the default values, expected value types, and mandatory fields explicit.
- Specific model configs are defined under `TTS/tts/configs` and `TTS/vocoder/configs`.
- `TTS/config/shared_configs.py` hosts configs that are shared by all the 🐸 TTS models. Configs shared by `tts` models are hosted under `TTS/tts/configs/shared_configs.py`, and those shared by `vocoder` models are under `TTS/vocoder/configs/shared_config.py`.
- For example, `TacotronConfig` follows `BaseTrainingConfig -> BaseTTSConfig -> TacotronConfig`.
- Removed `phonemizer` support due to a license conflict. This essentially deprecates support for all the models using phonemes as input. Feel free to suggest in-place options if you are affected by this change.
- `TTS/recipes/`. Please check here for more details.
- `extract_tts_spectrograms.py` that supports GlowTTS and Tacotron1-2. (👑 @Edresson)
- `version.py` (👑 @chmodsss)

This discussion was created from the release v0.0.14.
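As a rough illustration of the config hierarchy named in the release notes, the chain `BaseTrainingConfig -> BaseTTSConfig -> TacotronConfig` can be sketched with plain Python dataclasses. This is not the library's actual implementation: the three class names come from the notes, but every field, default value, and the type check below are assumptions chosen only to show how defaults and expected types can live in the class definitions.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class BaseTrainingConfig:
    """Options shared by every model (field names are illustrative only)."""
    batch_size: int = 32
    epochs: int = 1000

    def __post_init__(self):
        # Reject values whose type does not match the field annotation.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


@dataclass
class BaseTTSConfig(BaseTrainingConfig):
    """Options shared by all tts models (hypothetical field)."""
    use_phonemes: bool = False


@dataclass
class TacotronConfig(BaseTTSConfig):
    """Options specific to one model (hypothetical fields)."""
    model: str = "tacotron"
    r: int = 2


config = TacotronConfig(batch_size=16)
print(asdict(config))  # inherited defaults plus the overridden batch_size
```

Each level only declares what it adds, so a user can read a single class to see its defaults and types, and a wrongly typed value such as `TacotronConfig(batch_size="16")` fails at construction time in this sketch.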