Does non-English TTS training work properly now? #4606
-
Hi all! I've come across this issue from 2020, which clearly shows that non-English TTS had some problems at that time. Has the situation changed since then? I need to train Tacotron2 for Russian, but I'm running into many problems in the preprocessing stage, mainly with text preprocessing. Unfortunately, I couldn't find any decent tutorials on how to train a model on a non-English dataset. Can anyone help?
-
Our current tokenizers are for English and German characters, so you may need to add a new tokenizer class (and possibly a grapheme-to-phoneme (G2P) class) to train on Russian. @aroraakshit wrote this tutorial on adding German support to TTS and may be able to answer any other questions you have about adding support for additional languages.
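To make the suggestion concrete, here is a minimal, framework-independent sketch of what a character tokenizer for Russian has to do: normalize the text, then map each character of a fixed alphabet (plus a few punctuation marks) to an integer ID. The class name `RussianCharsTokenizerSketch`, the alphabet, and the punctuation set are assumptions for illustration. This is not NeMo's tokenizer API; a tokenizer meant for NeMo would more likely build on the existing classes in `tts_tokenizers.py`.

```python
import unicodedata


# Hypothetical sketch, NOT NeMo's API: a character-level tokenizer for Russian.
class RussianCharsTokenizerSketch:
    # Assumed lowercase Russian alphabet (33 letters, including "ё").
    ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
    # Assumed punctuation marks kept as their own tokens.
    PUNCT = ",.!?-"

    def __init__(self):
        # ID 0 is reserved for padding and ID 1 for the space between words.
        self.tokens = ["<pad>", " "] + list(self.ALPHABET) + list(self.PUNCT)
        self.token_to_id = {t: i for i, t in enumerate(self.tokens)}

    def normalize(self, text: str) -> str:
        # NFC turns "е" + combining diaeresis into the single letter "ё".
        text = unicodedata.normalize("NFC", text).lower()
        # Collapse runs of whitespace into single spaces.
        return " ".join(text.split())

    def encode(self, text: str) -> list[int]:
        # Characters outside the vocabulary are silently dropped here.
        return [self.token_to_id[c] for c in self.normalize(text) if c in self.token_to_id]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.tokens[i] for i in ids)
```

Dropping unknown characters keeps the sketch short, but it can hide transcription problems; for training data it is usually safer to report them and clean the transcripts first.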
-
Hello,
-
Thanks, Jocelyn, for your quick answer. However, there is no equivalent for French, and I am not experienced enough to write such a tokenizer myself.
-
Thank you for your answer.
I'm currently working on a French version of the multilingual notebook, following the German one as I go. I think I also need a pronunciation dictionary and a heteronyms file. Is that right? The dictionary can be based on all the words in my database. However, I haven't found any French resources for these online.
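Since the dictionary can be based on the words in the dataset, a first step is to list every word in the transcripts and see which ones still lack a pronunciation. A minimal sketch, assuming a JSON-lines manifest in which each line has a `"text"` field (as in the manifests used by the NeMo tutorials) and a simple regex definition of a "word" that keeps apostrophe and hyphen compounds such as `l'homme` or `peut-être` together; both the word rule and the function names are hypothetical.

```python
import json
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally joined to further runs
# by an apostrophe or hyphen ("l'homme", "peut-être").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def collect_vocabulary(manifest_lines):
    """Count lowercase words in the "text" field of JSON-lines manifest entries."""
    counts = Counter()
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        text = json.loads(line)["text"].lower()
        counts.update(WORD_RE.findall(text))
    return counts


def missing_from_dictionary(vocabulary, dictionary_words):
    """Return words that still need a pronunciation, most frequent first."""
    missing = [w for w in vocabulary if w not in dictionary_words]
    return sorted(missing, key=lambda w: (-vocabulary[w], w))
```

Sorting by frequency puts the words that matter most for training at the top of the list to transcribe (or to run through a G2P model).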
On Tue, Jul 23, 2024 at 01:45, Xuesong Yang ***@***.***> wrote:
> We support French, along with ["en-US", "de-DE", "es-ES", "it-IT", "fr-FR"]. Try FrenchCharsTokenizer
> <https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py#L272>
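Before training with a character tokenizer such as FrenchCharsTokenizer, it can help to check which characters in the transcripts fall outside the tokenizer's alphabet (typographic apostrophes, digits, currency symbols, and so on), since these usually need normalizing first. A minimal sketch; the character sets below are hand-written assumptions for illustration, not the ones FrenchCharsTokenizer actually defines, and `uncovered_characters` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Assumption: an illustrative French character set, NOT the one that
# FrenchCharsTokenizer actually uses.
FRENCH_LETTERS = set("abcdefghijklmnopqrstuvwxyz" "àâæçéèêëîïôœùûüÿ")
ALLOWED_PUNCT = set(" ,.;:!?'-")


def uncovered_characters(transcripts):
    """Count characters in the transcripts that fall outside the assumed character set."""
    counts = Counter()
    for text in transcripts:
        # NFC keeps accented letters as single code points ("é", not "e" + accent).
        for ch in unicodedata.normalize("NFC", text).lower():
            if ch not in FRENCH_LETTERS and ch not in ALLOWED_PUNCT:
                counts[ch] += 1
    return counts
```

For example, `uncovered_characters(["Où est l’hôtel ?", "Ça coûte 20 €."])` flags the typographic apostrophe `’`, the digits, and `€`, while accented letters such as `ç` and `ô` pass. Digits and symbols are typically expanded into words during text normalization rather than added to the alphabet.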
-
I found the 21 French heteronyms listed at
https://en.wiktionary.org/wiki/Category:French_heteronyms
and a dictionary at
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/
in case this helps someone else.
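To use the two resources together, one option is to load the dictionary and check that each heteronym appears in it with more than one pronunciation. A minimal sketch, assuming the usual CMUSphinx dictionary layout (a word followed by space-separated phones, with alternate pronunciations marked `word(2)`, `word(3)`, ...) and a heteronym list read as one word per entry; the function names are hypothetical, and the output is not NeMo's own dictionary format.

```python
import re
from collections import defaultdict

# Assumed CMUSphinx-style alternate-pronunciation marker: "word(2)", "word(3)", ...
VARIANT_RE = re.compile(r"^(.+?)\(\d+\)$")


def load_sphinx_dict(lines):
    """Map each lowercase word to the list of its pronunciations (each a list of phones)."""
    prons = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0], parts[1:]
        match = VARIANT_RE.match(word)
        if match:
            word = match.group(1)  # "fils(2)" -> "fils"
        prons[word.lower()].append(phones)
    return dict(prons)


def check_heteronyms(prons, heteronyms):
    """Report heteronyms absent from the dictionary, and those with only one pronunciation there."""
    heteronyms = [w.lower() for w in heteronyms]
    missing = sorted(w for w in heteronyms if w not in prons)
    single = sorted(w for w in heteronyms if len(prons.get(w, [])) == 1)
    return missing, single
```

Heteronyms in either returned list need manual attention, because a single dictionary entry cannot capture a pronunciation that depends on context.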
(In reply to Elie-Laurent Benaroya's message of Wed, Jul 24, 2024 at 10:39.)