This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Mandarin results sharing and training support #139
Comments
In styletts1: In styletts2: |
Was this trained on aishell3? The prosody of the Chinese synthesis sounds quite poor, with basically no pauses. |
@zhouyong64 aishell is bad in general; like VCTK, it has no emotion and flat prosody. |
Hi liuhuang, thanks for sharing. I have a question: how did you generate the 48 kHz audio in ref_gen.zip? When I generate 48 kHz audio it sounds very high-pitched, but yours sounds great at 48 kHz. Thanks in advance for your help :P |
@blldd Hi, the ref_gen.zip file was generated by the styletts1 model, which is an acoustic model that generates a 24k mel. A super-resolution hifigan vocoder then converts the 24k mel to a 48k wav. styletts1 and the vocoder use the same mel extraction params. |
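The pipeline described above relies on two things: the acoustic model and the vocoder sharing mel extraction params, and the vocoder emitting twice as many samples per mel frame as a plain 24k vocoder would. Below is a minimal sketch of that bookkeeping. The `MelConfig` class, the helper names, and every number in it (n_fft, hop length, upsample rates) are illustrative assumptions, not values confirmed in this thread.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class MelConfig:
    # Illustrative values only (assumed, not confirmed in this thread).
    sample_rate: int = 24000
    n_fft: int = 2048
    win_length: int = 1200
    hop_length: int = 300
    n_mels: int = 80


def required_upsample(mel: MelConfig, out_sample_rate: int) -> int:
    """Number of output samples the vocoder must emit per mel frame."""
    if out_sample_rate % mel.sample_rate != 0:
        raise ValueError("output rate must be an integer multiple of the mel rate")
    return mel.hop_length * (out_sample_rate // mel.sample_rate)


def check_vocoder(mel: MelConfig, vocoder_mel: MelConfig,
                  upsample_rates: tuple, out_sample_rate: int) -> bool:
    """True if the vocoder's total upsampling matches the target rate."""
    if mel != vocoder_mel:
        raise ValueError("acoustic model and vocoder mel params differ")
    return prod(upsample_rates) == required_upsample(mel, out_sample_rate)


acoustic_mel = MelConfig()
vocoder_mel = MelConfig()              # must match the acoustic model exactly
hypothetical_rates = (5, 5, 4, 3, 2)   # hypothetical upsample stack, product 600
ok = check_vocoder(acoustic_mel, vocoder_mel, hypothetical_rates, 48000)
print(ok)
```

One possible explanation for a high-pitched result, offered only as a guess: samples produced for 24 kHz but written to a file labelled 48 kHz play back at double speed and an octave higher.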
Great! Thanks for your help! I am also curious about the multi-language capability. I tried the StyleTTS2 model trained on LibriTTS and found that it does not work for French text: the generated audio is spoken with English pronunciation. |
@blldd Hi, blldd. First, I retrained the ASR model using Chinese phonemes. Second, since no Chinese pl-bert exists, I removed the pl-bert module. Then I trained this modified StyleTTS2 (pl-bert removed, ASR retrained) on Chinese data. |
Which SLM model did you use for Chinese? I guess it's not microsoft/wavlm-base-plus. |
@zhouyong64 Hi, for now I am still using the English-only microsoft/wavlm-base-plus. Switching to another model may require changes to the model structure, so I have left it unchanged. |
Hi, since you removed the pl-bert module, did you replace it with the text encoder in the second training stage? |
@mayfool Hi, yes, I simply replaced it with the text_encoder. |
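The swap described above can be pictured as choosing where the second-stage duration and prosody features come from. This is a rough, framework-free sketch, not the author's code: `duration_input`, `plbert`, `bert_proj` and the toy encoder are hypothetical stand-ins, and it assumes the text encoder output already has the shape the downstream predictor expects.

```python
from typing import Callable, List, Optional, Sequence

Features = List[List[float]]  # one feature vector per phoneme token


def duration_input(
    tokens: Sequence[int],
    text_encoder: Callable[[Sequence[int]], Features],
    plbert: Optional[Callable[[Sequence[int]], Features]] = None,
    bert_proj: Optional[Callable[[Features], Features]] = None,
) -> Features:
    """Pick the per-token features fed to the duration/prosody predictor."""
    if plbert is not None and bert_proj is not None:
        # Original-style path: language-model features, then a projection.
        return bert_proj(plbert(tokens))
    # Modified path: no pl-bert available, reuse the text encoder instead.
    return text_encoder(tokens)


def toy_text_encoder(tokens: Sequence[int]) -> Features:
    # Stand-in for a real encoder: a trivial 2-dimensional feature per token.
    return [[float(t), 0.0] for t in tokens]


features = duration_input([1, 2, 3], toy_text_encoder)
print(features)
```

With a real model the callables would be the corresponding network modules; the point is only that the pl-bert branch is skipped entirely when no Chinese pl-bert is available.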
Thanks for the reply. Here are a few questions: 1. Did you use the text encoder pretrained in the first stage, or a new text encoder without pretraining? 2. Will this modification affect the zero-shot ability? |
@mayfool hi,
|
@liuhuang31 Thanks a lot! |
@mayfool I use the chinese_hubert_large model. |
Share your Chinese synthesis results or Mandarin model training questions.