First of all, thanks for this awesome research. Voice cloning desperately needs to be open sourced.
I'm interested in training a Japanese model, and I have over a thousand hours of speech data.
However, I'm a bit concerned about having to convert my transcriptions to IPA. Japanese has a pitch accent, and the pitch can change within a word. For example, 橋 and 箸 are both pronounced "hashi", but their pitch patterns differ. When text is converted to IPA, such as in this topic, this information is lost. Is there a way to train a model on just the "raw" text? Besides that, I just need to train or find a Japanese BERT model, right? Is there anything else I should be aware of?
Thanks in advance
* Fix multi-node training issue
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
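* Update and improve distributed training support
While merging the V2 code recently, I found that the earlier multi-node changes were incorrect and still raised errors. In the single-node multi-GPU case local_rank is the same as rank, so the problem went unnoticed.
1. Fix DDP initialization in train_ms.py and bind .cuda to local_rank
2. Add the env variable LOCAL_RANK to the default_config.yml config file; otherwise a KeyError is raised by default
3. Add run_MnodesAndMgpus.sh and update the distributed training notes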
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
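The rank versus local_rank distinction described in the commit message above can be sketched as follows. This is only an illustration, not the actual code in train_ms.py: the helper name `resolve_ranks` is hypothetical, and the environment variable names assume a launcher such as torchrun, which sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`.

```python
import os


def resolve_ranks(env=None):
    """Read distributed-training ranks from environment variables.

    Hypothetical helper for illustration; not taken from train_ms.py.
    rank       : global index of this process across all nodes
    local_rank : index of this process (and its GPU) on the current node
    """
    env = os.environ if env is None else env
    # Using .get with a default avoids the KeyError mentioned in the
    # commit message when LOCAL_RANK is not defined.
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


# On a single node rank == local_rank, which hides the bug; on the second
# node of a 2x4 GPU job, rank 5 maps to local GPU 1.
rank, local_rank, world_size = resolve_ranks(
    {"RANK": "5", "LOCAL_RANK": "1", "WORLD_SIZE": "8"}
)

# With PyTorch installed, the values would typically be used like this:
#   torch.cuda.set_device(local_rank)              # GPU index is per node
#   dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
#   model = DDP(model.cuda(local_rank), device_ids=[local_rank])
```

The point of the sketch is that the device index must come from local_rank, while process-group initialization uses the global rank.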
Sorry for the late reply; I have been very busy recently. For pitch in Japanese, you may refer to yl4579/StyleTTS#10 (comment). The pitch can easily be extracted from the OpenJTalk return value: yl4579/PL-BERT#6 (comment)
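As a rough illustration of what extracting pitch from OpenJTalk output could look like, here is a minimal sketch. It does not reproduce the linked comments. It assumes the HTS-style full-context label layout `...-phone+.../A:a1+a2+a3/.../F:f1_f2#...`, where `a2` is the mora position within the accent phrase and `f2` is the accent type, and it applies the textbook Tokyo-dialect high/low rule. The function names are hypothetical, and the label in the usage line is hand-written and abbreviated, not real OpenJTalk output.

```python
import re

# Captures: current phoneme, mora position (a2), accent type (f2).
# The label layout is an assumption stated in the text above.
_LABEL_RE = re.compile(r"-([^+]+)\+.*?/A:[^+]+\+(\d+)\+.*?/F:\d+_(\d+)")


def pitch_of_mora(position, accent_type):
    """Textbook Tokyo-accent rule: 'H' or 'L' for a 1-based mora position.

    accent_type 0 means no pitch drop (heiban); accent_type n > 0 means
    the pitch drops after the n-th mora.
    """
    if position == 1:
        return "H" if accent_type == 1 else "L"
    if accent_type == 0:
        return "H"
    return "H" if position <= accent_type else "L"


def label_to_phone_pitch(label):
    """Return (phoneme, 'H' or 'L'), or None for labels without accent info
    (for example silence, where the accent fields are 'xx')."""
    match = _LABEL_RE.search(label)
    if match is None:
        return None
    phone, position, accent_type = match.groups()
    return phone, pitch_of_mora(int(position), int(accent_type))


# 箸 (chopsticks) is accent type 1, 橋 (bridge) is accent type 2.
hashi_chopsticks = [pitch_of_mora(i, 1) for i in (1, 2)]  # high then low
hashi_bridge = [pitch_of_mora(i, 2) for i in (1, 2)]      # low then high

# Hand-written, abbreviated label for illustration only.
example = label_to_phone_pitch("sil^h-a+sh=i/A:0+1+2/B:xx-xx_xx/F:2_1#0_xx")
```

With pyopenjtalk installed, the labels would typically come from `pyopenjtalk.extract_fullcontext(text)`, and the resulting H/L marks could be interleaved with the phoneme sequence so the 橋/箸 distinction survives phonemization. Note that for a two-mora word, accent types 0 and 2 differ only in the pitch of a following particle, so this per-word rule cannot separate them without that context.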