hoarse pronunciation #14

Open
Shenkailai opened this issue Nov 8, 2024 · 3 comments

Comments

@Shenkailai

First of all, I would like to express my sincere gratitude to the authors. This is an excellent piece of work! I have used ConvNext_TTS, and its synthesis quality is impressive, with very fast inference speed.

I trained the model on roughly 300 hours of mixed Chinese and English data. However, the synthesized speech occasionally turns suddenly hoarse on individual words, and training for more epochs does not seem to resolve it: the problem persists after approximately 4M steps.

image
For example, the last word of this speech segment seems to lack properly generated harmonics.
baker_004.zip
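One way to locate such frames without inspecting every spectrogram by eye is to score each frame's periodicity. The following is only a diagnostic sketch, not part of ConvNext_TTS: the function name `frame_harmonicity`, the frame sizes, and the 60-400 Hz pitch range are all assumptions chosen for illustration. It takes the peak of the normalized autocorrelation within the pitch lag range, which is close to 1 for a clearly harmonic frame and close to 0 for a noise-like one.

```python
import numpy as np


def frame_harmonicity(signal, sample_rate, frame_len=1024, hop=256,
                      f0_min=60.0, f0_max=400.0):
    """Per-frame periodicity score (hypothetical helper, illustrative defaults).

    For each frame, returns the maximum of the normalized autocorrelation
    over lags corresponding to f0_min..f0_max. Silent frames score 0.
    """
    signal = np.asarray(signal, dtype=np.float64)
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    scores = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frame = frame - frame.mean()          # remove DC offset
        energy = np.dot(frame, frame)
        if energy < 1e-10:                    # treat near-silence as non-harmonic
            scores.append(0.0)
            continue
        # Autocorrelation at non-negative lags; index 0 equals the frame energy.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        scores.append(float(ac[lag_min:lag_max + 1].max() / energy))
    return np.array(scores)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr // 2) / sr
    tone = np.sin(2 * np.pi * 200.0 * t)                    # strongly harmonic
    noise = np.random.default_rng(0).standard_normal(sr // 2)  # no harmonics
    print("tone :", frame_harmonicity(tone, sr).mean())
    print("noise:", frame_harmonicity(noise, sr).mean())
```

Running this over synthesized utterances and listing voiced frames whose score drops sharply could help check whether the hoarse words correlate with particular phonemes or tones. Any threshold would need tuning on real data.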

@RZJM

RZJM commented Nov 8, 2024

Hello, I used the Chinese front end from your PR, and the error below occurred during training. Have you run into it?
[Screenshot from 2024-11-08 14-31-00]

@Shenkailai

Shenkailai commented Nov 8, 2024

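Some of the code in that PR has problems, and the fixed code has not been committed yet. The Chinese speech from the currently trained model shows the issue described above, and this Chinese front end may be the cause, so I am holding off on submitting a new PR for now.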

@RZJM

RZJM commented Nov 8, 2024

OK, understood.

2 participants