[Feature request] In non-English models stress could be assigned incorrectly #3282

Closed
DmitryVN opened this issue Nov 21, 2023 · 8 comments
Labels
feature request feature requests for making TTS better. wontfix This will not be worked on but feel free to help.

Comments

@DmitryVN

DmitryVN commented Nov 21, 2023

Please fix this; see #3039.
The problem persists, and because of it normal, correct use is not possible. Also, at the moment the speech breaks off at the end of each sentence, which makes the reading sound jerky.

@DmitryVN DmitryVN added the feature request feature requests for making TTS better. label Nov 21, 2023
@Tessory

Tessory commented Nov 22, 2023

@DmitryVN I see you ran into this issue too. The stressed syllable is often wrong in Russian, but Russian has a lot of words with the same spelling and a different pronunciation (a different stressed syllable).
There is also the matter of periods ("."). I see that TTS splits the text into separately generated parts at sentence boundaries and pronounces a strange word, something like "ponto", between these sentences/parts. The only workaround I found is replacing all periods with commas ("." -> ","); then it pronounces the phrases fine.
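
A minimal sketch of that period-to-comma workaround, assuming the text is preprocessed in Python before it is handed to the synthesizer. `periods_to_commas` is a hypothetical helper, not part of TTS, and the choice to leave periods inside numbers untouched is an assumption added here:

```python
import re


def periods_to_commas(text: str) -> str:
    """Replace sentence-ending periods with commas.

    A period counts as sentence-ending here if it is followed by
    whitespace or by the end of the text, so "3.14" is left alone.
    """
    return re.sub(r"\.(?=\s|$)", ",", text)


if __name__ == "__main__":
    # The result would then be passed to the TTS call in place of the raw text.
    print(periods_to_commas("Привет. Как дела."))
```

This only changes punctuation, so it trades correct sentence-final intonation for the absence of the stray word between parts.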

@carinae

carinae commented Nov 23, 2023

Same issue with the period in French.

@brambox

brambox commented Nov 23, 2023

Adding some way to force stress would probably be the biggest and most important feature. My guess is that it will need a lot of work and may break things.

Another problem with the GPT-based XTTS is that it sometimes decides on its own and places the stress differently across multiple generations. So some way to hard-force the stress needs to exist for better accuracy.

@DmitryVN
Author

Without this feature, the meaning is lost in many places and the synthesis simply comes out incorrect.

@DmitryVN
Author

Is it possible to implement some kind of solution in the future?

@Tessory

Tessory commented Dec 18, 2023

> Is it possible to implement some kind of solution in the future?

I don't know how TTS works internally, and I hope the author will answer.
But in xVASynth/Trainer, stress is implemented through individual vowel phonemes.
For example, the phonemes for two words may look like this:

phonemes - word
Z AA1 M OW0 K - зАмок (castle)
Z AA0 M OW1 K - замОк (lock)

To the AI, these stressed and unstressed vowels (AA0 and AA1, or OW0 and OW1) look like completely different phonemes.
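
To illustrate that point, here is a small sketch that assigns integer IDs to the phonemes from the two example words. The `build_vocab` helper and the ID scheme are assumptions made for illustration; they are not taken from xVASynth or TTS:

```python
# Phoneme sequences from the example above (a trailing digit marks stress).
ZAMOK_CASTLE = "Z AA1 M OW0 K".split()  # зАмок
ZAMOK_LOCK = "Z AA0 M OW1 K".split()    # замОк


def build_vocab(sequences):
    """Assign each distinct phoneme symbol its own integer ID, in first-seen order."""
    vocab = {}
    for seq in sequences:
        for phoneme in seq:
            vocab.setdefault(phoneme, len(vocab))
    return vocab


vocab = build_vocab([ZAMOK_CASTLE, ZAMOK_LOCK])
castle_ids = [vocab[p] for p in ZAMOK_CASTLE]
lock_ids = [vocab[p] for p in ZAMOK_LOCK]

if __name__ == "__main__":
    print(vocab)
    print(castle_ids, lock_ids)
```

Because AA0 and AA1 get unrelated IDs, a model trained on such input sees the two words as different token sequences even though their spelling is identical.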

I also know that Silero uses marks for stressed vowels, like з+амок and зам+ок, but I don't know what that looks like at the phoneme level.
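
As a rough sketch of how such "+" marks could be read in a preprocessing step, the function below strips the mark and reports which vowel it pointed at. `parse_plus_stress` is a hypothetical helper and says nothing about how Silero handles the marks internally:

```python
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def parse_plus_stress(marked: str):
    """Split a word like "з+амок" into (plain_word, stressed_index).

    stressed_index is the position in plain_word of the vowel that
    directly follows "+", or None if no vowel is marked.
    """
    plain = []
    stressed_index = None
    expect_stress = False
    for ch in marked:
        if ch == "+":
            expect_stress = True
            continue
        if expect_stress:
            if ch in RUSSIAN_VOWELS:
                stressed_index = len(plain)
            expect_stress = False
        plain.append(ch)
    return "".join(plain), stressed_index


if __name__ == "__main__":
    print(parse_plus_stress("з+амок"))
    print(parse_plus_stress("зам+ок"))
```

The returned index could then be used to pick a stressed or unstressed variant of the vowel symbol, in the spirit of the AA0/AA1 example above.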

I think we could add something similar here: somehow separate the stressed vowels and retrain... or maybe fine-tuning would be enough?
I just want to say that for us humans, stressed "А" and unstressed "А" are the same vowel. But to the AI they may look like completely different vowels, like the AA1 and AA0 phonemes. Maybe some of these tricks could be used here too.

@DmitryVN DmitryVN mentioned this issue Dec 23, 2023
58 tasks

stale bot commented Jan 28, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Jan 28, 2024
@stale stale bot closed this as completed Feb 4, 2024
@skalp2020

I confirm that the problem exists. Please add this feature.


5 participants