TTS Generator is broken #506
Edit: added log file, I forgot.
TTS Engines & TTS Generation
So for example, with XTTS in English, sending a block of text longer than 250 characters for generation can result in drop-offs in quality, strange sounds, etc. With XTTS this varies by language too; e.g. I think the limit for Chinese is 200 characters. There is internal code in some TTS engines to handle text splitting for blocks of TTS longer than X, but that varies in quality and in what it can do. As mentioned, I am not the manufacturer of any of the underlying TTS engines themselves. You can find links to each of the TTS engine manufacturers' sites within the AllTalk interface against each engine, or you can find a current list of links here, where you can research each TTS engine and the manufacturers' specifications.
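To illustrate the kind of text splitting described above, here is a minimal sketch of chunking text at sentence boundaries so no chunk exceeds an engine's character limit. The function name `chunk_text` and the 250-character default are illustrative only; this is not AllTalk's or any TTS engine's actual implementation.

```python
import re

def chunk_text(text: str, max_chars: int = 250) -> list[str]:
    """Split text into chunks no longer than max_chars,
    preferring to break at sentence boundaries."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
        # A single sentence longer than max_chars is split on whitespace.
        while len(current) > max_chars:
            cut = current.rfind(" ", 0, max_chars)
            cut = cut if cut > 0 else max_chars
            chunks.append(current[:cut].strip())
            current = current[cut:].strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the engine separately and the resulting audio concatenated, which is roughly what the "Chunk Sizes" setting in the TTS Generator controls.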
Analysis

I assume by "transcribe" and "Forbidden" you mean items being highlighted red in the window with the Analysis option? The Analysis option is just a guide to say "hey, you might want to check this was generated as TTS correctly". There is no way to confirm, 1 to 1, 100%, that the input text sent for TTS generation will match the output TTS. Here is an example. The original input text is:

Original Text: Think I left a lotta stuff behind that day. ...Sorry, uh...allergies,

So when Whisper transcribes the audio of the generated TTS and the transcription is compared to the original text, there is a difference between the two texts, because the original has the multiple periods and the transcribed TTS has removed those, because they are not pronounced. As such, the Analysis marks the lines red, because there is a difference that may or may not matter. It may or may not sound right. It may be because you have hit the token length limit of an underlying TTS engine and so the output is garbled to some degree, or it may be that there was additional punctuation in the original text that isn't going to appear in the generated audio TTS and therefore cannot be transcribed by Whisper. Hence, marking things red is just a guideline.

There is a % accuracy setting which can be used to allow more things to pass. Take the example of the word "there" being in the original text. Whisper may transcribe that as "there", "their", or "they're", which all sound the same but have different spellings. So if the text has it spelt one way and Whisper transcribes it another way, all I can do is say "hey, the original text and the transcribed text are spelt differently, maybe this sounds wrong". Please bear in mind that Whisper doesn't have an understanding of sentence context when transcribing, so it will have no clue which variation of "there" is the correct one. This issue obviously extends to many different words and languages. So you can make the check a bit more flexible by lowering the % accuracy; at a lower % accuracy it would allow words that sound similar but may not be spelt exactly the same, e.g. "there, their, they're". You can also choose different Whisper models, which may or may not work better. You can find all the information about the Whisper models here https://github.com/openai/whisper

Hopefully this gives you a better understanding and covers what you need to know. Thanks
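The comparison described above can be sketched with a simple similarity ratio. This is only an illustration of the idea, assuming punctuation is stripped before comparing (since it is not spoken and won't appear in Whisper's transcription); the function name `passes_check` and the 0.9 threshold are hypothetical, not AllTalk's actual code or default.

```python
import re
from difflib import SequenceMatcher

def normalise(text: str) -> str:
    # Lowercase and strip punctuation, since punctuation is not pronounced
    # and so will not appear in Whisper's transcription.
    return re.sub(r"[^\w\s]", "", text.lower()).strip()

def passes_check(original: str, transcribed: str, threshold: float = 0.9) -> bool:
    """Return True if the transcription is 'close enough' to the original
    text after normalisation, per a percentage-style similarity ratio."""
    ratio = SequenceMatcher(None, normalise(original), normalise(transcribed)).ratio()
    return ratio >= threshold
```

With a lower threshold, homophone substitutions like "there"/"their" and removed ellipses would still pass, which mirrors the effect of lowering the % accuracy setting.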
diagnostics.log
Generating text using the main window runs fine, but in the TTS Generator anything longer than two sentences outputs silence and screeches.
The transcribe feature inside the TTS Generator also doesn't work reliably; more than half the time, for a book-length text, everything has a forbidden sign after the TTS Generator completes.
To Reproduce
Put a paragraph into the TTS Generator, set Chunk Sizes to any size, and generate.
Logs are normal.
Desktop (please complete the following information):
AllTalk was updated: 30/01/25
Custom Python environment: no
Text-generation-webUI was updated: using standalone
Additional context
Please bring back the option for no character limit in normal generation; it actually didn't have major problems.