Discussed in #2688

Originally posted by Transcan January 18, 2025
Hello,
I'm currently working on a script that reads the messages NPCs send you, each with a different voice.
I use Spanish, and there aren't many voices to choose from, so the messages repeat the same voices too often.
I came across this Piper project. It's a project for using locally generated AI voices for TTS.
These new AI voices open up a wide range of possibilities to choose from.
It looks promising, and I wonder if it can be used with EDDI.
Also, the quality of these voices is greater than most of the Windows native voices (the only one that is decent enough, and the one I use for my personality, is Cortana's voice).
At the moment it doesn't create a system voice (SAPI) that can be used directly with EDDI, and I don't know if that feature will ever exist, but right now I think it could be used with EDDI with minor changes (just my guess, I'm not a professional programmer).
I leave the link to the project here for the masters to take a look:
https://github.com/rhasspy/piper
Have a nice day. o7
I apologize, but I've given this a good deal of effort and have not been successful in implementing it. Voices are complex, and incorporating these requires much more than a minor change.
I have not found any simple conversion to allow these voices to be streamed in EDDI (it might be possible to generate the entire speech and then play it as a .wav file, but this would be significantly slower than streaming the speech as it is generated). Further, these voices don't contain the same metadata (things like voice name, culture, etc.) and generally do not support SSML (which significantly limits our ability to influence / correct bad pronunciations).
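For anyone who wants to experiment in the meantime, here is a minimal sketch of the two approaches described above, driving the `piper` command line with the `--model`, `--output_file` and `--output-raw` flags from the project README. The Spanish model name is just a placeholder, and this is only an illustration, not anything EDDI does today:

```python
# Sketch: whole-file synthesis vs. raw streaming with the piper CLI.
# Assumes piper is on PATH and the model file sits next to the script.
import subprocess

TEXT = "Message received from a nearby NPC."
MODEL = "es_ES-sharvard-medium.onnx"  # placeholder Spanish voice model

# Approach 1: render the entire utterance to a .wav file, then play it.
# Simple, but playback cannot start until synthesis has finished.
subprocess.run(
    ["piper", "--model", MODEL, "--output_file", "npc.wav"],
    input=TEXT.encode("utf-8"),
    check=True,
)

# Approach 2: read raw PCM from stdout as it is generated, so playback
# could begin while later sentences are still being synthesized
# (per the README, the raw output is 16-bit mono at the model's rate).
proc = subprocess.Popen(
    ["piper", "--model", MODEL, "--output-raw"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
)
proc.stdin.write(TEXT.encode("utf-8"))
proc.stdin.close()
audio = bytearray()
while chunk := proc.stdout.read(4096):
    audio.extend(chunk)  # a real integration would feed each chunk to the audio device
proc.wait()
```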
I do think that if we were to implement this, we would need to base it on voice models using the .onnx file format. We would also need to know the source of the file so that we could configure the correct inputs to generate a waveform from that model, and prepare a package of metadata (like a friendly human name, culture, etc. for each voice that can be generated by each voice model). We would then need to be able to take that waveform and stream it to EDDI (so that we can begin speaking while speech is still being rendered, and so that we can apply our own audio modifications to the output, e.g. radio effects).
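As a rough illustration of that metadata step: Piper voice models are distributed with a sidecar JSON config alongside the .onnx file, so something like the sketch below could assemble the friendly-name / culture package. The exact field names here are assumptions and vary between voice releases, so treat this as an outline only:

```python
# Sketch: build a voice-metadata package from a Piper sidecar config.
# Key names ("dataset", "language", "audio.sample_rate", "num_speakers")
# are assumptions based on typical Piper model configs.
import json
from pathlib import Path

def describe_voice(model_path: str) -> dict:
    """Read <model>.onnx.json and return the metadata a host app would need."""
    config = json.loads(Path(model_path + ".json").read_text(encoding="utf-8"))
    language = config.get("language", {}).get("code", "unknown")
    return {
        "friendly_name": config.get("dataset", Path(model_path).stem),
        "culture": language.replace("_", "-"),          # e.g. "es_ES" -> "es-ES"
        "sample_rate": config["audio"]["sample_rate"],  # needed to stream raw PCM
        "num_speakers": config.get("num_speakers", 1),
    }

print(describe_voice("es_ES-sharvard-medium.onnx"))  # placeholder model path
```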