
ability to use models other than xtts? #99

Closed
0xYc0d0ne opened this issue Feb 17, 2024 · 4 comments

Comments

@0xYc0d0ne

I was wondering if it's possible to use another model like StyleTTS with AllTalk instead of the default Coqui XTTS model, since there are probably better models out there for voice cloning.

@erew123
Owner

erew123 commented Feb 17, 2024

Hi @0xYc0d0ne

Not currently, no. It's something I'm considering; however, there will be a chunk of code to rewrite to make it integrate with other models. There is currently no way to drop another model in place.

Thanks

@erew123 erew123 closed this as completed Feb 17, 2024
@UXVirtual

@erew123 I have an experimental fork which is designed to allow use of the English VCTK/VITS model via the API Local option in the AllTalk settings interface. It runs considerably faster on lower-end hardware when using CPU inference, and has the benefit of multiple voices running off a single model if you need a variety of English accents: main...UXVirtual:alltalk_tts:feature/vctk-vits-support

Out of the box AllTalk only supports single speaker models, but my fork allows the use of models with multiple speakers like VCTK/VITS.

I use this when testing and demonstrating portable offline TTS from my M1 MacBook which doesn't have GPU inference via DeepSpeed for XTTSv2. While the results aren't as good as XTTSv2, it is more stable and avoids various hallucinations in longer text.

While the VCTK/VITS model doesn't explicitly allow quick voice cloning, it does demonstrate using an alternate model that is compatible with the underlying TTS python library. TTS will automatically download and install the model you define in the tts_model_name property of AllTalk's config instead of XTTSv2. You can try other single-voice models to see if any are suitable.

To make AllTalk use the VCTK/VITS model, you need to edit confignew.json in the AllTalk folder. Change the following property values:

  • tts_model_name to tts_models/en/vctk/vits
  • tts_method_api_local to false
  • tts_method_api_tts to true
  • tts_method_xtts_local to false

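The edits above can be sketched as a small script that patches the config file. This is a minimal sketch, assuming confignew.json is a flat JSON object that uses exactly the key names listed above; the apply_changes helper is illustrative, not part of AllTalk.

```python
import json

# The four settings from the list above, assuming these exact key
# names exist at the top level of confignew.json.
changes = {
    "tts_model_name": "tts_models/en/vctk/vits",
    "tts_method_api_local": False,
    "tts_method_api_tts": True,
    "tts_method_xtts_local": False,
}

def apply_changes(config: dict) -> dict:
    """Return a copy of the config with the VCTK/VITS settings applied."""
    updated = dict(config)
    updated.update(changes)
    return updated

# Example usage: patch the file in the AllTalk folder in place.
# with open("confignew.json") as f:
#     config = json.load(f)
# with open("confignew.json", "w") as f:
#     json.dump(apply_changes(config), f, indent=4)
```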
If you're on macOS you can install the espeak dependency that VCTK/VITS requires using the following brew formulae:

brew install espeak

You'll need to find the equivalent package for Windows or Linux if you're using one of those operating systems to run AllTalk.

When making the request via AllTalk's REST API you need to add a character_speaker request attribute and set it to the voice you want (e.g. p226). See here for the full list.
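As a rough illustration of the request shape, here is a sketch that builds such a call using only the standard library. The endpoint path, port, and the text field name are assumptions for illustration; only the character_speaker attribute and the p226 speaker ID come from the note above, so check your AllTalk instance's API docs for the exact shape.

```python
import json
import urllib.request

# Request body with the extra character_speaker attribute set to one
# of the VCTK speaker IDs (p226). The "text" field name is assumed.
payload = {
    "text": "Hello from the VCTK/VITS model.",
    "character_speaker": "p226",
}

def build_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to a hypothetical
    AllTalk TTS endpoint with the payload above."""
    return urllib.request.Request(
        base_url + "/api/tts-generate",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example usage against a locally running instance:
# req = build_request("http://127.0.0.1:7851")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```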

@erew123
Owner

erew123 commented Feb 19, 2024

@UXVirtual That's interesting! I'll need to have a play at some point and continue my thoughts on how this might be integrated. I've had a week-long debate in my head about how to maybe separate the model loaders out from the rest of AllTalk, allowing the potential to load/use theoretically any model. What you've done, though, is a nice little addition that isn't too heavy on a re-code.

I'm going to make a note of this in the Feature requests on the discussion forum... and let my head roll over it a bit more.

Give me a bit of time and I'll get back to you at some point! (if that's ok!)

Thanks

@UXVirtual

Hey @erew123 no problem! The separation of model loaders sounds like a good approach - I look forward to seeing what integrations can be done there :-)
