
Coqui AI and Tortoise TTS #1106

Closed
wants to merge 38 commits into from

Conversation

da3dsoul
Contributor

As per #885, Coqui has pros and cons. It's already written, so I'm PRing it. More soon (tm). When this is accepted, I'll try to write up some more info in the wiki... and figure out how to do that.

@da3dsoul da3dsoul changed the title Coqui AI TTS Coqui AI and Tortoise TTS Apr 15, 2023
@da3dsoul
Contributor Author

Tortoise needs more testing by someone other than me; I don't have the hardware to run it alongside any of the models I have. As per the discussion in #885, this PR has three Tortoise implementations: official, fast, and MRQ.

@system1system2

system1system2 commented Apr 17, 2023

@da3dsoul I used Coqui AI for some time, but I'm not super impressed with the quality so far. Maybe I didn't use it the right way.

Tortoise TTS's quality is unmatched so far, even by 11Labs.

I have an M2 Max with 96GB RAM, and I'd be happy to test anything on this hardware as long as support for Mac is improved. (I use Tortoise TTS fast, but only in a Colab instance, because on a Mac, even with my SOTA hardware, it's excruciatingly slow.)

> Tortoise needs more testing by someone other than me. I don't have the hardware to run it alongside any of the models I have. As per the discussion in #885, this has 3 Tortoise implementations: official, fast, and MRQ

@da3dsoul
Contributor Author

I can't do anything about Mac support, since a lot of this stuff uses CUDA (Nvidia GPU acceleration). I found that Coqui's quality varies quite a lot depending on which model you use, as it has several choices. The speed and minimum requirements vary as well.

@Ph0rk0z
Contributor

Ph0rk0z commented Apr 17, 2023

Coqui is good enough: it generates an ok-ish voice in a few seconds. Tortoise would need its own GPU; it's not even about memory, it just takes a while.

@oobabooga oobabooga added the extensions Pull requests concerning extensions and not the core functionality of the web UI. label Apr 19, 2023
@da3dsoul
Contributor Author

I wonder if Coqui updated and moved something

@Urammar

Urammar commented Jun 28, 2023

Also, as far as Tortoise goes:

[screenshot of an error message]

I am really having no luck at all!

@da3dsoul
Contributor Author

Huh, which model is that? It does tell you what to check: BigVGAN is missing. I've never heard of that, but that's what it says.

@da3dsoul
Contributor Author

> I wonder if Coqui updated and moved something

Can confirm this is not the case. I installed fresh and it worked.

@Urammar

Urammar commented Jun 28, 2023

> Huh which model is that? It does tell you what to check. BigVGAN is missing. I've never heard of that, but that's what it says

That's just what it says out of the box when trying to load the addon in the webui. I have no idea what it's referencing, but it's hardcoded in vocoder.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

import json
from enum import Enum
from typing import Optional, Callable
from dataclasses import dataclass

try:
    from BigVGAN.models import BigVGAN as BVGModel
    from BigVGAN.env import AttrDict
except ImportError:
    raise ImportError(
        "BigVGAN not installed, can't use BigVGAN vocoder\n"
        "Please see the installation instructions on README."
    )

MAX_WAV_VALUE = 32768.0
```
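The hard `raise ImportError` above makes the whole module fail to load whenever the optional BigVGAN package is absent, which matches the symptom described. A minimal sketch of a softer approach, deferring the error until BigVGAN is actually selected (the `get_vocoder` helper is hypothetical, not part of the PR's vocoder.py):

```python
# Sketch only: degrade gracefully when the optional BigVGAN package is
# missing, instead of failing at import time.
try:
    from BigVGAN.models import BigVGAN as BVGModel  # optional dependency
    HAVE_BIGVGAN = True
except ImportError:
    BVGModel = None
    HAVE_BIGVGAN = False

def get_vocoder(name: str):
    """Look up a vocoder class, erroring only when BigVGAN is requested."""
    if name == "bigvgan":
        if not HAVE_BIGVGAN:
            raise RuntimeError(
                "BigVGAN not installed; see the README for install instructions"
            )
        return BVGModel
    # ... other vocoders would be dispatched here ...
    raise ValueError(f"unknown vocoder: {name}")
```

With this shape, the extension can still load and offer the other vocoders even when BigVGAN isn't installed.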

@Urammar

Urammar commented Jun 28, 2023

I downloaded pytorch_model.bin from here and threw it in the models folder, but that didn't change anything either.

@Urammar

Urammar commented Jun 28, 2023

[screenshot of the extensions list]

Literally all I did was check this, click apply, and restart the interface.

Also, ideally, shouldn't there be some kind of model loader box, dropdown, or something, instead of just erroring out in the terminal? I get that this is a work in progress.

@da3dsoul
Contributor Author

There should be, yes, and that's why I'm confused.

@Urammar

Urammar commented Jun 30, 2023

[screenshot]

Okay, so I've done a complete reinstall of ooba, and it's working much better now. It even grabbed the model by itself, which is nice, and correctly installed those dependencies.

I'm now getting the following error when I generate text, though, despite the extension actually loading in the UI and everything else seeming like it's playing nice.

It does indeed generate text, but no audio.

@da3dsoul
Contributor Author

I'll see if I can make that happen and let you know. Can you show what extensions you have loaded?

@Urammar

Urammar commented Jun 30, 2023

I created a brand new ooba specifically for coqui as well, and I've run them separately (not both at once) in their own little conda environments, and it says this:

[screenshot]

> I'll see if I can make that happen and let you know. Can you show what extensions you have loaded?

Yeah, so for tortoise_tts_fast it's literally just that, and I think I had gallery checked as well.

[screenshot of the extensions list]

So to reiterate, these are brand-new installs of the oobabooga webui with nothing in them except a small model for testing and the relevant extension.

@Urammar

Urammar commented Jun 30, 2023

[screenshot]

Update: after rebooting the webui again trying to launch tortoise_tts_fast, I'm now getting totally different errors.

This seems really janky, man. Are you trying to merge this pull request? This is absolutely not ready for prime time. Have you only tested this on your own machine? Does this literally only work in your environment?

@da3dsoul
Contributor Author

Ok I'll take a look

@da3dsoul
Contributor Author

> This seems really janky man, are you trying to merge this pull request? This is absolutely not ready for prime time. Have you only tested this on your own machine? Does this literally only work in your environment?

As per the linked issue, quite a few people have used it successfully. Is the current state still good? Going by your experience, no. This PR has been open for 2.5 months, and things change very quickly in this space, so it probably needs a whole new round of testing since it was last called "ready".

@Urammar

Urammar commented Jun 30, 2023

Yeah that makes a lot of sense

@Urammar

Urammar commented Jun 30, 2023

Any joy? I'm at a loss now, sadly

@Ph0rk0z
Contributor

Ph0rk0z commented Jul 24, 2023

Well, that error message is indicative of an improperly installed Tortoise, probably due to Windows and it not being in the right venv.

@da3dsoul
Contributor Author

I updated the code based on changes to silero. Coqui broke my conda environment, so probably don't try that at the moment. Since my env is broken, I'm not testing Tortoise right now; I'll probably do a full reinstall later. @Ph0rk0z, if you'd like to test, be my guest.

@Ph0rk0z
Contributor

Ph0rk0z commented Jul 25, 2023

I have been using bark, and I have coqui installed as a pip package for use in audio-webui; I use the same env for it, textgen, and most other "nvidia" things. Thankfully it hasn't broken anything yet. For all of these I have been setting up the environments manually so that they don't install anything I don't want. I think users want the opposite experience, where the script does it all for them :P

@oobabooga
Owner

I think that it would be really nice if the code here could be abstracted into pipelines similar to what the multimodal extension does. The resulting structure would be something similar to:

extensions
└── tts
    ├── pipelines
    │   ├── coqui
    │   ├── elevenlabs
    │   ├── silero
    │   ├── tortoise
    │   ├── tortoise_fast
    │   └── tortoise_mrq
    ├── script.py
    └── requirements.txt

The one-click-installer tries to automatically install the requirements for every built-in extension, and it would fail for tortoise. By compartmentalizing it like this, we could leave only the requirements for silero and elevenlabs in extensions/tts/requirements.txt.

Additional TTS extensions like edge_tts in #3199 and bark (external) could be adapted for this framework.

The caveat is that for TTS, additional UI elements are required by each pipeline, like the API key for elevenlabs. So the framework would have to have its own ui() sub-functions.

I'll try to do it eventually, but if @da3dsoul wants to beat me to it that would be helpful.
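To illustrate the proposed structure, here is a minimal, hypothetical sketch of a pipeline registry with per-pipeline `ui()` sub-functions. All names are invented for illustration; this is not the actual web UI or multimodal extension API:

```python
from abc import ABC, abstractmethod

class TTSPipeline(ABC):
    """Base class: one subclass per backend (silero, elevenlabs, tortoise, ...)."""
    name: str = ""

    @abstractmethod
    def ui(self) -> dict:
        """Return this pipeline's extra UI fields (e.g. an API-key box)."""

    @abstractmethod
    def generate(self, text: str) -> bytes:
        """Synthesize `text` and return raw audio bytes."""

# Global registry mapping pipeline name -> pipeline class.
PIPELINES: dict = {}

def register(cls):
    """Class decorator that adds a pipeline to the registry."""
    PIPELINES[cls.name] = cls
    return cls

@register
class SileroPipeline(TTSPipeline):
    name = "silero"

    def ui(self) -> dict:
        return {"speaker": "en_0"}  # invented field name

    def generate(self, text: str) -> bytes:
        return b""  # placeholder; real code would call silero here
```

A shared `script.py` could then present a dropdown of `PIPELINES.keys()` and render each pipeline's `ui()` fields only when that pipeline is selected, which also keeps heavy requirements like tortoise's out of the shared requirements.txt.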

@da3dsoul
Contributor Author

Maybe. I'll look at the multimodal pipeline thing.

@78Alpha

78Alpha commented Aug 1, 2023

Thought I would test it out, but I haven't had much luck getting this working with llama 2 models. I updated some dependencies, and it says it generates a message, but all I see is Loading autoregressive model: models/tortoise\autoregressive.pth

No error that I can see, but no output.
The updated dependencies were numba, librosa, and transformers, all to the latest as of today. I turned off "unload LLM model" and turned off "Low VRAM" to try to get it to load both, but I am still sitting at 9.1 GB / 24 GB, so it seems the model never finished loading.

Added some debug printing, and it ran into Error(s) in loading state_dict for UnifiedVoice: Unexpected key(s) in state_dict:, followed by the GPT keys.

Edit:

This appears to be a problem with transformers: 4.31.0 seems to have a regression that breaks the tortoise models and prevents loading, but that version is required for llama v2 and later models. Using the tortoise part currently locks you out of the new models, while using any other TTS locks you out of voice flexibility.
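For reference, an "Unexpected key(s) in state_dict" failure in PyTorch can often be worked around with `model.load_state_dict(sd, strict=False)`, which tolerates mismatched keys; whether the model then actually works is a separate question, since the real culprit reported here was the transformers version. A torch-free sketch of the same filtering idea (hypothetical helper, not the extension's code) makes the dropped keys explicit so they can be logged:

```python
def filter_state_dict(checkpoint: dict, expected_keys) -> tuple:
    """Keep only the keys the current model expects; report the rest.

    Mimics what load_state_dict(strict=False) tolerates, but returns the
    unexpected keys so they can be inspected instead of silently ignored.
    """
    expected = set(expected_keys)
    kept = {k: v for k, v in checkpoint.items() if k in expected}
    unexpected = sorted(set(checkpoint) - expected)
    return kept, unexpected
```

Logging the `unexpected` list would have surfaced the stray GPT keys immediately instead of requiring ad-hoc debug printing.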

@da3dsoul
Contributor Author

da3dsoul commented Aug 1, 2023

Good to know it doesn't work. I need to reinstall my whole workspace to fix things, as I'm currently stuck in dependency hell

@oobabooga
Owner

I'm closing this in favor of #4673.

I don't want to include tortoise as XTTSv2 seems to be better overall. I don't know if the preprocessing code here applies to the new model; if so, a new PR would be welcome.

@oobabooga oobabooga closed this Nov 21, 2023
@da3dsoul
Contributor Author

That's fair. We can maybe re-evaluate Coqui later; considering Coqui is what messed up my build, I'm pretty meh on it.
I can probably do some work on generalizing the preprocessing like I did here, and maybe improve it further. We'll see if I get some time to play with it.

10 participants