AllTalk v2 Download Details & Discussion #245

erew123 · 2024-06-06T15:41:42Z

Maintainer

AllTalk v2 is out of BETA (November 24th 2024)

(as far as I am concerned)

Please be aware that I am STILL dealing with what I will call "an urgent family matter". Unfortunately, I have very unwell family members and I am supporting/caring for them. I will let people's imagination go wherever to figure out what I mean by that. As such, I will be intermittently 100% unavailable to do anything here on GitHub/code/provide support. This situation may be ongoing for me for months to come.

Please also be clear, I am 1x person, not a company, business etc. There isn't a team here to support/deal with everything.

So, please read the built-in documentation that's now inside AllTalk; I'm sure it will cover 95% of questions. Also please read the extensive WIKI here for some more in-depth answers and topics not covered in the built-in documentation, covering topics such as:

Known Error messages/fixes here
Debug settings in V2 here
Reverting safely back to the BETA build (if needed) here
Finetuning here
etc......

Please have fun with AllTalk.

Thanks
Erew123

💖 Sponsor this Project on Ko-fi

mercuryyy · 2024-06-07T05:13:46Z

Installing and Testing now, Amazing job. Will post results / comments soon

7 replies

erew123 Jun 7, 2024
Maintainer Author

RE "[AllTalk ENG] Warning: Model 'tts_models--en--jenny--jenny' does not match any known model type."

That's just trying to load a default model in and none exists hence the warning. But noted. I did tidy up the tts_engines.json but it didn't upload right so, I just uploaded another copy. Will tidy that up at some point.

When a model has been downloaded, you should be able to use the Refresh Server Settings button to get it to update the dropdowns. Worst case Swap TTS Engine is the "Have you tried turning it off and on again" of AllTalk, so should always get back to square one.

Where can we download some default RVC voices just for testing?

In the documentation there's a link to https://voice-models.com/ 74,000 voices (so it claims).

erew123 Jun 7, 2024
Maintainer Author

Quick note. RVC, Hop length should probably be 130. Index influence 0.75.

mercuryyy Jun 7, 2024

So for VITS I did all that. I manually checked, and tts_models--en--jenny--jenny is in the models dir, but it is still giving the error.

Yeah, I saw https://voice-models.com/. Thank you!

erew123 Jun 10, 2024
Maintainer Author

Hi @mercuryyy With the VITS tts engine loaded, you should have any VITS models listed in the "Load Different Model" dropdown. Select a VITS model in there and click the button and see if that resolves it for you.

xdax1 Sep 7, 2024

Hello, how can I use these voices from this voice-models site in alltalk?

Jxspa · 2024-06-07T10:01:45Z

All good here (windows 10). Thank you for your hard work. It looks great! Love being able to easily switch between finetuned xtts models.

0 replies

bollerdominik · 2024-06-08T07:48:23Z

Thanks for your work. Testing it on RunPod Ubuntu.

Installation worked fine but running it I get

The "Running in Docker" is strange as I don't have docker installed.

After manually editing the script.py I got it to work. Unfortunately I can't get DeepSpeed to work. It says it is installed
DeepSpeed version : 0.14.2+cu121torch2.2 but all requests are with DeepSpeed: False and it is not clear how to enable it.

Can't get Gradio UI to work since RunPod creates a Cloudflare Tunnel and afaik there is no way to specify a custom API / Gradio domain during the AllTalk setup.

I still hope a future version of AllTalk can nicely integrate with running the application in the cloud (RunPod, Colab, etc.)

3 replies

erew123 Jun 10, 2024
Maintainer Author

Hi @bollerdominik Colab and Docker/RunPod all need a few minor changes, though I'll be working on Colab first. I'm hoping this will be easy, but it could be another 10-20+ hours of testing various things, so I didn't want to delay getting the BETA out at this time. Hopefully I will have an update on this soon though.

Re DeepSpeed: that is set on a per-engine basis (where engines support it).

You need to enable it there and then reload the TTS engine or model.

Thanks

bollerdominik Jun 10, 2024

I can't use Gradio UI because of the issue mentioned above. Is there any manual way to enable DeepSpeed? I tried updating confignew.json but it still shows

  "deepspeed_capable": true,
  "deepspeed_available": true,
  "deepspeed_enabled": false,

in the API.

Edit: Managed to enable it by reading the code and calling /api/deepspeed?new_deepspeed_value=true. Now DeepSpeed works.
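For anyone else without access to the Gradio UI, the call described in the edit above could be sketched with the Python standard library as follows. The endpoint path and the `new_deepspeed_value` parameter come from the post; the base address is the API address printed in the start-up log quoted elsewhere in this thread, and the use of POST is an assumption that should be checked against your install.

```python
import urllib.parse
import urllib.request

# Assumption: the AllTalk API listens here; change it for a tunnel or remote host.
BASE_URL = "http://127.0.0.1:7851"


def deepspeed_url(enabled: bool, base_url: str = BASE_URL) -> str:
    """Build the URL for the /api/deepspeed endpoint mentioned in the post."""
    query = urllib.parse.urlencode({"new_deepspeed_value": str(enabled).lower()})
    return f"{base_url.rstrip('/')}/api/deepspeed?{query}"


def set_deepspeed(enabled: bool, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body.

    Assumption: the endpoint accepts POST with the value in the query string.
    """
    request = urllib.request.Request(deepspeed_url(enabled, base_url), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the server running, `print(set_deepspeed(True))` would send the request; the status output should then report `"deepspeed_enabled": true` if the engine supports it.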

erew123 Jun 10, 2024
Maintainer Author

@bollerdominik I've updated the code today, so that if you can get a 2nd tunnel working to the server, you will have access to the Gradio interface. The tunnel will need to pass to port 7852. Gradio will be accessible, but you will not be able to generate TTS in Gradio (yet), though it will give you control over the models.

That aside, currently XTTS is the only tts engine in there that supports DeepSpeed. You can edit the XTTS TTS engine setting (that you would change in the interface) by going to /system/tts_engines/xtts/ and editing model_settings.json to change "deepspeed_enabled": false, to "deepspeed_enabled": true,
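The manual edit described above could also be scripted. This is only a sketch: the file location and the `deepspeed_enabled` key are taken from the post, but the nesting of that key inside `model_settings.json` is not shown in this thread, so the hypothetical helper below updates the key wherever it already exists rather than assuming a layout.

```python
import json
from pathlib import Path


def set_key_everywhere(node, key, value):
    """Set `key` to `value` wherever it already exists in nested dicts/lists.

    Returns the number of places that were updated.
    """
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key_everywhere(item, key, value)
    return changed


def enable_deepspeed(settings_file: Path) -> int:
    """Flip "deepspeed_enabled" to true in the given settings file."""
    data = json.loads(settings_file.read_text(encoding="utf-8"))
    changed = set_key_everywhere(data, "deepspeed_enabled", True)
    if changed:
        # Only rewrite the file when something actually changed.
        settings_file.write_text(json.dumps(data, indent=4), encoding="utf-8")
    return changed
```

Run from the AllTalk folder, this would be called as `enable_deepspeed(Path("system/tts_engines/xtts/model_settings.json"))`. Keep a backup of the file first, and reload the TTS engine or model afterwards so the setting is picked up.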

Suiyou · 2024-06-09T04:13:15Z

Great work, it installed without issues, I was able to download mostly everything through the interface except new RVC voices, change between tts engines/models, use RVC voices, etc. XTTS is still the best free option in my opinion for speed with decent emotive voices and coupled with RVC you can make it sound even better. I was already using another project to do that for my TTS generator output folder files before joining them.

If you want, you can take a look at a GitHub project called Applio that has an integrated search and download of RVC voices. I don't remember what TTS engine it uses, but it's the fastest I tried; it's not as emotive, and without RVC it doesn't sound quite so good. The only downside is that it has a hard limit on the number of lines you can generate per generation.

I primarily use TTS Generator, but it's nice to set up everything, swap to TTS Gen and keep working with it. Speaking of TTS Gen, it would be convenient if RVC, joining and transcoding were performed at the end, after I finish checking the lines I have to regenerate, to avoid the overhead time (I know I could disable RVC but then I have to use another project like I've been doing so far to process them).

Also, TTS Gen has a bug where, after it finishes generating every chunk, it won't enable the options to export, play, clear, etc.; you have to regenerate any chunk to get them enabled.

I think there is a bug when using the Voice2RVC tab: it won't show me the models I have.

Lastly, I tried the option to analyze the accuracy of the generated audio. Great addition, but for me it is impractically slow on my machine. I use another Whisper project to transcribe the audio and a Python script to compare the lines in the JSON file generated by TTS Gen, 1000 files at a time; it takes me about 15 minutes.
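A comparison step like the one described above might look roughly like this. It is not the script from the post: the layout of the TTS Gen JSON file is not shown here, so the sketch takes plain (expected text, transcribed text) pairs, and the 0.9 threshold is an arbitrary assumption to tune.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so cosmetic differences are ignored."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def similarity(expected: str, transcribed: str) -> float:
    """Return a 0..1 similarity ratio between two normalized sentences."""
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(transcribed))
    return matcher.ratio()


def lines_to_regenerate(pairs, threshold=0.9):
    """Yield (index, score) for lines whose transcript drifts from the source text.

    `pairs` is an iterable of (expected_text, transcribed_text) tuples.
    """
    for index, (expected, transcribed) in enumerate(pairs):
        score = similarity(expected, transcribed)
        if score < threshold:
            yield index, score
```

Because only string comparison happens here, the slow part stays in the transcription step, which can be batched separately.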

I need to check all the other new functionality added in this beta, but the project keeps getting better every time I check.

7 replies

erew123 Jun 12, 2024
Maintainer Author

@Suiyou Yes, it may have issues with symlinks... really not sure about Python's handling of that. I do know that Firefox browsers have some quirks playing back audio, so I would assume any browsers that are built off Firefox may have quirks too. A ticket where I looked into streaming issues is here #143, though I haven't explored further to see if there are other quirks. There may be; I just don't have time at the moment to test across all browsers.

Re "teach the model how to pronounce certain words," I'm assuming you mean XTTS models. In theory yes, but how much finetuning it would need I don't know. You can teach them other/new languages with enough training, so in theory any new word should be possible. You would have to research on Coqui's site how to build a training set for such a thing.

Ford4D Nov 27, 2024

@Suiyou @erew123 Hi, all. Sorry to bother you guys. I have tried over and over for days to get Voice2RVC to work (using pre-recorded audio files), but nothing ever works. First of all, I get that same problem where even after enabling RVC in the Global Settings it doesn't show me a list of models. Says it downloaded them, but they never show up, even after multiple Windows restarts. (All I ever see are the models I'm generating, see next paragraph).

Second, I'm able to generate (what I believe to be) an RVC from my carefully edited 10 min dataset using "start_finetune.bat", but when I drag my trained RVC folder into "/models/rvc_voices/", all the "Select RVC Voice to generate as" menu gives me to choose from are the .pth files in that folder I just copied over. They are always:
dvae.pth
mel_stats.pth
model.pth
speakers_xtts.pth

Selecting any of them causes the following respective errors, in the same order as above:
Error during Voice2RVC conversion: 'config'
Error during Voice2RVC conversion: too many indices for tensor of dimension 1
Error during Voice2RVC conversion: -1
Error during Voice2RVC conversion: 'config'

What am I doing wrong? Is there a bug that @Suiyou also experienced a symptom of, or is it just me?

And why doesn't "start_finetune.bat" create these index.json files that I keep reading about in your documentation on RVC?

Forgive me for any typos, I have barely slept these past few days.

erew123 Nov 27, 2024
Maintainer Author

@Ford4D The Finetuning is for the Coqui XTTS model and not RVC

https://github.com/erew123/alltalk_tts/wiki/XTTS-Model-Finetuning-Guide-(Simple-Version)

There is currently no RVC voice model creation in AllTalk, it is on the to-do list. Please see Feature Requests here

Currently you can use any pre-existing RVC models with the AllTalk RVC implementation https://github.com/erew123/alltalk_tts/wiki/RVC-(Retrieval%E2%80%90based-Voice-Conversion)

From places such as here https://voice-models.com/ which has 100,000+ pre-created voice models.

I hope that clarifies

Thanks

Ford4D Nov 27, 2024

I'm so sorry to have bothered you, @erew123 !! I feel silly, honestly. I will try to research the most user friendly tool to create RVC voice models. (If you have any suggestions, I'm all ears).

Thank you so much 🙏

erew123 Nov 27, 2024
Maintainer Author

@Ford4D It's something I'm working on. Might be a few days, might be a week, could even be longer. All depends on how much time I can find to code/test etc. But it is happening.

Dagbafrosty · 2024-06-09T20:24:32Z

Hey, thanks for the beta! One issue I noticed is not being able to access the Gradio page from other devices on the network. The API page and TTS Generator page are accessible using 192.168... but not Gradio; not sure if this is an issue on my end. Everything is accessible from the host computer on 127.0 etc

1 reply

erew123 Jun 10, 2024
Maintainer Author

Hi @Dagbafrosty I know what this will be. It will come back to a change I make for the Colab setups etc. It's not your computer, so don't worry. I will hopefully have an update soon for this. Will post back on here.
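To confirm from a second device which ports are reachable at all, a small TCP check like the sketch below can help separate a network or firewall problem from an application one. The helper name is hypothetical; ports 7851 (API) and 7852 (Gradio) are the ones shown in the start-up log quoted elsewhere in this thread, and the host address is whatever your AllTalk machine uses on the LAN.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `port_reachable(host, 7851)` and `port_reachable(host, 7852)` from the other device, with `host` set to the server's LAN address, shows whether only the Gradio port is closed to the network.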

Thanks

m-eideh · 2024-06-10T11:23:02Z

Hi, thank you for the great work.

I was trying to create a dataset for Arabic language, however, I'm getting the following error:

If I switch the language to English (but still use Arabic audio files), it works fine, generates wavs correctly, and actually translates the sentences in metadata_train & metadata_eval to English.

So it's understanding the language fine, but there seems to be an issue with file/sentence generation in native Arabic.

6 replies

m-eideh Jun 11, 2024

Thank you for the prompt reply and fix @erew123!

It is now indeed generating wavs and transcribing Arabic correctly! However, there are a couple of major issues with the wav files generation:

Even-sequenced files are generating to "finetune\tmp-trn\wavs" (eg: xx_00000002.wav, xx_00000008.wav)
Odd-sequenced files are generating to "finetune\tmp-trn\wavs\wavs" (eg: xx_00000009.wav)
The file names don't match the file generated in the metadata csv files. For example, in metadata_train.csv here:

The first two sentences show that they are "wavs/731_00000002.wav" and "wavs/731_00000004.wav". But when listening to the wav files, the actual sentences should be in "wavs/wav/731_00000001.wav" and "wavs/wavs/731_00000003.wav" respectively.

This is probably related to the first two issues. I'll gladly test more scenarios if needed.

m-eideh Jun 11, 2024

One more thing to note is that all files "finetune\tmp-trn\wavs" are less than 1 second long and are basically unusable, while the ones in "finetune\tmp-trn\wavs\wavs" are the actual usable 15-second files (length which I specified in the settings).

erew123 Jun 12, 2024
Maintainer Author

Hi @m-eideh

I've just (after a very long slog coding) updated the finetuning. You will need to update the requirements to use it: start the Python environment and pip install word2number, then you should be good to run finetuning. The paths and CSVs will (should) match up perfectly fine now.

I have added a dataset validation option (you will see the tab). This is a best effort to run Whisper against the originally generated wav samples and compare to see if the wav files match what's inside your csv files. I have no idea how well Whisper will work on other languages either for dataset generation or for this validation. But it may help you out there.

Re "are less than 1 second long and are basically unusable" and "length which I specified in the settings". I've set the code to make wav files a minimum of 1 second long, which should be fine for most cases; it's OK to pick up only 1 or 2 words and train on that. The 15 seconds that you set is a maximum wav file length, not a minimum. The issue here is that the trainer can only handle training on about 12 seconds of audio at a time (or something close to that). Whisper used to sometimes create wav files that were 2 minutes long, and that 2 minute file would only get processed once per epoch with only 12 seconds of audio from it being used, meaning that a lot of the audio in overly large wav files was never trained on. As such, we split those larger wav files down into smaller files, so that each file individually has a chance of its audio being used, rather than never used. Hopefully that makes sense. Thanks
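The arithmetic behind that explanation can be illustrated with a short sketch. This is not the AllTalk splitting code (which works from Whisper's segments); the function names are hypothetical, and the 12 second window and 1 second minimum are simply the figures mentioned above.

```python
def split_spans(total_seconds: float, max_seconds: float = 12.0, min_seconds: float = 1.0):
    """Return (start, end) spans covering a clip, none longer than max_seconds.

    A trailing remainder shorter than min_seconds is dropped.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        if end - start >= min_seconds:
            spans.append((start, end))
        start = end
    return spans


def fraction_seen_per_epoch(total_seconds: float, window_seconds: float = 12.0) -> float:
    """Share of one unsplit clip that a single training window can cover."""
    if total_seconds <= 0:
        return 0.0
    return min(1.0, window_seconds / total_seconds)
```

For a 2 minute clip, `fraction_seen_per_epoch(120.0)` gives 0.1, so roughly 90% of that file would go unused in each epoch unless it is split.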

m-eideh Jun 13, 2024

After some testing, the issues in my post are pretty much resolved, thank you @erew123!

One last thing I have noticed (maybe this is related to Whisper), when these 1 second wavs are generated, the sentences in the metadata files are duplicated, but in the second occurrence the wav file doesn't exist.

For example, these 2 lines in metadata_train are identical:

However, upon checking the wavs folder, "wavs/731_00000007.wav" exists and is only 1 second long, and contains only a few words of the full sentence printed in the metadata file, while "wavs/731_00000008.wav" doesn't exist at all.

I saw something similar with 2 and 3 second files. Out of 115 wav files generated from 4 audio sources, 8 files were affected while the rest were correctly generated at 15 seconds long with the correct sentences.

Not a huge issue as I can just manually remove these from the metadata files for now, but something to note.
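Finding the affected rows does not have to be manual. The sketch below lists metadata rows whose wav file is missing; it assumes a pipe-delimited file with a header row and the audio path in the first column (as in Coqui-style metadata), so adjust the delimiter and column if your files differ.

```python
import csv
from pathlib import Path


def missing_audio_rows(metadata_csv: Path, dataset_root: Path, delimiter: str = "|"):
    """Return (row_number, audio_path) for rows whose wav file is absent.

    Assumptions: the first row is a header and the first column holds a
    path such as "wavs/731_00000008.wav" relative to dataset_root.
    """
    missing = []
    with metadata_csv.open(newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside sentences as plain text.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header row
        for row_number, row in enumerate(reader, start=2):
            if not row:
                continue
            if not (dataset_root / row[0]).is_file():
                missing.append((row_number, row[0]))
    return missing
```

Row numbers are 1-based and count the header, so they match what a text editor shows when opening metadata_train.csv or metadata_eval.csv.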

erew123 Jun 13, 2024
Maintainer Author

@m-eideh Yeah, I thought I spotted that possibly happening, but I couldn't manage to reproduce it with the datasets I tested. So I've just faked up a couple of missing files in my dataset and I've added a console/terminal printout when you run the dataset validation. It will at least tell you what missing files there are:

So that might help get you 80% of the way there, and I'll revisit the code some time in future.

gshawn3 · 2024-06-11T13:13:09Z

I ran into a couple of issues running the beta as an Oobabooga plugin. The first is that on first startup, it looked for firstrun.py in the wrong path.

05:43:44-831231 INFO     Loading the extension "alltalk_tts"
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
python: can't open file 'C:\\Users\\Admin\\Desktop\\text_generation_webui\\system\\config\\firstrun.py': [Errno 2] No such file or directory

Error occurred while running the script: Command '['python', 'system/config/firstrun.py']' returned non-zero exit status 2.

In my specific case, it should have looked for that file in C:\\Users\\Admin\\Desktop\\text_generation_webui\\extensions\\alltalk_tts\\system\\config\\firstrun.py

Editing the script_path variable with the correct path fixed the issue on the next startup.
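A way to make that lookup independent of the current working directory is to resolve it relative to the script file itself. This is a sketch of the general technique, not the change that was made in AllTalk; `firstrun_path` and `run_firstrun` are hypothetical names, and only the `system/config/firstrun.py` layout comes from the error above.

```python
import subprocess
import sys
from pathlib import Path


def firstrun_path(script_file: str) -> Path:
    """Locate system/config/firstrun.py next to the given script file."""
    return Path(script_file).resolve().parent / "system" / "config" / "firstrun.py"


def run_firstrun(script_file: str) -> None:
    """Run firstrun.py with the same interpreter that is running this code."""
    subprocess.run([sys.executable, str(firstrun_path(script_file))], check=True)
```

Inside script.py this would be called as `run_firstrun(__file__)`, which gives the same path whether AllTalk is started standalone or from the text_generation_webui folder.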

The second issue is trickier and I haven't been able to figure it out. Despite seemingly having all the requirements installed correctly, the app complains that there is a missing Gradio "system" module:

05:53:32-686542 INFO     Starting Text generation web UI
05:53:32-689532 INFO     Loading the extension "alltalk_tts"
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Text-gen-webui mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 6th June 2024 at 22:23
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : Not available
[AllTalk ENG] Python Version    : 3.11.9
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 11.66 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
05:53:55-750966 ERROR    Could not import the requirements for 'alltalk_tts'. Make sure to install the requirements for the extension.

                         * To install requirements for all available extensions, launch the
                           update_wizard script for your OS and choose the B option.

                         * To install the requirements for this extension alone, launch the
                           cmd script for your OS and paste the following command in the
                           terminal window that appears:

                         Linux / Mac:

                         pip install -r extensions/alltalk_tts/requirements.txt --upgrade

                         Windows:

                         pip install -r extensions\alltalk_tts\requirements.txt --upgrade

05:53:55-753956 ERROR    Failed to load the extension "alltalk_tts".
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\text_generation_webui\modules\extensions.py", line 37, in load_extensions
    extension = importlib.import_module(f"extensions.{name}.script")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\Users\Admin\Desktop\text_generation_webui\extensions\alltalk_tts\script.py", line 1141, in <module>
    import system.gradio_pages.themes.loadThemes as loadThemes
ModuleNotFoundError: No module named 'system'

Running on local URL:  http://127.0.0.1:7860

Note that I've tried installing the requirements first through atsetup.bat, and subsequently via pip install -r system\requirements\requirements_textgen.txt, but either way the error persists. (Side note that the instructions to install requirements still need to be updated in the error message above.)

Here are the relevant sections from running the diagnostics:

CUDA Device : NVIDIA GeForce RTX 3090 Ti
CUDA Memory : 23.99 GB
CUDA Version: 12.1
CUDA Working: Success - CUDA is available and working.
CUDA_HOME   : C:\Users\Admin\Desktop\text_generation_webui\installer_files\env
Cublas64_11 : C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\nvidia/cublas\bin\cublas64_11.dll

  If you do not have a CUDA version and CUDA is failing, you will not have your
  TTS engines being accelerated with CUDA. CUDA is only available on Nvidia GPU
  and is setup by installing PyTorch with a correct CUDA version in your Python
  virtual environment.

PyTorch Version  : 2.2.1+cu121
Python Version   : 3.11.9
Python Executable: C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\python.exe

  AllTalk has been validated to run on Python 3.11.x versions and also PyTorch
  2.0.x to 2.2.x. Earlier or later versions of PyTorch and Python may not work.

Conda Environment: C:\Users\Admin\Desktop\text_generation_webui\installer_files\env

Python Search Path:
  C:\Users\Admin\Desktop\text_generation_webui\extensions\alltalk_tts
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\python311.zip
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\DLLs
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\win32
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\win32\lib
  C:\Users\Admin\Desktop\text_generation_webui\installer_files\env\Lib\site-packages\Pythonwin

  If you are correctly in the AllTalk Python virtual environment, you will
  expect to see 'alltalk_environment' as part of the path of the above folders.
  If you are running AllTalk as part of Text-generation-webui, you should see
  'text-generation-webui' listed in the path of the above folders. If you dont
  see them mentioned, you have probably not started the correct Python virtual
  environment.

Requirements file package comparison:
  coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
  faster-whisper      Required: >= 1.0.*         Installed: 1.0.2
  gradio              Required: >= 4.26.0        Installed: 4.26.0
  importlib_metadata  Required: >= 7.1.*         Installed: 7.1.0
  inputimeout         Required: >= 1.0.4         Installed: 1.0.4
  Jinja2              Required: >= 3.1.*         Installed: 3.1.2
  librosa             Required: >= 0.10.2.post1  Installed: 0.10.2.post1
  nvidia-cublas-cu11  Required: >= 11.11.3.6     Installed: 11.11.3.6
  nvidia-cudnn-cu11   Required: >= 9.1.1.17      Installed: 9.1.1.17
  onnxruntime-gpu     Required: >= 1.18.*        Installed: 1.18.0
  pydantic            Required: >= 2.7.*         Installed: 2.7.3
  python-ffmpeg       Required: >= 2.0.*         Installed: 2.0.12
  python-Levenshtein  Required: >= 0.25.1        Installed: 0.25.1
  praat-parselmouth   Required: >= 0.4.*         Installed: 0.4.3
  pyworld             Required: >= 0.3.*         Installed: 0.3.4
  sounddevice         Required: >= 0.4.7         Installed: 0.4.7
  soundfile           Required: >= 0.12.*        Installed: 0.12.1
  spacy               Required: >= 3.7.1         Installed: 3.7.5
  torchcrepe          Required: >= 0.0.2         Installed: 0.0.22
  tqdm                Required: >= 4.66.*        Installed: 4.66.4
  unidic-lite         Required: >= 1.0.8         Installed: 1.0.8
  uvicorn             Required: >= 0.29.0        Installed: 0.30.1

(Another side note, the message above mentions "you should see 'text-generation-webui' listed in the path of the above folders." That is no longer correct, because dashes in the folder name now cause AllTalk to throw an error on startup. Could be confusing to some users.)

Let me know if you'd like me to try anything specific to troubleshoot.

3 replies

erew123 Jun 12, 2024
Maintainer Author

@gshawn3 I knew exactly what this was the second I saw it, and I realised what I have done (or not done). I had to write a chunk of extra code for identifying when AllTalk is running in TGWUI, which I did... and then, in my rush dealing with other code issues, I've not merged it into the main script. So I'll take a shot at getting this merged back in today.
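For background, `ModuleNotFoundError: No module named 'system'` means Python could not find a `system` package on its search path at import time. The sketch below shows the general idea of putting a folder on `sys.path` before such an import; it is an illustration only, not the TGWUI detection code referred to above, and `ensure_importable` is a hypothetical helper.

```python
import sys
from pathlib import Path


def ensure_importable(package_parent: Path) -> bool:
    """Put package_parent on sys.path if missing; return True if it was added."""
    entry = str(package_parent.resolve())
    if entry in sys.path:
        return False
    sys.path.insert(0, entry)
    return True
```

In a script that sits beside the `system` folder, calling `ensure_importable(Path(__file__).parent)` before the `import system...` line would make that import resolvable regardless of how the script was loaded.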

gshawn3 Jun 12, 2024

Thanks for being super responsive! I can wait, there's no rush on my end... The V1 of AllTalk still works great 🙂

erew123 Jun 12, 2024
Maintainer Author

Should be done. You can probably just git pull and it should be ok, but feel free to re-apply the requirements to be sure.

Mithadon · 2024-06-12T01:32:45Z

From my experience with v1, and looking at the screenshot of v2, this is going to be phenomenal. So glad you're doing all of this. Thank you!

0 replies

gboross · 2024-06-12T08:09:41Z

Hello, the second variation is really great. By the way, is it possible for you to make it so that it can serve multiple clients simultaneously rather than sequentially as the requests come in? So, can it be asynchronous? Of course, if there are enough resources, but could it be done even through Docker? Thanks.

1 reply

erew123 Jun 12, 2024
Maintainer Author

Hi @gboross Please see the Feature Requests list and the links in there on Streaming to see where that is at; link here

StellarBeing25 · 2024-06-12T09:07:01Z

Hey, V2 is great. Here are some suggestions to further streamline the user interface. The contents of the AllTalk v2 Beta, Generate Help, API Endpoints & Dev, and About This Project tabs should all be moved under the Documentation and Help section. TTS-generation settings can also be shifted under Global Settings. Please consider.

2 replies

erew123 Jun 12, 2024
Maintainer Author

Hi @StellarBeing25 My ultimate goal was potentially to make these modular, so you can in effect turn off certain pages/things in the interface. I've not had a chance to do that yet though. That aside, I would intend to put documentation in the documentation section. The issue is people finding it/reading it. I often find I spend quite a lot of time pointing people to the documentation, so while it's in BETA, I have left things quite prominent in the interface, in the hope that it will ease my burden and that people can tell me about problems they had with explanations in the documentation.

With the TTS-generation settings, these are specifically unique to each individual TTS generation. They are not stored settings, so having them elsewhere in the interface does not make sense for people who may be developing/want to test out certain things. On the flip side of that, I may be able to make the extra settings (for want of a better term) an accordion:

Where you have an expandable section to get to these other features, which would probably cover off most of what you're suggesting?

StellarBeing25 Jun 12, 2024

Forgot to add: It would be nice to have RVC pitch adjustment also available under Generate TTS and Voice2RVC since it frequently needs to be adjusted depending upon the TTS voice selected.

jeddyhhh · 2024-06-14T01:51:05Z

Hey, great work on the project, I've been using v1 for a few days now and have started moving towards v2 with my project totally-real-news-bot so I can use RVC models.

I'm using alltalkv2 in TGWUI mode

I'm using the API with Piper TTS, and that works great; generation is multiple times faster than Coqui (my PC build is Chinese e-waste). But when I use RVC, it seems to start the conversion process and finds a .pth model, VRAM usage goes up, but then my CPU shoots to 100% like it's processing something.

Is it possible RVC conversion is in CPU mode, or could my setup be incorrectly configured?

2 replies

erew123 Jun 14, 2024
Maintainer Author

Hi @jeddyhhh I'm actually not sure if a CPU mode for RVC will/won't work at the moment. I stripped apart and rebuilt a decent amount of RVC to get it working on Python 3.11, though I never looked at CPU specifically (not enough time on my hands to check every variation in time for a BETA, lots of other code to deal with etc). What I can say is that RVC is a 2x step process, dealing with the index file and then dealing with the model. I'm 95% sure that the index file stage would work OK; however, if you want to test that, you can move the index file out of the folder and still run RVC and see if that changes anything for you. My code will (should) say "hey, no index to process, so I'll just get on generating TTS". So you can eliminate one step and see if that makes any difference. If that works, then it's an indexing issue; if it doesn't, that would suggest that it doesn't work on CPU. Though saying that, RVC is quite heavy processing, so I can't say how long it would take on a CPU. I'd suggest trying with a smaller TTS sample first.

jeddyhhh Jun 15, 2024

Hey, thanks for the reply, I think something is weird with the TGWUI mode (or my TGWUI is configured incorrectly).
I've just installed alltalkv2 as a standalone app and RVC conversion works on the GPU with the same settings as I used in TGWUI mode.

I'm pretty sure it's trying to do the RVC conversion with the CPU in TGWUI mode, which I didn't know was possible. It takes way too long but doesn't crash and gives no errors. It just takes hours to convert 1 minute of audio.

I'll just use alltalkv2 as a standalone app for now, it all works as expected. Thanks :)

ibrah3m · 2024-06-19T10:24:22Z

I tested the project, it's amazing!

Works perfectly with English.
In Arabic I'm still struggling a little bit; it needs more work. I finetuned for 100 epochs, but it didn't make a notable change (XTTS). I saw there are different TTS options, but I never tried them.

0 replies

Mithadon · 2024-06-19T18:26:23Z

I've been using v2 for a while now and it's fantastic. I use the standalone version, usually with SillyTavern. I did have some difficulty getting the SillyTavern settings to work. Something about having so many voice/narrator dropdowns and having to match them with selections in the webui, it's confusing to me. It would be great to have some way to save some presets - for example, when selecting preset A, it populates alltalk character, narrator, and rvc character, narrator.

Can't wait for the large generator to be added to the main webui. That is my main wish, together with being able to import .txt files into it or, better yet, process an entire folder of .txt files. Wow!! Having RVC applied at the same time is so much less hassle than exporting .wav, then running it through RVC webui manually...

Last thing: are you aware of any XTTS2 finetunes for accents or gender (in English)? I've googled a lot and been to websites that claim to host tons of models but have found only a handful of XTTS2 finetunes, and not a single interesting one.

Thx!

1 reply

erew123 Jun 20, 2024
Maintainer Author

Hi @Mithadon The voice settings within ST are stored within SillyTavern's own "voicemap" save file. I did originally save things separately but got a polite nudge (friendly telling off) from the ST devs and was told to leave ST to store things in their voice map. In theory (as I understand it at least) this should store your main character voice setup on a per-character-card basis, though it won't save the RVC voices etc. separately as part of the voice map; I think they are a more global saved setting. So I don't think it's something I can change easily, as it doesn't align with how they want the ST code to work.

When you say "large generator", do you mean the TTS Generator? It's still there as the web page version; the link is on the tab, and it will pull the default AllTalk settings you can set centrally. That means if you set an RVC voice as the default voice, it will generate the TTS with RVC voices; you just won't be able to select them on the web page, only in the Gradio interface's central/global settings. Updating the TTS Generator code to Gradio is, well, challenging let's say, mainly because of some limitations/complexities that Gradio introduces. I've had 2x shots at it and can't get the list generation to work correctly. All the other bits do, but generating dynamic lists of text/TTS that you can edit is a problem, so I'm considering whether that can or can't be achieved. TBD.

Re finetunes, I'm not aware of them generally being around on the internet, and I'm not too sure anything has specifically been set up to share them. Though if you want to put a post up in the Discussion area on here asking if people want to share, I've no issue with that... I guess there is a question of where they get put to be shared... but you're welcome to put up a post.

Dolyfin · 2024-06-21T04:35:39Z

Dolyfin
Jun 21, 2024

Would you be looking to add MeloTTS to v2 at some point? Seems like one of the better (and faster) TTS models that you can also train locally.

1 reply

erew123 Jun 24, 2024
Maintainer Author

Technically any TTS engine can be added in, bar conflicting requirements with other TTS engines. For example, one TTS engine may say it wants Transformers v4.37 and another may demand Transformers v4.38 or later, which can then impact other requirements/versions.

What I'm getting at is: yes, it's entirely possible to use that TTS engine, but maybe not at the same time as SOME other TTS engines.

I will add other TTS engines at some point in time, though I have left a template folder within the BETA. In there are some very rough instructions on adding a new TTS engine: https://github.com/erew123/alltalk_tts/tree/alltalkbeta/system/tts_engines/template-tts-engine

And within each template script, I have put instructions throughout https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/tts_engines/template-tts-engine/model_engine.py if someone wants to try adding in a new TTS engine and has a bit of coding knowledge.

ghost · 2024-06-22T13:45:00Z

ghost
Jun 22, 2024

Can't get it work :(

[AllTalk TTS] _ _ _ _____ _ _ _____ _____ ____
[AllTalk TTS] / \ | | |_ | | | | __ | | / |
[AllTalk TTS] / _ \ | | | | |/ _ | | |/ / | | | | _
[AllTalk TTS] / ___ | | | | | (| | | < | | | | ) |
[AllTalk TTS] // __|| ||_,|||_\ || || |___/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
Traceback (most recent call last):
  File "C:\Users\USER\Desktop\alltalkbeta\script.py", line 190, in <module>
    import gradio as gr
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\__init__.py", line 3, in <module>
    import gradio._simple_templates
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\_simple_templates\__init__.py", line 1, in <module>
    from .simpledropdown import SimpleDropdown
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\_simple_templates\simpledropdown.py", line 6, in <module>
    from gradio.components.base import FormComponent
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\components\__init__.py", line 1, in <module>
    from gradio.components.annotated_image import AnnotatedImage
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\gradio\components\annotated_image.py", line 9, in <module>
    import PIL.Image
  File "C:\Users\USER\Desktop\alltalkbeta\alltalk_environment\env\Lib\site-packages\PIL\Image.py", line 100, in <module>
    from . import _imaging as core
ImportError: DLL load failed while importing _imaging: The specified module could not be found.

2 replies

erew123 Jun 22, 2024
Maintainer Author

Try start_environment.bat, then run pip install --upgrade --force-reinstall pillow. Not sure what may be breaking that atm, but that should fix it.

ghost Jun 22, 2024

Thanks, that fixed it!

Jidis83 · 2024-10-07T02:38:33Z

Jidis83
Oct 7, 2024

@erew123 - Do you have any ideas on that deal I might have mentioned where AllTalk continues generating the output file indefinitely? The line of text I fed it to generate will typically be the last thing in the console, with no total time given afterward and no auto-playback of the newly generated file(s), but it will make them back to back, with the GPU hanging at 98% or something (which is normal when it's making one). It will actually start back making them even if I change models or refresh and I don't believe it will accept new text. The only way out is to CTRL-C and restart AllTalk. It seems that using that 7852 interface is where it occurs. I used only the 7851 last night and kept it from happening. - BTW- This is all XTTS engine stuff

I'd put it in with the issues, but if you or anyone else hasn't hit it, I'll just figure it's one of the many joys of running on a 2GB GT 1030 card. ;-)

3 replies

erew123 Oct 7, 2024
Maintainer Author

Hi @Jidis83 I will say with reasonable certainty that this is being caused by a low-memory situation in your VRAM and fragmentation of that VRAM. In this scenario, processing time can go from potentially seconds up to X minutes for the same generation.

Imagine your desk represents VRAM, and you can neatly arrange 20 sheets of paperwork (representing the XTTS model) on it. Everything fits perfectly. Now, suppose you need to work on 3 additional sheets. Since there's no space left on your desk, you decide to store 3 of the original sheets in a filing cabinet (System RAM, or Disk if there isn't enough System RAM) to make room.

While this gives you space to work, accessing the sheets in the cabinet takes time. When you need to refer back to those papers, you have to swap them in and out of your desk. This constant shuffling can become tedious, especially if you find you need to access multiple sheets at once. It also results in the paperwork becoming out of sequence, so no longer in order 1, 2, 3, 4, etc., causing further complications.

As you continue to add more paperwork (doing more TTS requests), your desk and filing cabinet become cluttered. This leads to memory fragmentation: the paperwork is spread out and not neatly organized, making it difficult to manage and slowing down your work.

In this scenario, your 2GB of VRAM is a limited resource. The XTTS model requires 1.8GB to load, leaving only 200MB for displaying screen content and processing new TTS requests. As you keep adding more tasks, the fragmentation issue worsens, impacting overall performance. In a low-VRAM scenario this performance drop can make a TTS generation request take 10x longer (or possibly more), and obviously the XTTS model is right on the edge of how much VRAM your GPU has.

So, the compounding factors are:

The more graphical things you have loaded on screen, the more VRAM they will be using. E.g. let's say you load up a web page that has lots of photographs on it. Those images are stored in your VRAM and take up space, lowering the available space to store the XTTS model OR to have working space for TTS generation requests.
Fragmentation occurring either immediately OR worsening over time as multiple TTS requests occur, OR potentially larger/longer TTS requests causing more bits to be shuffled around between VRAM, System RAM and Disk.
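The headroom arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 2000MB and 1800MB figures are the rounded numbers from the post, the function names are made up for this example, and the live query assumes PyTorch with CUDA is installed (it returns None otherwise).

```python
def vram_headroom_mb(total_mb, model_mb):
    """Return the VRAM left over once the model is loaded (all values in MB)."""
    return total_mb - model_mb


def query_free_vram_mb():
    """Return currently free VRAM in MB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # optional dependency; only needed for the live query
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info() reports (free, total) in bytes for the current device.
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / (1024 * 1024)


# Figures from the post: a 2GB card and a 1.8GB XTTS model.
print("Headroom after loading the model (MB):", vram_headroom_mb(2000, 1800))
print("Free VRAM right now (MB):", query_free_vram_mb())
```

Checking the free figure before and after opening an image-heavy web page is one way to see how much of that headroom the desktop itself is taking.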

Jidis83 Oct 7, 2024

Thanks for the details @erew123 !

Yeah, I figured it was mainly from the weak GPU. Would it be possible to get better handling of these sort of situations from the AllTalk interface, or is it something that gets irreparably screwed up in the software AllTalk is accessing? Like I said, I'm not really able to gracefully get out of this and a couple other error situations I've hit through the methods I'd figure would work (like the stop generating button). The full "force quit" is often the only way to clear it. I'm also wondering, if it's not too difficult to explain, how much difference there is in accessing the TTS functions through that 7851 web GUI vs. the 7852, since the latter seems to only be where it blows up like that.

Thanks Again!

erew123 Oct 8, 2024
Maintainer Author

@Jidis83 Unfortunately memory management of that level is handled by the inner workings of Python and Nvidia CUDA and isn't directly accessible to that degree.

The 7852 interface offers much more control over the configuration settings, but as far as TTS generation goes, there is no real difference; it makes the same API requests/calls.
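Since both interfaces end up making the same API calls, a request can also be built directly. The sketch below is an assumption-heavy illustration: the /api/tts-generate path and the form field names are taken from memory of AllTalk's API documentation and should be checked against the built-in docs, and the voice file name and helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_tts_request(text, voice="female_01.wav", base_url="http://127.0.0.1:7851"):
    """Build (but do not send) a POST request for TTS generation.

    The endpoint path and field names are assumptions; verify them in the docs.
    """
    fields = {
        "text_input": text,
        "character_voice_gen": voice,  # hypothetical voice file name
        "language": "en",
        "output_file_name": "myoutputfile",
    }
    data = urlencode(fields).encode("utf-8")
    return Request(f"{base_url}/api/tts-generate", data=data, method="POST")


def send(request, timeout=120):
    """Send the request; this needs a running AllTalk server to succeed."""
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_tts_request("Hello from the API")
print(request.get_method(), request.full_url)
```

Calling send(request) against a running server would return whatever the server replies with; it is left out of the example run so the sketch works offline.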

joshgura · 2024-10-08T14:23:00Z

joshgura
Oct 8, 2024

i'd like to know the way to change the currently set tts engine outside of the webui, because i'm getting a persistent crash situation caused by parler. once parler is set in the webui, alltalk beta will crash, and upon re-starting, alltalk still thinks it's supposed to run parler due to some persistent data. where is that config file so i can just figure out a way to edit the currently set engine. it seems silly to have to reinstall just because some config file is out there. unless it's written to assembly language and not plain text. i looked in the obvious file called config in the main directory, but it doesn't look like there is any line in that document to save that setting.

i asked this same question before but i can't find my question so i am unable to respond if there was a thread, sorry.

i followed the force reinstall thing and that was a start, just rather have the config file name so i can dig it up.

here's the traceback stuff for when i try to start alltalk beta on linux after trying to change to parler in the webui, it crashing, and subsequently preventing the launch of alltalk beta.

Traceback (most recent call last):
  File "/run/media/username/FIRESTORE/alltalk_beta/alltalk_tts/tts_server.py", line 171, in <module>
    loader_module = importlib.import_module(f"system.tts_engines.{engine_loaded}.model_engine")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/media/username/FIRESTORE/alltalk_beta/alltalk_tts/env/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/run/media/username/FIRESTORE/alltalk_beta/alltalk_tts/system/tts_engines/parler/model_engine.py", line 35, in <module>
    from parler_tts import ParlerTTSForConditionalGeneration
  File "/run/media/username/FIRESTORE/alltalk_beta/alltalk_tts/env/lib/python3.11/site-packages/parler_tts/__init__.py", line 8, in <module>
    from .modeling_parler_tts import (
  File "/run/media/username/FIRESTORE/alltalk_beta/alltalk_tts/env/lib/python3.11/site-packages/parler_tts/modeling_parler_tts.py", line 29, in <module>
    from transformers.cache_utils import (
ImportError: cannot import name 'EncoderDecoderCache' from 'transformers.cache_utils' (/run/media/username/FIRESTORE/alltalk_beta/alltalk_tts/env/lib/python3.11/site-packages/transformers/cache_utils.py)
[AllTalk TTS] Warning TTS Engine has NOT started up yet. Will keep trying for 240 seconds maximum. Please wait.

so, long story short, changing engines to parler breaks alltalk beta, unless there's some human readable config file to change the currently set engine.
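The ImportError in that traceback says the installed transformers package does not provide a name that parler_tts tries to import, which usually points to a version mismatch between the two. A minimal sketch for checking this from inside the AllTalk Python environment follows; has_name is a hypothetical helper written for this example, not part of AllTalk.

```python
import importlib


def has_name(module_name, attribute):
    """Return True if module_name can be imported and exposes attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


# The name parler_tts failed to import in the traceback above.
print("EncoderDecoderCache available:",
      has_name("transformers.cache_utils", "EncoderDecoderCache"))
```

If this prints False in the environment AllTalk runs in, the Parler engine would be expected to fail at import time in the same way.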

4 replies

erew123 Oct 8, 2024
Maintainer Author

@joshgura You can change it in the basic web interface on 7851, as shown in the "Basic web Interface" image on here #237 (link in the startup screen of AllTalk).

Or in the tts_engines.json file (manually edit it, or copy the default one from the AllTalk BETA GitHub page) https://github.com/erew123/alltalk_tts/wiki/FAQ,-Quirks-&-General-Questions#configuration-files-and-environment

Or there is a hidden API endpoint that I haven't documented as of yet.
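For the manual-edit route, a small script can change the stored engine while keeping a backup. This is a sketch under assumptions: the "engine_loaded" key is a guess based on the engine_loaded variable visible in the traceback above, the file may hold other related keys (such as the selected model) that also need changing, and set_loaded_engine is a hypothetical helper. Open the file first and confirm its layout, or simply copy the default file as suggested.

```python
import json
import shutil
from pathlib import Path


def set_loaded_engine(config_path, engine_name):
    """Back up the JSON config, then set the engine to load on next start."""
    path = Path(config_path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the original in case the edit is wrong
    config = json.loads(path.read_text(encoding="utf-8"))
    config["engine_loaded"] = engine_name  # assumed key name, verify in your file
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return backup


# Example (run from the folder that contains the file, with AllTalk stopped):
# set_loaded_engine("tts_engines.json", "xtts")
```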

joshgura Oct 9, 2024

my basic interface at http://127.0.0.1:7851/ didn't look like that one. mine looked a lot simpler. The one you linked looks more like gradio than the one that i see when i navigate to that port. possibly because i am using alltalk as a part of Text-generation-webui for koboldcpp. I guess I will try to use option 2, the Standalone Application to see. and then find out if it will still work with kobold, or run the other setup again afterwards.

at least under the Text-generation-webui mode, when alltalk crashes it doesn't respond to any changes i try to make (such as changing engine or refreshing). and can't start alltalk so no webui will show. so i'm not sure how i can change anything since the program remains in a wait loop for an engine to start before it eventually times out and exits.

i'm sure i can read through the messages on these error threads and force reinstall of everything and just avoid trying to use parler again, which is probably conflicting with the requirements of xtts or vice/versa. i guess the best thing would be if in the initial script, if there's an error loading an engine it should somehow revert to the last successful state. but alas, that's work.

one thing is for sure, the directory called 'parler' in /models didn't exist and that was causing a crash, (before choosing the parler engine would even be an option) but after manually creating that directory it skipped that crash. i wonder if other linux users have run into that problem. nevertheless, i don't think parler would be working on any linux system yet with alltalk2 unless someone could correct me.

edit:

For now, i'm deleting the environment and reinstalling it as you explained at the following post: #345

edit: that didn't work, so what i've done is clone the alltalk beta repo again and replace all the files in my install directory. that fixed the parler problem, but leaves alltalk2 in a broken state, (it thinks i haven't installed pytorch and a lot of other things that are already installed, such as xtts.) so i used the setup script and removed the environment again, and re-installing that. will let you know if that works. may as well reinstall the whole thing, but at least this way i feel like i'm being a little repair man.

edit: that didn't work, got a "PytorchStreamReader failed reading zip archive" error. that's usually due to a partially downloaded tar.bz file but searching didn't find any .partial or tar.bz files. so, deleting the alltalk2 directory and reinstalling.

erew123 Oct 10, 2024
Maintainer Author

Hi @joshgura I'm still away travelling, so limited access to reply. If you are encountering corrupted files on PyTorch or other installations, it's usually best to clear the pip cache. I think I wrote something explaining this a bit more on the GitHub wiki, but I can't recall on which page right at this moment. In effect though, you would run start_environment to load the Python environment, OR any command prompt that has access to the Python executable will do. You simply run pip cache purge at the command prompt and this will clear out pip's cache, where the corrupted files will be. Typically corrupted files occur when there are connection blips on the internet, and you can end up with a corrupted install package that pip is downloading. After you have run that, pip will attempt to re-download the package file afresh next time you try to install a package (or run the AllTalk installation).
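To find which downloaded file is actually damaged, it can help to test files directly. Wheel files and modern PyTorch checkpoints are zip archives, which is why a truncated download surfaces as a "failed reading zip archive" error. The sketch below uses only the standard library; archive_is_intact is a hypothetical helper name and the path in the usage comment is a placeholder.

```python
import zipfile


def archive_is_intact(path):
    """Return True if path is a readable zip archive whose members pass CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the first corrupt member name, or None if all are fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


# Example with a placeholder path:
# print(archive_is_intact("models/xtts/some_model/model.pth"))
```

A False result on a model file suggests re-downloading that file; a False result on a cached wheel suggests clearing the pip cache as described above.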

Jidis83 Oct 14, 2024

Regarding that persistent crash stuff, I've noticed I can get into one of those as well if I forget the filename rules and accidentally put a space or something in my output file name. I have been running the basic 7851 interface over the past week, as it seems much less likely to error out on things, but I've finally got an OK GPU showing up at the end of this week, so hopefully, some of that will go away.
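One way to avoid tripping over the file name rules is to clean the name before submitting it. The rule used below (letters, digits and underscores only) is an assumption made for this sketch, not a confirmed statement of what AllTalk accepts, and safe_output_name is a hypothetical helper.

```python
import re


def safe_output_name(name, fallback="output"):
    """Replace runs of characters outside [A-Za-z0-9_] with a single underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    return cleaned or fallback


print(safe_output_name("my output file"))
print(safe_output_name("take#1 (final)"))
```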

ewebgh33 · 2024-10-09T05:11:21Z

ewebgh33
Oct 9, 2024

I just tried installing this clean (removed the old alltalk folder).
Installed as per instructions.
Updated tgwui requirements as the install suggested. Did not install deepspeed (as it said to test AllTalk works first).
Launch tgwui. ModuleNotFoundError: No module named 'ffmpeg.asyncio'
Find the docs here that says to check Visual Studio 2022 components installed - they are.
Both
"MSVC v143 - VS 2022 C++ x64/x86 build tools"
"Windows 10 SDK" or "Windows 11 SDK" (depending on your version of Windows)
are installed.
Stalemate? What now?
Updated AllTalk via tgwui env and atsetup.bat option 2 - git update. Eh, why not.
Rerun opt 1 requirements for tgwui (2nd time now, but whatever).
Still error.
Update pip? I was getting this:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
coqui-tts 0.24.2 requires transformers<4.43.0,>=4.42.0, but you have transformers 4.43.3 which is incompatible.
torch-grammar 0.3.3 requires sentencepiece<0.2.0,>=0.1.99, but you have sentencepiece 0.2.0 which is incompatible.
tts 0.22.0 requires gruut[de,es,fr]==2.2.3, but you have gruut 2.4.0 which is incompatible.
tts 0.22.0 requires pandas<2.0,>=1.4, but you have pandas 2.2.3 which is incompatible.

I updated pip and re-ran requirements update again (3rd time) but get the same message.
tgwui is current, so between that and pip I don't know why I get those dependency errors - none of which are ffmpeg anyway!
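The pip messages above each pair an installed version with a requirement it fails. The sketch below re-checks those four pairs with a deliberately simplified comparison that only understands plain numeric versions and the operators <, <=, >, >= and ==; for real use, the packaging library's SpecifierSet handles the full rules. The helper names are made up for this example.

```python
import operator
import re

_OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def _as_tuple(version, width=4):
    """Turn '4.43.3' into (4, 43, 3, 0) so versions compare numerically."""
    parts = [int(p) for p in re.findall(r"\d+", version)][:width]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed, spec):
    """Return True if installed meets every comma-separated clause in spec."""
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, wanted = match.groups()
        if not _OPERATORS[op](_as_tuple(installed), _as_tuple(wanted)):
            return False
    return True


# The four pairs reported by pip in the post above.
reported = [
    ("transformers", "4.43.3", "<4.43.0,>=4.42.0"),
    ("sentencepiece", "0.2.0", "<0.2.0,>=0.1.99"),
    ("gruut", "2.4.0", "==2.2.3"),
    ("pandas", "2.2.3", "<2.0,>=1.4"),
]
for name, installed, spec in reported:
    status = "ok" if satisfies(installed, spec) else "conflict"
    print(f"{name} {installed} vs {spec}: {status}")
```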

Any hints what to do, appreciated. Thanks!

3 replies

joshgura Oct 9, 2024

are you installing and updating after starting the alltalk environment? because I think alltalk won't want to be looking at your system's environment for things. afaik, alltalk wants to install all its own copies of things, perhaps even ffmpeg. i might be wrong because I'm not using windows so i don't know how things are handled there.

ewebgh33 Oct 10, 2024

Did a clean install of textgen-webui and it all works now. Just to make sure everything was "as new".
Textgen/Ooba has its own Conda env; when installing AllTalk as part of tgwui, it uses tgwui's Conda env, no?
Or does it still make its own unique one?

Anyway seems to be working now

erew123 Oct 10, 2024
Maintainer Author

Hi @ewebgh33 I am away travelling atm and the Coqui TTS engine has been updated, but I have not had time to version test its package requirements and changes (As I am not with a computer that is capable of it). However, I suspect that is part of the issue. Also, TGWUI has updated its Pytorch version and so DeepSpeed has not been built for that version yet (for the same reason I mentioned above).

That aside, if you have it working, great!

If you install AllTalk as a standalone, it will use its own Conda environment. If you follow the TGWUI install route, it will use TGWUI's Conda environment. Both routes are covered on the WIKI https://github.com/erew123/alltalk_tts/wiki and just above on here, there is a bit of info where I mention the "issues" at the moment and the routes of installation within TGWUI #245 (comment)
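When it is unclear which of the two environments a command prompt is using, Python can report it. This is a small sketch using only the standard library; CONDA_DEFAULT_ENV and CONDA_PREFIX are the variables Conda sets on activation, and describe_environment is a hypothetical helper name.

```python
import os
import sys


def describe_environment():
    """Collect details that show which Python environment is active."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this after starting the standalone environment and again after starting TGWUI's environment should show two different prefixes, which makes it easier to tell where a pip install actually landed.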

joshgura · 2024-10-10T02:15:00Z

joshgura
Oct 10, 2024

at one point i was presented with the option to install tts engines, but i skipped it thinking i already had the model. i didn't. so i re-ran alltalk setup hoping i would be presented with the option again, but it didn't happen. so i ran alltalk to see if i would get the option, but that didn't happen, I get

Error during setup: [Errno 2] No such file or directory: '/run/media/username/FIRESTORE/alltalk_beta/alltalk_tts/models/xtts'. Continuing without the TTS model.

but it doesn't continue, it just 'keeps trying' for 240 seconds and then eventually times out and exits.

so now i'm in a similar situation, i selected xtts in the gradio webui and now it thinks it is supposed to load that, but it doesn't exist so nothing happens. after this user error is made, then gradio or webui doesn't work anymore because alltalk times out before any webui goes online.

how do i get the screen that asks me which engines i want to install? i think it only shows that on first run, but not again. is that true?

Perhaps, like Automatic1111 stable diffusion webui, there ought to be a script file that's editable that is used to coordinate critical options, because if alltalk has an error the webui goes to an unresponsive zombie state. and after that first-run, the webui no longer will be available to undo changes, making some of the configuration changes potentially fatal to the entire installation. there really should be no tweak a user can make in the webui that breaks the program permanently.

maybe as a temporary fix, when there is an engine error, a toggle can be made that puts alltalk into a first-run state again?

or at least when alltalk setup is run, it can delete the persistent config data and place alltalk back into a first-run state again. because at present, running setup does not affect user configurations that were made at an earlier time.

more wit's end stuff:

i had a copy of the xtts model in alltalk version 1. "score" i thought. so I copied that into the models directory, after creating folder called xtts of course. boom! except anti-climax because the xtts version in old alltalk is xtts2_2.0.2 and alltalk2 won't load because it wants xtts2_2.0.3. (hugging face doesn't seem to have 2.0.3) -- but I'm so clever, i renamed the folder of 2.0.2 to 2.0.3 and gradio loaded! -- it loads into cuda but womp womp, generating "hello" never happens, it's just running the timer forever.

ok, got xtts working this way: renamed the xtts model's folder to 2.0.3, then i could open gradio again. i then changed engines to piper, which of course was not found, but this freed up xtts long enough for me to go back into the alltalk/model/xtts folder and rename xtts to its proper name 2.0.2. then i loaded the xtts engine, got the "xtts_2.0.3 not found" error, and then at some point i was able to find 2.0.2 listed in the models. loaded it. now alltalk2 generates "hello". i will resist every urge to try another engine lest alltalk get stuck in a non working state until reinstallation is needed.
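The "No such file or directory" error above comes from a missing folder under models. A quick pre-flight check like the sketch below can list or create the missing folders before starting AllTalk. The engine folder names are taken from the engines mentioned in this thread and are assumptions about the on-disk layout; creating an empty folder only avoids the missing-directory error and does not supply a model.

```python
from pathlib import Path

# Folder names assumed from the engines discussed in this thread.
DEFAULT_ENGINES = ("xtts", "piper", "vits", "parler")


def missing_engine_dirs(models_root, engines=DEFAULT_ENGINES):
    """Return the engine folder names that do not exist under models_root."""
    root = Path(models_root)
    return [name for name in engines if not (root / name).is_dir()]


def create_engine_dirs(models_root, engines=DEFAULT_ENGINES):
    """Create any missing engine folders and return the names that were created."""
    created = missing_engine_dirs(models_root, engines)
    for name in created:
        (Path(models_root) / name).mkdir(parents=True, exist_ok=True)
    return created


# Example (the path is a placeholder for your own install location):
# print(missing_engine_dirs("alltalk_tts/models"))
```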

9 replies

phazei Nov 18, 2024

Yeah, I think that's what they meant. Not everything everything, just the initial downloaded things the first load downloads.

I did the exact thing they did. When I first tried the app, I wanted to try each of the engines, but the only obvious way I found to do that was to set the first run to true and choosing an option I hadn't selected before to get them downloaded.

SamAcctX Nov 18, 2024

For VITS and Piper, you should be able to narrow down the scope of the models to auto-download by language. If a user wants to download more than one language, they can just pick one then the other (e.g. EN_US first, then DE).

q5sys Nov 18, 2024

@erew123 I did mean everything "engines"... but only everything "models" for a single language. (I should have been more clear)

erew123 Nov 26, 2024
Maintainer Author

@q5sys I'm busy as muck with other things atm, buuuuut..... https://github.com/erew123/alltalk_tts/blob/alltalkbeta/script.py#L39

Is it mostly an initial install thing where you want multiple models at installation?

q5sys Dec 13, 2024

Sorry for the delayed response, I too have been really busy with other things. But to answer your question, yes. Instead of having to restart alltalk four times to download everything, it'd be nice if there was an option when it starts and it prompts you with this list, that you could tell it to grab all four.

ewebgh33 · 2024-10-12T11:55:21Z

ewebgh33
Oct 12, 2024

Quick question to people who have standalone AllTalk v2 working

Did you downgrade transformers to version 4.40.2 for XTTS streaming support?
Or leave as is?

Instructions for standalone don't specify what to do at this point in setup or what option is most common or most recommended.
https://github.com/erew123/alltalk_tts/wiki/Install-%E2%80%90-Standalone-Installation

Thanks

5 replies

phazei Oct 13, 2024

I chose to downgrade to 4.40, but streaming still doesn't work. It's probably a recent version change for the libraries.

erew123 Oct 13, 2024
Maintainer Author

Hi @ewebgh33

Please read this here. Apologies for your install problems, but I hope that may give you a way forward.

Thanks

ewebgh33 Oct 14, 2024

@erew123 Don't offer apologies - you are awesome for building and maintaining this, and your thoroughness and great documentation is way beyond what many devs offer (I've tried a LOT of random AI stuff).

Thanks for directing my attention, I didn't read that thoroughly first time, MY apologies to you.

erew123 Oct 14, 2024
Maintainer Author

@ewebgh33 Thanks, appreciate that! :) FYI, I updated that WIKI page a bit more with some clearer/additional information and I believe I have fixed the install routines for fresh installs now.

Jidis83 Oct 14, 2024

@erew123 - Ditto on everything ewebgh33 said. I've had plenty of commercial software where the devs didn't seem to be as into keeping up with what worked for people and what didn't. :-)

and regarding their question - The last few installs I've done, I've gone ahead and told it to downgrade transformers with no issues, though I do use AllTalk totally standalone and local at this point with no additional software.

Jidis83 · 2024-10-14T22:27:43Z

Jidis83
Oct 14, 2024

To anyone here,

Maybe a dumb question, but is there definitive info anywhere on which of the available XTTS models is good (or bad) at what? It seems like every time I try to clone a voice, I forget which one I got good results with and end up having to do a bunch of model switching and listening tests.

-Thanks!

4 replies

erew123 Oct 15, 2024
Maintainer Author

Typically the 2.0.2 and 2.0.3 models are best and have slight variances between them in reproduction, though of course the majority of the impact comes back to the audio sample you are using. It's also worth noting that no two generations will ever be exactly the same, and playing about with the temperature and repetition penalty has an impact on how close to the sample the output is. I believe I wrote something on that in the help page of the XTTS model within the Gradio page, so look there for an explanation.

Jidis83 Oct 15, 2024

Thanks @erew123 ,

So I guess the models are improving and there's no reason to go back to an older one for anything specific? And what is the deal with the APITTS selections? I seem to get better results with the 2.0.3 version of that, than with the XTTS. Those extra settings, I guess I don't have right now, since I'm staying in the basic web interface, but I'm looking forward to getting back to them when my new GPU gets here in a few days.

Lastly, hope things have settled down OK with the family stuff.

erew123 Oct 15, 2024
Maintainer Author

@Jidis83 There is no absolute answer as to which of those models will be the best. In theory the 2.0.3 should be better, but maybe not in all scenarios. Not to get too technical, but you would view it like this, the 2.0.2 model would be trained on dataset A for ??? hours and 2.0.3 trained on dataset B for ??? hours and at some point, someone at Coqui went "That will do, that sounds good to me". 2.0.3 is not specifically a further development of the 2.0.2 model and therefore will have its own strengths and weaknesses, but in theory should be better than 2.0.2.

APITTS and XTTS use different methods of inference on top of the Hugging Face transformers library. They are 2x different methods of running the AI model for reproduction. The API method does not support DeepSpeed, so it is slower (when DeepSpeed is available). There is no right or wrong method; they are just 2x different ways Coqui allowed TTS to be generated with those AI models. You may find more in-depth answers at their site https://github.com/coqui-ai/TTS. The two methods do have different sounds, to my ear at least, and I suspect APITTS may be better overall at reproduction, with the trade-off of speed.

Thanks for your other message, sadly that will be ongoing for a long time.

Jidis83 Oct 16, 2024

Thanks again @erew123 !

Yeah, I think speed isn't really an issue here, as I don't really do anything requiring a quick output. I think with the current setup, DeepSpeed isn't turned on anyhow. It does indeed seem like APITTS has been better at most stuff here, I just was curious, as I thought I remembered people in the discussions here (or Reddit?) who had a specific preference toward one of the older ones.

Take Care

CRCODE22 · 2024-10-17T13:32:27Z

CRCODE22
Oct 17, 2024

Hi @erew123,

This is a really fun and good new TTS engine. Can it be added to AllTalk?

https://github.com/SWivid/F5-TTS

Let's goo! F5-TTS 🔊

Trained on 100K hours of data
Zero-shot voice cloning
Speed control (based on total duration)
Emotion based synthesis
Long-form synthesis
Supports code-switching
Best part: CC-BY license (commercially permissive)🔥

Diffusion based architecture:

Non-Autoregressive + Flow Matching with DiT
Uses ConvNeXt to refine text representation, alignment

Elevenlabs-level TTS on your laptop.

I'm always skeptical about new AI models hyped up sounding too good to be true. But this...is crazy good.

And now, you can run the gradio app by @realmrfakename on your laptop with 1 click.

Meet @elonmusk, from Silicon Valley. https://t.co/wyuP5BD7pe pic.twitter.com/b9ndh5oVZd
— cocktail peanut (@cocktailpeanut) October 13, 2024

2 replies

erew123 Oct 17, 2024
Maintainer Author

Hi @CRCODE22 it's already in the feature request list #74. I won't be able to get around to it for a while (please see my link to my support statement at the top of the page). However, if anyone else wishes to take a shot at coding it, they can read my guide/instructions on it; otherwise, I will look at it some time.

Thanks

CRCODE22 Oct 17, 2024

Thank you @erew123.

xdax1 · 2024-10-29T20:34:06Z

xdax1
Oct 29, 2024

Hello, I have a question. Sometimes with the XTTS model the voiceover says something completely different at the end of the text: random words that are not there at all. Is there an option I need to adjust to eliminate this completely (and should I decrease it or increase it)?

1 reply

erew123 Nov 1, 2024
Maintainer Author

@xdax1

Coqui's guide on XTTS https://docs.coqui.ai/en/latest/models/xtts.html#xtts

Key parameters that affect this behavior:

Temperature (current default: 0.75)
Lower values make the output more deterministic and "safe"
Higher values make it more creative but potentially unstable
To reduce hallucinations, try reducing this to 0.65 or even 0.5
The documentation suggests 0.85 as maximum.

Repetition Penalty (current default: 5.0)
Controls how much the model avoids repeating itself
Higher values (like the current 5.0) can sometimes cause the model to generate random content to avoid repetition
Try reducing this to around 2.0-3.0 (the documentation actually recommends 2.0)
This might help prevent the model from diverging at the end
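As a rough illustration of the advice above, the sketch below nudges both settings one step at a time toward the more conservative values mentioned. The function name, dictionary keys and step sizes are hypothetical and are not part of the AllTalk or Coqui API; only the defaults and limits are taken from the post.

```python
# Hypothetical helper: names, keys and step sizes are illustrative only.
DEFAULTS = {"temperature": 0.75, "repetition_penalty": 5.0}  # defaults quoted above
MAX_TEMPERATURE = 0.85            # maximum suggested by the documentation
MIN_TEMPERATURE = 0.5             # lowest value suggested in the post
TARGET_REPETITION_PENALTY = 2.0   # value the documentation recommends


def more_conservative(settings):
    """Return a copy of settings nudged one step toward fewer hallucinations."""
    temperature = min(settings["temperature"], MAX_TEMPERATURE)
    # Lower the temperature by 0.1 per step, but never below 0.5.
    temperature = max(MIN_TEMPERATURE, round(temperature - 0.1, 2))
    # Lower the repetition penalty by 1.0 per step, but never below 2.0.
    penalty = max(TARGET_REPETITION_PENALTY, settings["repetition_penalty"] - 1.0)
    return {"temperature": temperature, "repetition_penalty": penalty}


step1 = more_conservative(DEFAULTS)
step2 = more_conservative(step1)
print(step1)
print(step2)
```

Changing one step at a time and regenerating the same text makes it easier to hear which setting actually affects the stray words at the end.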

FREDDRR99 · 2024-11-04T22:45:16Z

FREDDRR99
Nov 4, 2024

Can someone help me here !!

2 replies

erew123 Nov 4, 2024
Maintainer Author

@FREDDRR99 My fault, just corrected it! I obviously missed a closing bracket in an update yesterday. Just drop to the command line in the alltalk_tts folder and git pull which should update the tts_server.py file. Then start it again and it should be good to go!

FREDDRR99 Nov 4, 2024

Yeah, it seems that was the problem. I already fixed it. Thanks for the quick response!

elbee2048 · 2024-11-10T01:19:02Z

elbee2048
Nov 10, 2024

I'm having an issue with finetuning, and I really could use some help! I do the entire process, but after I compress and move the files, the "wavs" folder is completely empty. I never get any errors or anything up to that point, but I simply can't get any audio files to show up in that folder. I have tried different voices and files, yet this always happens.

6 replies

xdax1 Nov 10, 2024

@elbee2048 I keep having a different problem: the voiceover sometimes (not always) generates extra dialogue. I have made hundreds of attempts with different temperature and Repetition Penalty settings and get the same thing. I noticed that some voices do this less, even though every voice sample I use has clean sound, no noise, is 6-30s long, etc., so I don't know where the difference comes from.

erew123 Nov 10, 2024
Maintainer Author

Typically this is just a feature of the XTTS model more than anything else (is my understanding). You can try using the API TTS method for generation, as that does sometimes appear to be clearer. Ultimately you are probably best addressing these issues over here https://github.com/idiap/coqui-ai-TTS/discussions where the Coqui TTS engine is being managed/maintained nowadays, as it would be something they would need to look into (if they can do anything about it). To be clear, AllTalk is handing the text off to the Coqui scripts/code for the actual TTS generation, so they may have more of an idea as to what the underlying issue is (though as I say, I believe this just to be a feature of the model).

xdax1 Nov 10, 2024

Interesting, I changed to TTS API and it looks better, but need to check more because I only tested a little.
Thanks!

erew123 Nov 10, 2024
Maintainer Author

You're also welcome to try the F5-TTS engine too. My experience, though, is that the cadence and flow of its speech will more closely follow the audio sample, meaning there is less variety and expressiveness in its speech patterns. However, it's a zero-shot model, so there is no specific finetuning to do.

elbee2048 Nov 10, 2024

@erew123 Yup, turns out the audio samples were just too small to move over when compressed! I checked before compressing and moving, and there were many audio files still there, just all very short. Thanks for the help!

ldavis9000aws · 2024-11-21T04:38:44Z

ldavis9000aws
Nov 21, 2024

@erew123 Great job incorporating f5tts! I think it would be nice if we could have whisper extract the text from the reference wave file and populate the "reference text" box. Then just make adjustments to the text as necessary.
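A minimal sketch of the idea, not how AllTalk implements it: the function names are hypothetical, and the use of the openai-whisper package (`whisper.load_model` and `transcribe`) is an assumption, since the thread does not say which Whisper engine is used.

```python
from typing import Callable


def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Build a transcriber backed by the openai-whisper package (an assumption)."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    model = whisper.load_model(model_name)
    return lambda audio_path: model.transcribe(audio_path)["text"]


def reference_text_from_audio(audio_path: str, transcribe: Callable[[str], str]) -> str:
    """Return cleaned-up text that could pre-fill a 'reference text' box."""
    raw_text = transcribe(audio_path)
    # Collapse runs of whitespace so the text fits on a single line.
    return " ".join(raw_text.split())
```

With openai-whisper installed, a call might look like `reference_text_from_audio("voices/sample.wav", whisper_transcriber())`, where the path is a placeholder. The result would still need a manual check, as suggested above.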

8 replies

erew123 Nov 22, 2024
Maintainer Author

@SamAcctX Like this?

SamAcctX Nov 22, 2024

Hmm - did I really miss that? Time to spin up Docker!

erew123 Nov 22, 2024
Maintainer Author

@SamAcctX No, you haven't missed it. I'm finishing off the biggest update AllTalk has ever seen... potentially out in a few days.

SamAcctX Nov 22, 2024

@erew123 Awesome! Tag me when it's out and I'll build my Alltalk docker container and give it a spin! I have it set to pull and build the v2 branch's latest commit.

PS: Which Whisper engine is it using? The OG Whisper, WhisperX, FasterWhisper, or something else?

erew123 Nov 26, 2024
Maintainer Author

@SamAcctX I assume everyone gets a message when someone posts on here, but if not, well, it's been out a day or so.

erew123 · 2024-11-24T19:14:06Z

erew123
Nov 24, 2024
Maintainer Author

Dear all (whomever this ends up getting sent to, I have no idea with GitHub)

AllTalk v2 is up (still in the BETA area, so same/normal instructions).

Install - https://github.com/erew123/alltalk_tts/wiki/Install-%E2%80%90-Standalone-Installation
Update - https://github.com/erew123/alltalk_tts/tree/alltalkbeta?tab=readme-ov-file#-updating

REINSTALL THE REQUIREMENTS with ATSETUP (it will tell you off if you don't anyway)

It has not been fully validated on:

Integrated install into TGWUI (remote extension works fine, but not installation into TGWUI python environment)
EDIT DONE Google Colab. Should work the same, just not got around to it.
Docker. There is work in progress to build a proper docker, but this new version (should) in theory work on it (not tested as someone else is building the docker, not me)

There may be a bug here or there, but there's a lot of extra error catching and highly detailed debugging options

There is still work to do tidying up some code in the TTS engines, though XTTS now processes and stores latents. Its integration code was the only one I re-wrote, as a template for the others to be written by and as part of the push to make it easier for other TTS engines to be added here, but I still have to finish writing a WIKI document to explain the other bits... no damn time in the day!

There is a fallback if someone ever needed to downgrade to the last beta version but still be able to upgrade in future.

There are some new features, there is EXTENSIVE newly written help on every page, so please read the help or WIKI before asking me questions.

The interface is cleaned up

This has been ??? hundreds of hours of work, so please be gentle when telling me you have found a bug or that I haven't included X feature or something yet. I'm quite tired from this (and from dealing with my unwell family situation) and will do a few more bits yet, but I also need a little break here and there.

I'll leave this here in case anyone feels generous

💖 Sponsor this Project on Ko-fi

Otherwise, enjoy!

Thanks

1 reply

ldavis9000aws Nov 26, 2024

Thanks so much for the time & effort you put into this project! You've made TTS more accessible to folks looking to get started. I was able to install it without an issue this evening on my Windows AWS EC2 instance. I did notice a couple of small bugs (like the E5 model not showing up for F5 TTS after it was downloaded, even after refreshing the settings, so just ended up having to restart alltalk to get them to show up), but it wasn't a show stopper. I'm excited to explore the new features. Again, great job!

subof · 2024-12-13T12:37:22Z

subof
Dec 13, 2024

Hi,
I am not able to download Whisper over a slow connection. Which Whisper model should I download, from where, and which folder should I put it in if I bring it over on a flash drive? Thank you.

1 reply

erew123 Dec 13, 2024
Maintainer Author

@subof There is a choice of Whisper models, depending on what you want it to do... e.g.

And they are managed by Hugging Face's code. I guess you could in theory copy one out of the .cache folder on one system and place it in the same folder on another. But you would at least have to have it downloaded on one machine to copy it from, and as there are tracking information folders within the model's folder, you cannot just copy a single file; you have to copy the top-level folder along with the tracking information.
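The copy described above could be sketched roughly as follows. This is only an illustration: the function name is hypothetical, `~/.cache/huggingface/hub` is the usual default Hugging Face cache location but may not be the one your install uses, and the model folder name has to be looked up on the machine that already has the download.

```python
import shutil
from pathlib import Path

# Assumed default Hugging Face cache location; your setup may differ.
DEFAULT_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def copy_cached_model(cache_dir: Path, model_folder: str, destination: Path) -> Path:
    """Copy one top-level model folder, including its tracking sub-folders."""
    source = cache_dir / model_folder
    if not source.is_dir():
        raise FileNotFoundError(f"No cached model folder at {source}")
    target = destination / model_folder
    # copytree follows symlinks by default, so linked files are copied as real
    # files. That suits drives that cannot store links, but may use more space.
    shutil.copytree(source, target)
    return target
```

On the second machine the same function can be run in the other direction, from the flash drive into that machine's cache folder.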

xdax1 · 2024-12-21T11:03:32Z

xdax1
Dec 21, 2024

@erew123 Hello, can I somehow use .pth and .index files to create text to audio?

1 reply

erew123 Dec 21, 2024
Maintainer Author

@xdax1 Currently there are 5x text to speech engines built into AllTalk, 2x of which are voice cloning engines https://github.com/erew123/alltalk_tts/wiki/AllTalk-V2-QuickStart-Guide#currently-available-tts-engines

I am assuming that by asking about PTH and Index files you are asking about RVC https://github.com/erew123/alltalk_tts/wiki/RVC-(Retrieval%E2%80%90based-Voice-Conversion) which is a speech to speech pipeline (or voice to voice). That means it will further manipulate the generated text to speech from one of the 5 text to speech engines to make the audio sound like a famous celebrity or whatever voice you want. Obviously the voice cloning TTS engines can do that too.

Hopefully that clears things up for you a little. Please read those two wiki pages or see the manufacturers' links for more details on them.

deathtome1998 · 2025-01-06T13:08:32Z

deathtome1998
Jan 6, 2025

I know there's a known issues section and that the issue I'm having is in there: it's the CRYPT_E_NO_REVOCATION_CHECK issue. However, after updating my PC, checking my firewall settings and internet connectivity, and updating my root certificates as a last-ditch attempt, I have had no luck getting it to resolve. I know my system clock is correct for my timezone because everything matches my phone. I wouldn't even bother trying to install deepspeed, since I plan to use F5-TTS, but it seems AllTalk won't create the start batch file I need to open the webui until it gets past this step. I can't even check whether installing deepspeed manually would help, as the only versions on the manual download page for deepspeed don't match my CUDA version, 12.3. I did notice this line towards the beginning of the command terminal output after attempting to reinstall the requirements: The following packages will be SUPERSEDED by a higher-priority channel: certifi conda-forge/noarch::certifi-2024.12.1~ --> pkgs/main/win-64::certifi-2024.12.14-py311haa95532_0
I'm not sure whether that might be the issue and, if it is, how to fix it.

8 replies

erew123 Jan 6, 2025
Maintainer Author

Oh, I see there is no longer an automatic installer! :/

OK, well on Windows you should be on curl version 8.9.1, which is Microsoft's supplied build of curl. Check at the command prompt with "curl --version"

Assuming that is the case, then it's your antivirus or firewall being an issue somewhere... I would suggest going for the --ssl-no-revoke method. I will update the wiki to reflect this and add basic instructions.

deathtome1998 Jan 6, 2025

Where exactly in the curl commands should I put the --ssl-no-revoke option? I am on the correct version of curl. I tried putting it at the end of each of the curl commands in the bat file, and on one of them I even tried two different places because it was a bit confusing where the command actually ended. Should it come directly after the word curl, or somewhere else? Adding --ssl-no-revoke to the end of each curl command doesn't seem to work for me, as I'm still getting the error when installing deepspeed.

erew123 Jan 6, 2025
Maintainer Author

Looks like it has to be before the -LO according to the CURL code.

Download/extract and run that.

Thanks
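The flag placement discussed above can be sketched as a small command builder. This is only an illustration and not the code in the setup script: the function names and the example URL are made up, while `--ssl-no-revoke`, `-L` and `-O` are existing curl options.

```python
import subprocess


def build_curl_command(url: str, skip_revocation_check: bool = True) -> list[str]:
    """Build a curl download command as an argument list."""
    command = ["curl"]
    if skip_revocation_check:
        # Placed before -LO, as suggested in the reply above.
        command.append("--ssl-no-revoke")
    # -L follows redirects; -O saves the file under its remote name.
    command += ["-LO", url]
    return command


def download(url: str) -> None:
    """Run curl and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_curl_command(url), check=True)


# Placeholder URL, shown only to print the resulting command line.
print(" ".join(build_curl_command("https://example.com/file.whl")))
```

Skipping the revocation check weakens certificate validation, so it is best limited to downloads from sources you already trust.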

deathtome1998 Jan 6, 2025

Just had a chance to download and run that file. It still gives the same error, and I'm not sure why using the file you provided didn't fix it. Any other suggestions?

erew123 Jan 6, 2025
Maintainer Author

I don't have any other suggestions as far as CURL is concerned. It would take an investigation over your entire machine and internet to fully diagnose what's going on.

Try this one.....

atsetup-ssl-bypass.zip

If that doesn't work, the only thing I can suggest is to delete line 529:

Run it again and manually download and install deepspeed, if you want it.

https://github.com/erew123/alltalk_tts/releases/download/DeepSpeed-14.0/deepspeed-0.14.0+ce78a63-cp311-cp311-win_amd64.whl
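Before installing a wheel like this by hand, it can help to check that its tags match the Python you are running. The sketch below is a hypothetical helper, not part of AllTalk; it assumes the file name has no optional build tag, and it only checks the Python tag, not the platform or the CUDA version.

```python
import sys


def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into its parts (assumes no build tag)."""
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the python, ABI and platform tags.
    name_version, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    name, version = name_version.split("-", 1)
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_python(python_tag: str) -> bool:
    """True when a CPython tag such as cp311 matches this interpreter."""
    return python_tag == f"cp{sys.version_info.major}{sys.version_info.minor}"


def pip_install_command(wheel_url_or_path: str) -> list:
    """Argument list that installs the wheel into the current environment."""
    return [sys.executable, "-m", "pip", "install", wheel_url_or_path]


info = parse_wheel_filename("deepspeed-0.14.0+ce78a63-cp311-cp311-win_amd64.whl")
print(info)
print(matches_running_python(info["python_tag"]))
```

The list from `pip_install_command` can be passed to `subprocess.run` from inside the AllTalk Python environment, so the wheel lands in the same environment that AllTalk uses.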

Replies: 105 comments · 314 replies