Not able to generate Hindi audio using xtts - xttsv2_2.0.3 #348

nitinmukesh · 2024-09-14T19:01:11Z

Describe the bug
I am trying to generate Hindi audio, but generation fails with an error. English audio is generated successfully.

To Reproduce
Launch the UI and enter Hindi text in Text Input. Select xtts - xttsv2_2.0.3. Under Advanced settings, select Hi as the Language. Click Generate TTS.

Screenshots
N.A.

Text/logs

(C:\tut\alltalk_tts\alltalk_environment\env) C:\tut\alltalk_tts>start_alltalk.bat
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 15th August 2024 at 08:27
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 32.82 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'

Desktop (please complete the following information):
AllTalk was updated: 15th August 2024 at 08:27
Custom Python environment: No
Text-generation-webUI was updated: It's the beta version and up to date

Additional context
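
The bare 'hi' in the log above looks like the string form of a Python KeyError, which would mean the language code was used as a dictionary key somewhere and was not found. That is only a guess from the message format, not something confirmed from the AllTalk or Coqui code. A minimal sketch of how such a message can arise, where SUPPORTED and generate are hypothetical names made up for illustration:

```python
# Hypothetical stand-in for a table of known language codes.
# This is NOT AllTalk's or Coqui's actual data structure.
SUPPORTED = {"en": "English", "fr": "French"}


def generate(text, language):
    """Pretend TTS call that reports errors the way the log above does."""
    try:
        name = SUPPORTED[language]  # raises KeyError for an unknown code
        return f"ok: {name}"
    except Exception as e:
        # str() of a KeyError is the repr of the missing key, quotes included,
        # so the message shows only 'hi' with no further explanation.
        return f"[GEN] Error during audio generation: {e}"


print(generate("Bonjour! comment vas-tu aujourd'hui?", "fr"))
print(generate("नमस्ते! आज आप कैसे हैं?", "hi"))
```

If that guess is right, the failure is a lookup on the language code rather than a problem with the Devanagari text itself.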


nitinmukesh · 2024-09-14T19:06:18Z

Tried French and English; both work. Hindi is not working. Please help.

(C:\tut\alltalk_tts\alltalk_environment\env) C:\tut\alltalk_tts>start_alltalk.bat
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 15th August 2024 at 08:27
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 39.11 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk GEN] Bonjour! comment vas-tu aujourd'hui?
C:\tut\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py:544: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
[AllTalk GEN] TTS Generate: 5.50 seconds. LowVRAM: False DeepSpeed: False

nitinmukesh · 2024-09-14T19:22:57Z

I tried updating tokenizer.py (alltalk_tts\system\ft_tokenizer). I applied the changes manually instead of replacing the file.
tokenizer.txt

I referred to the following for the above change:
coqui-ai/TTS#3655

Still the same issue.

erew123 · 2024-09-16T21:26:22Z

Hi @nitinmukesh

Are you specifically attempting to use "Streaming"? I cannot say if the Coqui engine ever supported streaming with Hindi. No reason it should but I don't know if it does.

Nonetheless, I tried some Devanagari script and it passed through fine on my PC. I also tried yours "नमस्ते! आज आप कैसे हैं?" and that passed through fine.

My system that I tested on is a fresh install (shown below) and has all the following package versions (you can run start_diagnostics to create a diagnostics.log file and compare versions on your system).

diagnostics.log contents as of 16th Sept 2024

PACKAGE VERSIONS vs REQUIREMENTS FILE:
 coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
 faster-whisper      Required: >= 1.0.3         Installed: 1.0.3
 fuzzywuzzy          Required: >= 0.18.0        Installed: 0.18.0
 gradio              Required: >= 4.26.0        Installed: 4.32.2
 importlib_metadata  Required: >= 7.2.1         Installed: 8.5.0
 inputimeout         Required: >= 1.0.4         Installed: 1.0.4
 Jinja2              Required: >= 3.1.4         Installed: 3.1.4
 librosa             Required: >= 0.10.2.post1  Installed: 0.10.2.post1
 nvidia-cublas-cu11  Required: >= 11.11.3.6     Installed: 11.11.3.6
 nvidia-cudnn-cu11   Required: >= 9.1.1.17      Installed: 9.4.0.58
 onnxruntime-gpu     Required: >= 1.18.1        Installed: 1.19.2
 pydantic            Required: >= 2.8.2         Installed: 2.9.1
 python-ffmpeg       Required: >= 2.0.12        Installed: 2.0.12
 python-Levenshtein  Required: >= 0.25.1        Installed: 0.25.1
 praat-parselmouth   Required: >= 0.4.4         Installed: 0.4.4
 pyworld             Required: >= 0.3.4         Installed: 0.3.4
 sounddevice         Required: >= 0.4.7         Installed: 0.5.0
 soundfile           Required: >= 0.12.1        Installed: 0.12.1
 spacy               Required: >= 3.7.1         Installed: 3.7.6
 torchcrepe          Required: >= 0.0.2         Installed: 0.0.23
 tqdm                Required: >= 4.66.5        Installed: 4.66.5
 unidic-lite         Required: >= 1.0.8         Installed: 1.0.8
 uvicorn             Required: >= 0.29.0        Installed: 0.30.6
 pillow              Required: == 10.3.0        Installed: 10.3.0
 pypinyin            Required: >= 0.52.0        Installed: 0.53.0
 word2number         Required: >= 1.1           Installed: 1.1
 cutlet              Required: == 0.4.0         Installed: 0.4.0
 fugashi             Required: == 1.3.1         Installed: 1.3.1
 fastapi             Required: == 0.112.2       Installed: 0.112.2

PYTHON PACKAGES:
 absl-py>= 2.1.0
 aiofiles>= 23.2.1
 aiohappyeyeballs>= 2.4.0
 aiohttp>= 3.10.5
 aiosignal>= 1.3.1
 altair>= 5.4.1
 annotated-types>= 0.7.0
 antlr4-python3-runtime>= 4.9.3
 anyascii>= 0.3.2
 anyio>= 4.4.0
 argbind>= 0.3.9
 asttokens>= 2.4.1
 attrs>= 24.2.0
 audioread>= 3.0.1
 av>= 12.3.0
 babel>= 2.16.0
 bitarray>= 2.9.2
 blis>= 0.7.11
 Brotli>= 1.0.9
 catalogue>= 2.0.10
 certifi>= 2024.8.30
 cffi>= 1.17.1
 charset-normalizer>= 3.3.2
 click>= 8.1.7
 cloudpathlib>= 0.19.0
 colorama>= 0.4.6
 coloredlogs>= 15.0.1
 confection>= 0.1.5
 contourpy>= 1.3.0
 coqpit>= 0.0.17
 coqui-tts>= 0.24.1
 coqui-tts-trainer>= 0.1.5
 ctranslate2>= 4.4.0
 cutlet>= 0.4.0
 cycler>= 0.12.1
 cymem>= 2.0.8
 Cython>= 3.0.11
 dateparser>= 1.1.8
 decorator>= 5.1.1
 deepspeed>= 0.14.0+ce78a63
 descript-audiotools>= 0.7.2
 descript-audio-codec>= 1.0.0
 docopt>= 0.6.2
 docstring_parser>= 0.16
 einops>= 0.8.0
 encodec>= 0.1.1
 executing>= 2.1.0
 fairseq>= 0.12.4
 faiss>= 1.8.0
 fastapi>= 0.112.2
 faster-whisper>= 1.0.3
 ffmpy>= 0.4.0
 filelock>= 3.13.1
 fire>= 0.6.0
 flatbuffers>= 24.3.25
 flatten-dict>= 0.4.2
 fonttools>= 4.53.1
 frozenlist>= 1.4.1
 fsspec>= 2024.9.0
 fugashi>= 1.3.1
 future>= 1.0.0
 fuzzywuzzy>= 0.18.0
 gmpy2>= 2.1.2
 gradio>= 4.32.2
 gradio_client>= 0.17.0
 grpcio>= 1.66.1
 gruut>= 2.2.3
 gruut-ipa>= 0.13.0
 gruut_lang_de>= 2.0.1
 gruut_lang_en>= 2.0.1
 gruut_lang_es>= 2.0.1
 gruut_lang_fr>= 2.0.2
 h11>= 0.14.0
 hangul-romanize>= 0.1.0
 hjson>= 3.1.0
 httpcore>= 1.0.5
 httpx>= 0.27.2
 huggingface-hub>= 0.24.7
 humanfriendly>= 10.0
 hydra-core>= 1.3.2
 idna>= 3.7
 importlib_metadata>= 8.5.0
 importlib_resources>= 6.4.5
 inflect>= 7.4.0
 inputimeout>= 1.0.4
 ipython>= 8.27.0
 jaconv>= 0.4.0
 jedi>= 0.19.1
 Jinja2>= 3.1.4
 joblib>= 1.4.2
 jsonlines>= 1.2.0
 jsonschema>= 4.23.0
 jsonschema-specifications>= 2023.12.1
 julius>= 0.2.7
 kiwisolver>= 1.4.7
 langcodes>= 3.4.0
 language_data>= 1.2.0
 lazy_loader>= 0.4
 Levenshtein>= 0.25.1
 librosa>= 0.10.2.post1
 llvmlite>= 0.43.0
 local-attention>= 1.9.15
 lxml>= 5.3.0
 marisa-trie>= 1.2.0
 Markdown>= 3.7
 markdown2>= 2.5.0
 markdown-it-py>= 3.0.0
 MarkupSafe>= 2.1.3
 matplotlib>= 3.9.2
 matplotlib-inline>= 0.1.7
 mdurl>= 0.1.2
 mkl_fft>= 1.3.10
 mkl_random>= 1.2.7
 mkl-service>= 2.4.0
 mojimoji>= 0.0.13
 more-itertools>= 10.5.0
 mpmath>= 1.3.0
 msgpack>= 1.1.0
 multidict>= 6.1.0
 murmurhash>= 1.0.10
 narwhals>= 1.8.0
 networkx>= 2.8.8
 ninja>= 1.11.1.1
 num2words>= 0.5.13
 numba>= 0.60.0
 numpy>= 1.26.4
 nvidia-cublas-cu11>= 11.11.3.6
 nvidia-cuda-nvrtc-cu11>= 11.8.89
 nvidia-cudnn-cu11>= 9.4.0.58
 omegaconf>= 2.3.0
 onnxruntime>= 1.19.2
 onnxruntime-gpu>= 1.19.2
 orjson>= 3.10.7
 packaging>= 24.1
 pandas>= 2.2.2
 parler_tts>= 0.2
 parso>= 0.8.4
 pillow>= 10.3.0
 pip>= 24.2
 platformdirs>= 4.3.3
 pooch>= 1.8.2
 portalocker>= 2.10.1
 praat-parselmouth>= 0.4.4
 preshed>= 3.0.9
 prompt_toolkit>= 3.0.47
 protobuf>= 3.19.6
 psutil>= 6.0.0
 pure_eval>= 0.2.3
 pycparser>= 2.22
 pydantic>= 2.9.1
 pydantic_core>= 2.23.3
 pydub>= 0.25.1
 pyee>= 12.0.0
 Pygments>= 2.18.0
 pyloudnorm>= 0.1.1
 pynndescent>= 0.5.13
 pynvml>= 11.5.3
 pyparsing>= 3.1.4
 pypinyin>= 0.53.0
 pyreadline3>= 3.5.2
 pysbd>= 0.3.4
 PySocks>= 1.7.1
 pystoi>= 0.4.1
 python-crfsuite>= 0.9.10
 python-dateutil>= 2.9.0.post0
 python-ffmpeg>= 2.0.12
 python-Levenshtein>= 0.25.1
 python-multipart>= 0.0.9
 pytz>= 2024.2
 pywin32>= 306
 pyworld>= 0.3.4
 PyYAML>= 6.0.1
 py-cpuinfo>= 9.0.0
 randomname>= 0.2.1
 rapidfuzz>= 3.9.7
 referencing>= 0.35.1
 regex>= 2024.9.11
 requests>= 2.32.3
 resampy>= 0.4.3
 rich>= 13.8.1
 rotary-embedding-torch>= 0.8.3
 rpds-py>= 0.20.0
 ruff>= 0.6.5
 sacrebleu>= 2.4.3
 safetensors>= 0.4.5
 scikit-learn>= 1.5.2
 scipy>= 1.14.1
 semantic-version>= 2.10.0
 sentencepiece>= 0.2.0
 setuptools>= 72.1.0
 shellingham>= 1.5.4
 six>= 1.16.0
 smart-open>= 7.0.4
 sniffio>= 1.3.1
 sounddevice>= 0.5.0
 soundfile>= 0.12.1
 soxr>= 0.5.0.post1
 spacy>= 3.7.6
 spacy-legacy>= 3.0.12
 spacy-loggers>= 1.0.5
 srsly>= 2.4.8
 stack-data>= 0.6.3
 starlette>= 0.38.5
 SudachiDict-core>= 20240716
 SudachiPy>= 0.6.8
 sympy>= 1.13.2
 tabulate>= 0.9.0
 tensorboard>= 2.17.1
 tensorboard-data-server>= 0.7.2
 termcolor>= 2.4.0
 thinc>= 8.2.5
 threadpoolctl>= 3.5.0
 tokenizers>= 0.19.1
 tomlkit>= 0.12.0
 torch>= 2.2.1
 torchaudio>= 2.2.1
 torchcrepe>= 0.0.23
 torchvision>= 0.17.1
 torch-stoi>= 0.2.1
 tqdm>= 4.66.5
 traitlets>= 5.14.3
 transformers>= 4.40.2
 typeguard>= 4.3.0
 typer>= 0.12.5
 typing_extensions>= 4.11.0
 tzdata>= 2024.1
 tzlocal>= 5.2
 umap-learn>= 0.5.6
 unidic-lite>= 1.0.8
 urllib3>= 2.2.2
 uvicorn>= 0.30.6
 wasabi>= 1.1.3
 wcwidth>= 0.2.13
 weasel>= 0.4.1
 websockets>= 11.0.3
 Werkzeug>= 3.0.4
 wheel>= 0.44.0
 win-inet-pton>= 1.1.0
 word2number>= 1.1
 wrapt>= 1.16.0
 yarl>= 1.11.1
 zipp>= 3.20.2
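
To compare your own diagnostics.log against the one above, something like the following sketch could help. It assumes the requirements table keeps the "Required: ... Installed: ..." layout shown above; installed_versions and diff_versions are illustrative names, not part of AllTalk, and the "mine" sample data is made up.

```python
import re

# Matches lines such as " coqui-tts   Required: >= 0.24.1   Installed: 0.24.1"
LINE = re.compile(
    r"^\s*(\S+)\s+Required:\s*(>=|==)\s*(\S+)\s+Installed:\s*(\S+)\s*$"
)


def installed_versions(log_text):
    """Return {package: installed_version} parsed from the requirements table."""
    versions = {}
    for line in log_text.splitlines():
        m = LINE.match(line)
        if m:
            versions[m.group(1)] = m.group(4)
    return versions


def diff_versions(reference, mine):
    """Return {package: (reference_version, my_version)} where the two differ.

    A package missing from `mine` is reported with None as its version.
    """
    ref, cur = installed_versions(reference), installed_versions(mine)
    return {p: (v, cur.get(p)) for p, v in ref.items() if cur.get(p) != v}


reference = """
 coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
 gradio              Required: >= 4.26.0        Installed: 4.32.2
"""
# Made-up example of a second system's log, for illustration only.
mine = """
 coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
 gradio              Required: >= 4.26.0        Installed: 4.26.0
"""
print(diff_versions(reference, mine))  # {'gradio': ('4.32.2', '4.26.0')}
```

In practice you would read the two log files from disk and pass their contents in place of the inline strings.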

Barring something not installing correctly on your system, I am not sure what would cause this issue. That said, system locale settings can affect how characters are interpreted: https://learn.microsoft.com/en-us/windows-hardware/customize/desktop/unattend/microsoft-windows-international-core-winpe-systemlocale

I suppose it's possible that your system locale is causing the problem, but I would be unable to diagnose that for you.
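
As a rough first check on the locale idea, the sketch below prints the encodings the Python environment is using and whether the Hindi sample can be represented in the default one. It uses only the standard library and does not touch AllTalk; encoding_report is an illustrative name, and a clean result here would not rule out a locale problem elsewhere.

```python
import locale
import sys


def encoding_report(sample="नमस्ते! आज आप कैसे हैं?"):
    """Collect the encodings Python is using and test them against the sample."""
    preferred = locale.getpreferredencoding(False)
    try:
        sample.encode(preferred)
        fits = True
    except (UnicodeEncodeError, LookupError):
        # A legacy code page such as cp1252 cannot represent Devanagari.
        fits = False
    return {
        "preferred_encoding": preferred,
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "utf8_mode": bool(sys.flags.utf8_mode),
        "sample_fits_preferred_encoding": fits,
        "utf8_round_trip_ok": sample.encode("utf-8").decode("utf-8") == sample,
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```

Run it from the same environment that start_alltalk.bat activates, so the values reflect what AllTalk's Python process would see.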

I assume you have done a full fresh installation of AllTalk V2 and not just copied over a V1 installation? You may wish to re-try setting up the Python environment.

Finally, I am not the maintainer of the Coqui TTS engine; that is maintained by idiap. I can see they are working on additional Hindi support (idiap/coqui-ai-TTS@1920328), though that update is not yet available.

Thanks

erew123 · 2024-10-01T13:43:22Z

@nitinmukesh Further to this, while writing documentation for V2 of AllTalk, I was looking over the V1 help and found this section: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#startup-performance-and-compatibility-issues

As such, you need to load the 2.0.3 model in API mode.

nitinmukesh · 2024-10-01T18:57:28Z

Thank you @erew123
I hope support for the Hindi language is added soon. I tried everything, reinstalling, etc., but it didn't work.

Currently using Google TTS for Hindi.

I will try the api one as suggested by you. Appreciate your guidance

erew123 · 2024-10-01T21:31:58Z

@nitinmukesh As mentioned above, you CAN use Hindi, if you load the XTTS 2.0.3 model as apitts (API mode)

nitinmukesh · 2024-10-02T16:16:24Z

@erew123

I did understand it and mentioned the same in my earlier response:

"I will try the api one as suggested by you. Appreciate your guidance"

I should have mentioned apitts. I appreciate your guidance in making this work. Thank you.

erew123 closed this as completed Sep 16, 2024
