Not able to generate Hindi audio using xtts - xttsv2_2.0.3 #348

Closed
nitinmukesh opened this issue Sep 14, 2024 · 7 comments

@nitinmukesh

Describe the bug
Generating Hindi audio fails with an error. English audio is generated successfully.

To Reproduce
Launch the UI and enter Hindi text in Text Input. Select xtts - xttsv2_2.0.3. Under Advanced settings, set Language to Hi. Click Generate TTS.

Screenshots
N.A.

Text/logs

(C:\tut\alltalk_tts\alltalk_environment\env) C:\tut\alltalk_tts>start_alltalk.bat
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 15th August 2024 at 08:27
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 32.82 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'

Desktop (please complete the following information):
AllTalk was updated: [approx. date]: 15th August 2024 at 08:27
Custom Python environment: [yes/no give details if yes] No
Text-generation-webUI was updated: [approx. date] It's the beta version and up to date

Additional context

@nitinmukesh
Author

I tried French and English, and both work. Hindi does not work. Please help.

(C:\tut\alltalk_tts\alltalk_environment\env) C:\tut\alltalk_tts>start_alltalk.bat
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 15th August 2024 at 08:27
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 39.11 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk GEN] Bonjour! comment vas-tu aujourd'hui?
C:\tut\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py:544: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
[AllTalk GEN] TTS Generate: 5.50 seconds. LowVRAM: False DeepSpeed: False

@nitinmukesh
Author

nitinmukesh commented Sep 14, 2024

I tried updating tokenizer.py (alltalk_tts\system\ft_tokenizer), applying the changes manually instead of replacing the file.
tokenizer.txt

I referred to the following for the above change:
coqui-ai/TTS#3655

Still the same issue.
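
For context on the log line `[GEN] Error during audio generation: 'hi'`: a bare quoted language code is what Python produces when a `KeyError` is converted to a string, so the message is consistent with a dictionary lookup keyed by language code that has no `hi` entry. This is an inference from the message format, not something confirmed in this thread. A minimal sketch of the pattern follows. `CHAR_LIMITS` and `generate` are hypothetical names with illustrative values, not AllTalk or Coqui code.

```python
# Hypothetical per-language table; names and values are illustrative only.
CHAR_LIMITS = {"en": 250, "fr": 273}


def generate(text: str, language: str) -> str:
    """Mimic a generation step that looks up a per-language setting."""
    try:
        limit = CHAR_LIMITS[language]  # raises KeyError for a missing code
    except Exception as exc:
        # str(KeyError("hi")) is "'hi'", so only the quoted key is shown.
        return f"[GEN] Error during audio generation: {exc}"
    return f"generated {min(len(text), limit)} characters for '{language}'"


print(generate("Bonjour!", "fr"))
print(generate("नमस्ते!", "hi"))
```

If the real failure follows this pattern, logging the exception type alongside the message (for example `type(exc).__name__`) would make it much easier to see which lookup is missing the language.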

@erew123
Owner

erew123 commented Sep 16, 2024

Hi @nitinmukesh

Are you specifically attempting to use "Streaming"? I cannot say whether the Coqui engine ever supported streaming with Hindi. I see no reason it shouldn't, but I don't know if it does.

Nonetheless, I tried some Devanagari script and it passed through fine on my PC. I also tried your text, "नमस्ते! आज आप कैसे हैं?", and that passed through fine too.

The system I tested on is a fresh install with the package versions listed below. You can run start_diagnostics to create a diagnostics.log file and compare the versions on your system.

diagnostics.log contents as of 16th Sept 2024

PACKAGE VERSIONS vs REQUIREMENTS FILE:
 coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
 faster-whisper      Required: >= 1.0.3         Installed: 1.0.3
 fuzzywuzzy          Required: >= 0.18.0        Installed: 0.18.0
 gradio              Required: >= 4.26.0        Installed: 4.32.2
 importlib_metadata  Required: >= 7.2.1         Installed: 8.5.0
 inputimeout         Required: >= 1.0.4         Installed: 1.0.4
 Jinja2              Required: >= 3.1.4         Installed: 3.1.4
 librosa             Required: >= 0.10.2.post1  Installed: 0.10.2.post1
 nvidia-cublas-cu11  Required: >= 11.11.3.6     Installed: 11.11.3.6
 nvidia-cudnn-cu11   Required: >= 9.1.1.17      Installed: 9.4.0.58
 onnxruntime-gpu     Required: >= 1.18.1        Installed: 1.19.2
 pydantic            Required: >= 2.8.2         Installed: 2.9.1
 python-ffmpeg       Required: >= 2.0.12        Installed: 2.0.12
 python-Levenshtein  Required: >= 0.25.1        Installed: 0.25.1
 praat-parselmouth   Required: >= 0.4.4         Installed: 0.4.4
 pyworld             Required: >= 0.3.4         Installed: 0.3.4
 sounddevice         Required: >= 0.4.7         Installed: 0.5.0
 soundfile           Required: >= 0.12.1        Installed: 0.12.1
 spacy               Required: >= 3.7.1         Installed: 3.7.6
 torchcrepe          Required: >= 0.0.2         Installed: 0.0.23
 tqdm                Required: >= 4.66.5        Installed: 4.66.5
 unidic-lite         Required: >= 1.0.8         Installed: 1.0.8
 uvicorn             Required: >= 0.29.0        Installed: 0.30.6
 pillow              Required: == 10.3.0        Installed: 10.3.0
 pypinyin            Required: >= 0.52.0        Installed: 0.53.0
 word2number         Required: >= 1.1           Installed: 1.1
 cutlet              Required: == 0.4.0         Installed: 0.4.0
 fugashi             Required: == 1.3.1         Installed: 1.3.1
 fastapi             Required: == 0.112.2       Installed: 0.112.2

PYTHON PACKAGES:
 absl-py>= 2.1.0
 aiofiles>= 23.2.1
 aiohappyeyeballs>= 2.4.0
 aiohttp>= 3.10.5
 aiosignal>= 1.3.1
 altair>= 5.4.1
 annotated-types>= 0.7.0
 antlr4-python3-runtime>= 4.9.3
 anyascii>= 0.3.2
 anyio>= 4.4.0
 argbind>= 0.3.9
 asttokens>= 2.4.1
 attrs>= 24.2.0
 audioread>= 3.0.1
 av>= 12.3.0
 babel>= 2.16.0
 bitarray>= 2.9.2
 blis>= 0.7.11
 Brotli>= 1.0.9
 catalogue>= 2.0.10
 certifi>= 2024.8.30
 cffi>= 1.17.1
 charset-normalizer>= 3.3.2
 click>= 8.1.7
 cloudpathlib>= 0.19.0
 colorama>= 0.4.6
 coloredlogs>= 15.0.1
 confection>= 0.1.5
 contourpy>= 1.3.0
 coqpit>= 0.0.17
 coqui-tts>= 0.24.1
 coqui-tts-trainer>= 0.1.5
 ctranslate2>= 4.4.0
 cutlet>= 0.4.0
 cycler>= 0.12.1
 cymem>= 2.0.8
 Cython>= 3.0.11
 dateparser>= 1.1.8
 decorator>= 5.1.1
 deepspeed>= 0.14.0+ce78a63
 descript-audiotools>= 0.7.2
 descript-audio-codec>= 1.0.0
 docopt>= 0.6.2
 docstring_parser>= 0.16
 einops>= 0.8.0
 encodec>= 0.1.1
 executing>= 2.1.0
 fairseq>= 0.12.4
 faiss>= 1.8.0
 fastapi>= 0.112.2
 faster-whisper>= 1.0.3
 ffmpy>= 0.4.0
 filelock>= 3.13.1
 fire>= 0.6.0
 flatbuffers>= 24.3.25
 flatten-dict>= 0.4.2
 fonttools>= 4.53.1
 frozenlist>= 1.4.1
 fsspec>= 2024.9.0
 fugashi>= 1.3.1
 future>= 1.0.0
 fuzzywuzzy>= 0.18.0
 gmpy2>= 2.1.2
 gradio>= 4.32.2
 gradio_client>= 0.17.0
 grpcio>= 1.66.1
 gruut>= 2.2.3
 gruut-ipa>= 0.13.0
 gruut_lang_de>= 2.0.1
 gruut_lang_en>= 2.0.1
 gruut_lang_es>= 2.0.1
 gruut_lang_fr>= 2.0.2
 h11>= 0.14.0
 hangul-romanize>= 0.1.0
 hjson>= 3.1.0
 httpcore>= 1.0.5
 httpx>= 0.27.2
 huggingface-hub>= 0.24.7
 humanfriendly>= 10.0
 hydra-core>= 1.3.2
 idna>= 3.7
 importlib_metadata>= 8.5.0
 importlib_resources>= 6.4.5
 inflect>= 7.4.0
 inputimeout>= 1.0.4
 ipython>= 8.27.0
 jaconv>= 0.4.0
 jedi>= 0.19.1
 Jinja2>= 3.1.4
 joblib>= 1.4.2
 jsonlines>= 1.2.0
 jsonschema>= 4.23.0
 jsonschema-specifications>= 2023.12.1
 julius>= 0.2.7
 kiwisolver>= 1.4.7
 langcodes>= 3.4.0
 language_data>= 1.2.0
 lazy_loader>= 0.4
 Levenshtein>= 0.25.1
 librosa>= 0.10.2.post1
 llvmlite>= 0.43.0
 local-attention>= 1.9.15
 lxml>= 5.3.0
 marisa-trie>= 1.2.0
 Markdown>= 3.7
 markdown2>= 2.5.0
 markdown-it-py>= 3.0.0
 MarkupSafe>= 2.1.3
 matplotlib>= 3.9.2
 matplotlib-inline>= 0.1.7
 mdurl>= 0.1.2
 mkl_fft>= 1.3.10
 mkl_random>= 1.2.7
 mkl-service>= 2.4.0
 mojimoji>= 0.0.13
 more-itertools>= 10.5.0
 mpmath>= 1.3.0
 msgpack>= 1.1.0
 multidict>= 6.1.0
 murmurhash>= 1.0.10
 narwhals>= 1.8.0
 networkx>= 2.8.8
 ninja>= 1.11.1.1
 num2words>= 0.5.13
 numba>= 0.60.0
 numpy>= 1.26.4
 nvidia-cublas-cu11>= 11.11.3.6
 nvidia-cuda-nvrtc-cu11>= 11.8.89
 nvidia-cudnn-cu11>= 9.4.0.58
 omegaconf>= 2.3.0
 onnxruntime>= 1.19.2
 onnxruntime-gpu>= 1.19.2
 orjson>= 3.10.7
 packaging>= 24.1
 pandas>= 2.2.2
 parler_tts>= 0.2
 parso>= 0.8.4
 pillow>= 10.3.0
 pip>= 24.2
 platformdirs>= 4.3.3
 pooch>= 1.8.2
 portalocker>= 2.10.1
 praat-parselmouth>= 0.4.4
 preshed>= 3.0.9
 prompt_toolkit>= 3.0.47
 protobuf>= 3.19.6
 psutil>= 6.0.0
 pure_eval>= 0.2.3
 pycparser>= 2.22
 pydantic>= 2.9.1
 pydantic_core>= 2.23.3
 pydub>= 0.25.1
 pyee>= 12.0.0
 Pygments>= 2.18.0
 pyloudnorm>= 0.1.1
 pynndescent>= 0.5.13
 pynvml>= 11.5.3
 pyparsing>= 3.1.4
 pypinyin>= 0.53.0
 pyreadline3>= 3.5.2
 pysbd>= 0.3.4
 PySocks>= 1.7.1
 pystoi>= 0.4.1
 python-crfsuite>= 0.9.10
 python-dateutil>= 2.9.0.post0
 python-ffmpeg>= 2.0.12
 python-Levenshtein>= 0.25.1
 python-multipart>= 0.0.9
 pytz>= 2024.2
 pywin32>= 306
 pyworld>= 0.3.4
 PyYAML>= 6.0.1
 py-cpuinfo>= 9.0.0
 randomname>= 0.2.1
 rapidfuzz>= 3.9.7
 referencing>= 0.35.1
 regex>= 2024.9.11
 requests>= 2.32.3
 resampy>= 0.4.3
 rich>= 13.8.1
 rotary-embedding-torch>= 0.8.3
 rpds-py>= 0.20.0
 ruff>= 0.6.5
 sacrebleu>= 2.4.3
 safetensors>= 0.4.5
 scikit-learn>= 1.5.2
 scipy>= 1.14.1
 semantic-version>= 2.10.0
 sentencepiece>= 0.2.0
 setuptools>= 72.1.0
 shellingham>= 1.5.4
 six>= 1.16.0
 smart-open>= 7.0.4
 sniffio>= 1.3.1
 sounddevice>= 0.5.0
 soundfile>= 0.12.1
 soxr>= 0.5.0.post1
 spacy>= 3.7.6
 spacy-legacy>= 3.0.12
 spacy-loggers>= 1.0.5
 srsly>= 2.4.8
 stack-data>= 0.6.3
 starlette>= 0.38.5
 SudachiDict-core>= 20240716
 SudachiPy>= 0.6.8
 sympy>= 1.13.2
 tabulate>= 0.9.0
 tensorboard>= 2.17.1
 tensorboard-data-server>= 0.7.2
 termcolor>= 2.4.0
 thinc>= 8.2.5
 threadpoolctl>= 3.5.0
 tokenizers>= 0.19.1
 tomlkit>= 0.12.0
 torch>= 2.2.1
 torchaudio>= 2.2.1
 torchcrepe>= 0.0.23
 torchvision>= 0.17.1
 torch-stoi>= 0.2.1
 tqdm>= 4.66.5
 traitlets>= 5.14.3
 transformers>= 4.40.2
 typeguard>= 4.3.0
 typer>= 0.12.5
 typing_extensions>= 4.11.0
 tzdata>= 2024.1
 tzlocal>= 5.2
 umap-learn>= 0.5.6
 unidic-lite>= 1.0.8
 urllib3>= 2.2.2
 uvicorn>= 0.30.6
 wasabi>= 1.1.3
 wcwidth>= 0.2.13
 weasel>= 0.4.1
 websockets>= 11.0.3
 Werkzeug>= 3.0.4
 wheel>= 0.44.0
 win-inet-pton>= 1.1.0
 word2number>= 1.1
 wrapt>= 1.16.0
 yarl>= 1.11.1
 zipp>= 3.20.2
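
To compare the "PACKAGE VERSIONS vs REQUIREMENTS FILE" table above with the same table in your own diagnostics.log, something like the following could help. It is a rough sketch that assumes both logs use the `name  Required: <op> <version>  Installed: <version>` layout shown above. The function names are hypothetical and it only does a plain string comparison of the installed versions.

```python
import re

# Matches rows such as:
#  coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
ROW = re.compile(
    r"^\s*(\S+)\s+Required:\s*(\S+)\s+(\S+)\s+Installed:\s*(\S+)\s*$"
)


def installed_versions(log_text: str) -> dict:
    """Return {package: installed_version} for every matching table row."""
    versions = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            name, _op, _required, installed = match.groups()
            versions[name] = installed
    return versions


def diff_versions(reference_log: str, my_log: str) -> dict:
    """Return {package: (reference, mine)} where installed versions differ.

    A package missing from my_log is reported with None as its version.
    """
    reference = installed_versions(reference_log)
    mine = installed_versions(my_log)
    return {
        name: (version, mine.get(name))
        for name, version in reference.items()
        if mine.get(name) != version
    }
```

Usage would be along the lines of reading both files with `open(path, encoding="utf-8").read()` and printing the result of `diff_versions(reference_text, my_text)`.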

Unless something failed to install correctly on your system, I am not sure what would cause this issue. System locale settings may be a factor, as they can affect how characters are interpreted: https://learn.microsoft.com/en-us/windows-hardware/customize/desktop/unattend/microsoft-windows-international-core-winpe-systemlocale

I suppose it's possible that your system locale could cause an issue, but I would be unable to diagnose that for you.
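
If you want to look into the locale idea yourself, a quick check from inside the AllTalk Python environment might look like this. It is only a sketch: the helper names are hypothetical, and a failed round trip would show that an encoding cannot represent Devanagari, not that locale is the cause of this error.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings Python picked up from the system settings."""
    return {
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


def survives_round_trip(text: str, encoding: str) -> bool:
    """True if text can be encoded and decoded with the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except (UnicodeError, LookupError):
        return False


sample = "नमस्ते! आज आप कैसे हैं?"
report = encoding_report()
print(report)
# Only booleans are printed, so a legacy console code page cannot break this.
print("utf-8 ok:", survives_round_trip(sample, "utf-8"))
print(
    "preferred encoding ok:",
    survives_round_trip(sample, report["preferred_encoding"]),
)
```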

I assume you have done a full fresh installation of AllTalk V2 and not just copied over a V1 installation? You may wish to re-try setting up the Python environment.

Finally, I am not the maintainer of the Coqui TTS engine. That is maintained by idiap, and I can see they are working on additional Hindi support (idiap/coqui-ai-TTS@1920328), though that update is not yet available.

Thanks

@erew123 erew123 closed this as completed Sep 16, 2024
@erew123
Owner

erew123 commented Oct 1, 2024

@nitinmukesh Further to this, while writing more documentation for AllTalk V2, I was looking over the V1 help and found the relevant section here: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#startup-performance-and-compatibility-issues

As such, you need to load the 2.0.3 model in API mode.

@nitinmukesh
Author

nitinmukesh commented Oct 1, 2024

Thank you @erew123
I hope support for the Hindi language is added soon. I tried everything (reinstalling, etc.), but it didn't work.

Currently using Google TTS for Hindi.

I will try the api one as suggested by you. Appreciate your guidance

@erew123
Owner

erew123 commented Oct 1, 2024

@nitinmukesh As mentioned above, you CAN use Hindi, if you load the XTTS 2.0.3 model as apitts (API mode)

@nitinmukesh
Author

nitinmukesh commented Oct 2, 2024

@erew123

I did understand it and mentioned the same in my earlier response.

I will try the api one as suggested by you. Appreciate your guidance

I should have mentioned apitts. Appreciate your guidance in making this work. Thank you
