Implement Mimic 3 engine (TTS) #30

Closed
PeterBowman opened this issue Mar 8, 2023 · 5 comments

PeterBowman commented Mar 8, 2023

@rsantos88 has found a "fast, privacy-focused, open-source, neural Text to Speech (TTS) engine" that looks great:

https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3
https://github.com/MycroftAI/mimic3
https://github.com/MycroftAI/mimic3-voices

It is lightweight, works offline, and features a human-like voice (as opposed to the more robotic one we currently use via eSpeak). It is written in Python and can be installed through pip (mycroft-mimic3-tts package). There are four Spanish voices available (3 male, 1 female), compiled in two datasets; the "tux" voice from the "m-ailabs" dataset sounds quite appealing.

Sample invocation:

mimic3 --voice es_ES/m-ailabs#tux "hola, me llamo teo y tengo 10 años"

(or simply --voice es_ES/m-ailabs since tux is the default voice)

Pro tip: add --cuda to enable GPU acceleration (requires the "onnxruntime-gpu" pip package).
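
A minimal Python sketch of how a wrapper might assemble that invocation. The helper names `build_mimic3_command` and `synthesize` are hypothetical; only the `--voice` and `--cuda` flags come from the command shown above, and the assumption that the CLI writes the audio to stdout when no output option is given should be checked against the Mimic 3 documentation.

```python
import subprocess
from typing import List


def build_mimic3_command(text: str,
                         voice: str = "es_ES/m-ailabs#tux",
                         use_cuda: bool = False) -> List[str]:
    """Assemble the argument list for the mimic3 CLI (hypothetical helper)."""
    command = ["mimic3", "--voice", voice]
    if use_cuda:
        # GPU acceleration requires the onnxruntime-gpu pip package.
        command.append("--cuda")
    command.append(text)
    return command


def synthesize(text: str, **kwargs) -> bytes:
    """Run mimic3 and return whatever it writes to stdout.

    Assumption: the audio is emitted on stdout when no output option is passed.
    """
    result = subprocess.run(build_mimic3_command(text, **kwargs),
                            check=True, capture_output=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with the `#` in the voice name and with accented characters in the text.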

I'm thinking of a Python client implementation of our TextToSpeech IDL service similar to speechRecognition.py.

PeterBowman self-assigned this Mar 8, 2023
@PeterBowman

As a side note regarding installation: the CLI app should download the voice models automatically and store them in ${HOME}/.local/share/mycroft/mimic3/voices on first use, but in my case the process got stuck every time and I had to complete it manually. The mimic3-voices repository is linked in the previous comment. Note that each voice directory contains a generator.onnx file (around 60-70 MB) that is handled by Git LFS. It needs to be downloaded separately and placed in the correct location.

See also https://mycroft-ai.gitbook.io/docs/mycroft-technologies/mimic-tts/mimic-3#downloading-voices and the mimic3-download command.
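
When completing the download by hand, it is easy to end up with a Git LFS pointer file in place of the real model. The sketch below is one possible check, not part of Mimic 3: `generator_status` is a hypothetical helper, and the voice directory is passed in explicitly because the exact layout under the voices directory is not spelled out here. The pointer prefix is the first line of any Git LFS pointer file.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS specification.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

# Location mentioned above; the per-voice subdirectory must be appended by the caller.
DEFAULT_VOICES_DIR = Path.home() / ".local" / "share" / "mycroft" / "mimic3" / "voices"


def generator_status(voice_dir: Path) -> str:
    """Classify generator.onnx in a voice directory (hypothetical helper).

    Returns "missing", "lfs-pointer" (only the small text stub was copied),
    or "present" (some other content, presumably the real model).
    """
    model = Path(voice_dir) / "generator.onnx"
    if not model.is_file():
        return "missing"
    with model.open("rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        return "lfs-pointer"
    return "present"
```

A result of "lfs-pointer" means the repository was cloned or downloaded without Git LFS and the model still has to be fetched.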

PeterBowman commented Mar 9, 2023

Already working (not fully implemented) at 7fadf45.

@rsantos88 in case you want to use this in the upcoming demos, assuming you have installed the Spanish voices, launch it with:

speechSynthesis --voice es_ES/m-ailabs --speaker tux --port /teo/tts

On the teo-self-presentation side, pass --language es_ES/m-ailabs#tux to dialogueManager and change the output port in the yarpmanager's connections tab from "/teo/tts/rpc:s" to "/speechSynthesis/rpc:s" (edit: added --port).

PeterBowman commented Mar 9, 2023

Done at 618cb83, see speechSynthesis.py. All IDL commands have been implemented except the pitch accessors, pause and resume. I'd consider expanding the API with volume commands and renaming "language" to "voice", which might or might not include speaker information.

There are two caveats to this implementation/engine:

  1. On certain voices, including the Spanish ones, the last letters/vowels are trimmed from the synthesised result; see "TTS. Last letter of the text won't be spoken in spanish" (MycroftAI/mimic3#30) and "Last character with polish voice is always cutten" (MycroftAI/mimic3-voices#4). It can be worked around for now by repeating the last letter, e.g. "holaa" instead of "hola".
  2. MycroftAI has recently reduced its staff and seems to have been on radio silence since the last blog post by their CEO back in January. For this reason, I'm forking their relevant repos into our org.
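
The workaround from the first caveat can be sketched as a small preprocessing step applied to the text before it is sent to the engine. `repeat_last_letter` is a hypothetical helper, not something taken from speechSynthesis.py; it duplicates the last alphabetic character and leaves trailing punctuation where it is.

```python
def repeat_last_letter(text: str) -> str:
    """Duplicate the last alphabetic character so trimming does not cut the word."""
    # Scan backwards to skip trailing punctuation, digits and whitespace.
    for index in range(len(text) - 1, -1, -1):
        if text[index].isalpha():
            return text[:index + 1] + text[index] + text[index + 1:]
    # No letters at all: nothing to repeat.
    return text
```

For example, `repeat_last_letter("hola")` returns `"holaa"` and `repeat_last_letter("hola.")` returns `"holaa."`. Whether the extra letter is audible on voices that are not affected by the trimming bug is not covered here.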

@synesthesiam

Author of Mimic 3 here. I'm continuing my TTS work elsewhere: https://github.com/rhasspy/larynx2/

@PeterBowman

> Author of Mimic 3 here. I'm continuing my TTS work elsewhere: https://github.com/rhasspy/larynx2/

Thank you for the heads-up and your great work! We have migrated from Mimic 3 to Piper (current name) at #33.
