Unifies access to multiple open source text to speech systems and voices for many languages.
Supports a subset of SSML that can use multiple voices and text to speech systems!
cache_dir
- Directory to cache generated WAV files
- Leave empty to disable (default, HA already has a TTS cache)
debug
- If true, DEBUG messages are printed to the log
- Default is false
larynx_quality
- Default quality setting for the Larynx TTS system
- Default is "high", choices are "high", "medium", "low" (use "low" for Raspberry Pi)
larynx_denoiser_strength
- Strength of the denoiser applied during Larynx TTS post-processing
- Default is 0.005 (a higher value reduces noise, but distorts the voice)
larynx_noise_scale
- Volatility of Larynx TTS vocalization
- Default is 0.667, range is 0-1. Higher values make the voice less monotone
larynx_length_scale
- Speed of Larynx TTS speech
- Default is 1.0, lower values are faster, higher values are slower
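The documented defaults and ranges can be checked before an options block is saved. Below is a minimal sketch in Python; `check_options` is a hypothetical helper, not part of OpenTTS or Home Assistant, and the rule that `larynx_length_scale` must be positive is an assumption rather than something stated above.

```python
# Hypothetical validation helper; the defaults mirror the option list above.
DEFAULTS = {
    "cache_dir": "",
    "debug": False,
    "larynx_quality": "high",
    "larynx_denoiser_strength": 0.005,
    "larynx_noise_scale": 0.667,
    "larynx_length_scale": 1.0,
}
QUALITIES = ("high", "medium", "low")


def check_options(options):
    """Merge user options over the defaults and list any problems found."""
    merged = {**DEFAULTS, **options}
    problems = [
        f"unknown option: {name}"
        for name in sorted(set(options) - set(DEFAULTS))
    ]
    if merged["larynx_quality"] not in QUALITIES:
        problems.append("larynx_quality must be one of: " + ", ".join(QUALITIES))
    if not 0 <= merged["larynx_noise_scale"] <= 1:
        problems.append("larynx_noise_scale must be between 0 and 1")
    # Assumption: a speed multiplier of zero or less is not meaningful.
    if merged["larynx_length_scale"] <= 0:
        problems.append("larynx_length_scale must be greater than 0")
    return merged, problems


merged, problems = check_options({"larynx_quality": "low", "larynx_noise_scale": 1.5})
for problem in problems:
    print(problem)
```

Options that are left out fall back to the documented defaults, so a Raspberry Pi setup would only need to override `larynx_quality`.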
Use OpenTTS as a drop-in replacement for MaryTTS.
Add to your configuration.yaml file:

    tts:
      - platform: marytts
        port: 5500
        voice: larynx:harvard

The voice format is <TTS_SYSTEM>:<VOICE_NAME>. Visit the OpenTTS web UI and copy/paste the "voice id" of your favorite voice here.
You may leave out the port setting if you configure the OpenTTS host port to be 59125 instead of 5500.
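The voice format and the port rule can be sketched in a few lines of Python. `marytts_entry` is a hypothetical helper written for illustration; it only builds the text of the configuration entry and does not contact OpenTTS or Home Assistant.

```python
# Hypothetical helper, not part of OpenTTS or Home Assistant.
MARYTTS_DEFAULT_PORT = 59125  # with this host port, "port" can be left out


def marytts_entry(tts_system, voice_name, port=5500):
    """Return the text of a tts: entry for configuration.yaml."""
    if not tts_system or not voice_name:
        raise ValueError("both a TTS system and a voice name are required")
    lines = ["tts:", "  - platform: marytts"]
    if port != MARYTTS_DEFAULT_PORT:
        lines.append(f"    port: {port}")
    # Voice ids follow the <TTS_SYSTEM>:<VOICE_NAME> format.
    lines.append(f"    voice: {tts_system}:{voice_name}")
    return "\n".join(lines)


print(marytts_entry("larynx", "harvard"))
```

Calling `marytts_entry("larynx", "harvard", port=59125)` produces the same entry without the `port` line.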
If your input text begins with a left angle bracket (<), it will be interpreted as SSML.
A subset of SSML is supported:
- <speak> - wrap around SSML text
  - lang - set language for document
- <s> - sentence (disables automatic sentence breaking)
  - lang - set language for sentence
- <w> / <token> - word (disables automatic tokenization)
- <voice name="..."> - set voice of inner text
  - voice - name or language of voice
    - Name format is tts:voice (e.g., "glow-speak:en-us_mary_ann") or tts:voice#speaker_id (e.g., "coqui-tts:en_vctk#p228")
    - If one of the supported languages, a preferred voice is used (override with --preferred-voice <lang> <voice>)
- <say-as interpret-as=""> - force interpretation of inner text
  - interpret-as - one of "spell-out", "date", "number", "time", or "currency"
  - format - way to format text depending on interpret-as
    - number - one of "cardinal", "ordinal", "digits", "year"
    - date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
- <break time=""> - pause for given amount of time
  - time - seconds ("123s") or milliseconds ("123ms")
- <sub alias=""> - substitute alias for inner text
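
To illustrate how these elements combine, here is a minimal sketch that assembles an SSML document with Python's standard `xml.etree.ElementTree` module. The two voice ids are the examples quoted in the list; the sentence text and the way the elements are nested are assumptions made for illustration, not a layout prescribed by OpenTTS.

```python
import xml.etree.ElementTree as ET


def build_ssml():
    """Build a small SSML document that switches voices mid-message."""
    speak = ET.Element("speak")

    # First sentence, spoken by one voice, with a number read as a cardinal.
    voice_a = ET.SubElement(speak, "voice", {"name": "glow-speak:en-us_mary_ann"})
    first = ET.SubElement(voice_a, "s")
    first.text = "Your order is number "
    number = ET.SubElement(
        first, "say-as", {"interpret-as": "number", "format": "cardinal"}
    )
    number.text = "42"
    number.tail = "."

    # Half-second pause between the two voices.
    ET.SubElement(speak, "break", {"time": "500ms"})

    # Second sentence, spoken by a multi-speaker voice (tts:voice#speaker_id).
    voice_b = ET.SubElement(speak, "voice", {"name": "coqui-tts:en_vctk#p228"})
    second = ET.SubElement(voice_b, "s")
    abbreviation = ET.SubElement(second, "sub", {"alias": "text to speech"})
    abbreviation.text = "TTS"
    abbreviation.tail = " is ready."

    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml()
print(ssml)
```

The resulting string starts with a left angle bracket, so it would be treated as SSML rather than plain text.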
Below is a list of the supported TTS systems and voice counts by language.
- Larynx
- Glow-Speak
- Coqui-TTS
  - English (110), Japanese (1), Chinese (1)
  - Patched embedded version of Coqui-TTS 0.3.1
- nanoTTS
  - English (2), German (1), French (1), Italian (1), Spanish (1)
- MaryTTS
  - English (7), German (3), French (4), Italian (1), Russian (1), Swedish (1), Telugu (1), Turkish (1)
  - Includes embedded MaryTTS
- flite
  - English (19), Hindi (1), Bengali (1), Gujarati (3), Kannada (1), Marathi (2), Punjabi (1), Tamil (1), Telugu (3)
- Festival
  - English (9), Spanish (1), Catalan (1), Czech (4), Russian (1), Finnish (2), Marathi (1), Telugu (1), Hindi (1), Italian (2), Arabic (2)
  - Spanish/Catalan/Finnish use ISO-8859-15 encoding
  - Czech uses ISO-8859-2 encoding
  - Russian is transliterated from Cyrillic to Latin script automatically
  - Arabic uses UTF-8 and is diacritized with mishkal
- eSpeak
  - Supports a huge number of languages/locales, but sounds robotic
On the Raspberry Pi, you may need to lower the quality of Larynx and Glow-Speak voices to get reasonable response times.
This can be done with the larynx_quality setting above (use "medium" or "low"), or by appending the vocoder name to the end of your voice:

    tts:
      - platform: marytts
        voice: larynx:harvard;low

Available quality levels are "high", "medium", and "low".
Note that this only applies to Larynx and Glow-Speak voices.
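The suffix rule can be sketched as a small Python function. `with_quality` is a hypothetical helper, and the system prefixes "larynx" and "glow-speak" are assumed from the voice ids shown in this document.

```python
# Hypothetical helper, not part of OpenTTS.
QUALITIES = ("high", "medium", "low")
# Assumption: voice ids for these systems start with these prefixes.
QUALITY_SYSTEMS = ("larynx", "glow-speak")


def with_quality(voice, quality):
    """Append ";<quality>" to a Larynx or Glow-Speak voice id."""
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    tts_system = voice.split(":", 1)[0]
    if tts_system not in QUALITY_SYSTEMS:
        # Other TTS systems have no quality levels, so leave the id as-is.
        return voice
    base = voice.split(";", 1)[0]  # drop any quality suffix already present
    return f"{base};{quality}"


print(with_quality("larynx:harvard", "low"))
```

Voice ids from other systems are returned unchanged, which matches the note that quality levels only apply to Larynx and Glow-Speak.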