OpenTTS (es)

Unifies access to multiple open source text to speech systems and voices for many languages.

Supports a subset of SSML that can use multiple voices and text to speech systems!

Listen to voice samples

View source code

Settings

  • cache_dir
    • Directory to cache generated WAV files
    • Leave empty to disable caching (the default, since Home Assistant already has a TTS cache)
  • debug
    • If true, DEBUG messages are printed to the log
    • Default is false
  • larynx_quality
    • Default quality setting for the Larynx TTS system
    • Default is "high", choices are "high", "medium", "low" (use "low" for Raspberry Pi)
  • larynx_denoiser_strength
    • Strength of the denoiser applied during Larynx TTS post-processing
    • Default is 0.005 (higher value reduces noise, but distorts voice)
  • larynx_noise_scale
    • Volatility of Larynx TTS vocalization
    • Default is 0.667, range is 0-1. Higher values make the voice less monotone
  • larynx_length_scale
    • Speed of Larynx TTS speech
    • Default is 1.0, lower values are faster, higher values are slower
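As a rough illustration of how the settings above fit together, the sketch below merges user overrides onto the documented defaults and checks the documented choices and ranges. The option names and defaults come from the list above; the `resolve_settings` helper and its validation rules (including the "must be positive" check on `larynx_length_scale`) are hypothetical and are not the add-on's own code.

```python
# Defaults as listed in the settings above. The validation rules below are
# an illustrative assumption, not taken from the add-on's source.
DEFAULTS = {
    "cache_dir": "",                    # empty string disables caching
    "debug": False,
    "larynx_quality": "high",
    "larynx_denoiser_strength": 0.005,
    "larynx_noise_scale": 0.667,
    "larynx_length_scale": 1.0,
}

QUALITY_CHOICES = ("high", "medium", "low")


def resolve_settings(overrides):
    """Merge user overrides onto the defaults and check documented ranges."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    settings = {**DEFAULTS, **overrides}
    if settings["larynx_quality"] not in QUALITY_CHOICES:
        raise ValueError("larynx_quality must be high, medium, or low")
    if not 0.0 <= settings["larynx_noise_scale"] <= 1.0:
        raise ValueError("larynx_noise_scale must be between 0 and 1")
    # Assumption: a speed multiplier only makes sense when it is positive.
    if settings["larynx_length_scale"] <= 0:
        raise ValueError("larynx_length_scale must be positive")
    return settings


# Example: a Raspberry Pi profile with lower quality and slightly faster speech.
pi_settings = resolve_settings({"larynx_quality": "low", "larynx_length_scale": 0.9})
print(pi_settings)
```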

MaryTTS Compatible Endpoint

Use OpenTTS as a drop-in replacement for MaryTTS.

Add to your configuration.yaml file:

tts:
  - platform: marytts
    port: 5500
    voice: larynx:harvard

The voice format is <TTS_SYSTEM>:<VOICE_NAME>. Visit the OpenTTS web UI and copy/paste the "voice id" of your favorite voice here.

You may leave out the port setting if you configure the OpenTTS host port to be 59125 instead of 5500.

If your input text begins with a left angle bracket (<), it will be interpreted as SSML.
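To test the endpoint outside Home Assistant, a request URL can be assembled by hand. The sketch below assumes that OpenTTS accepts the same `/process` path and `INPUT_TEXT` / `VOICE` query parameters as MaryTTS's own HTTP interface; the `marytts_process_url` helper and the `localhost` host are placeholders for illustration.

```python
from urllib.parse import urlencode


def marytts_process_url(text, voice, host="localhost", port=5500):
    """Build a MaryTTS-style /process URL for the given text and voice.

    Assumption: the server exposes MaryTTS's /process endpoint with the
    INPUT_TEXT and VOICE parameters. Text starting with "<" is sent as-is
    and would be interpreted as SSML, per the note above.
    """
    query = urlencode({"INPUT_TEXT": text, "VOICE": voice})
    return f"http://{host}:{port}/process?{query}"


url = marytts_process_url("Hello world", "larynx:harvard")
print(url)
```

The resulting URL could then be fetched with any HTTP client and the response saved as a WAV file.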

SSML

A subset of SSML is supported:

  • <speak> - wrap around SSML text
    • lang - set language for document
  • <s> - sentence (disables automatic sentence breaking)
    • lang - set language for sentence
  • <w> / <token> - word (disables automatic tokenization)
  • <voice name="..."> - set voice of inner text
    • name - name or language of voice
      • Name format is tts:voice (e.g., "glow-speak:en-us_mary_ann") or tts:voice#speaker_id (e.g., "coqui-tts:en_vctk#p228")
      • If one of the supported languages, a preferred voice is used (override with --preferred-voice <lang> <voice>)
  • <say-as interpret-as=""> - force interpretation of inner text
    • interpret-as - one of "spell-out", "date", "number", "time", or "currency"
    • format - way to format text depending on interpret-as
      • number - one of "cardinal", "ordinal", "digits", "year"
      • date - string with "d" (cardinal day), "o" (ordinal day), "m" (month), or "y" (year)
  • <break time=""> - pause for given amount of time
    • time - seconds ("123s") or milliseconds ("123ms")
  • <sub alias=""> - substitute alias for inner text
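The elements listed above can be combined in one document. The sketch below builds a small SSML string that switches between two voices and checks that it is well-formed XML before it would be sent as input text. The voice names are the examples given above; the sentence text and the `lang="en"` value are made up for illustration, and whether a given voice is installed depends on your setup.

```python
import xml.etree.ElementTree as ET

# Hypothetical SSML document using only the elements listed above.
ssml = """<speak lang="en">
  <voice name="glow-speak:en-us_mary_ann">
    <s>The total is <say-as interpret-as="currency">$12.50</say-as>.</s>
    <break time="500ms" />
    <s>Call <sub alias="extension">ext.</sub>
       <say-as interpret-as="number" format="digits">42</say-as>.</s>
  </voice>
  <voice name="coqui-tts:en_vctk#p228">
    <s>This sentence uses a different system and speaker.</s>
  </voice>
</speak>"""

# Input is only treated as SSML when it begins with a left angle bracket.
assert ssml.startswith("<")

# Parsing catches unclosed tags and similar mistakes before synthesis.
root = ET.fromstring(ssml)
tags = sorted({el.tag for el in root.iter()})
print(tags)
```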

Supported Text to Speech Systems

Below is a list of the supported TTS systems and voice counts by language.

  • Larynx
    • English (27), German (7), French (3), Spanish (2), Dutch (4), Russian (3), Swedish (1), Italian (2), Swahili (1)
    • Model types available: GlowTTS
    • Vocoders available: HiFi-GAN (3 levels of quality)
    • Patched embedded version of Larynx 1.0
  • Glow-Speak
    • English (2), German (1), French (1), Spanish (1), Dutch (1), Russian (1), Swedish (1), Italian (1), Swahili (1), Greek (1), Finnish (1), Hungarian (1), Korean (1)
    • Model types available: GlowTTS
    • Vocoders available: HiFi-GAN (3 levels of quality)
  • Coqui-TTS
    • English (110), Japanese (1), Chinese (1)
    • Patched embedded version of Coqui-TTS 0.3.1
  • nanoTTS
    • English (2), German (1), French (1), Italian (1), Spanish (1)
  • MaryTTS
    • English (7), German (3), French (4), Italian (1), Russian (1), Swedish (1), Telugu (1), Turkish (1)
    • Includes embedded MaryTTS
  • flite
    • English (19), Hindi (1), Bengali (1), Gujarati (3), Kannada (1), Marathi (2), Punjabi (1), Tamil (1), Telugu (3)
  • Festival
    • English (9), Spanish (1), Catalan (1), Czech (4), Russian (1), Finnish (2), Marathi (1), Telugu (1), Hindi (1), Italian (2), Arabic (2)
    • Spanish/Catalan/Finnish use ISO-8859-15 encoding
    • Czech uses ISO-8859-2 encoding
    • Russian is transliterated from Cyrillic to Latin script automatically
    • Arabic uses UTF-8 and is diacritized with mishkal
  • eSpeak
    • Supports a huge number of languages/locales, but sounds robotic

Voice Quality

On the Raspberry Pi, you may need to lower the quality of Larynx and Glow-Speak voices to get reasonable response times.

This can be done with the larynx_quality setting above (use "medium" or "low"), or by appending a semicolon and the quality level to the end of your voice:

tts:
  - platform: marytts
    voice: larynx:harvard;low

Available quality levels are "high", "medium", and "low".

Note that this only applies to Larynx and Glow-Speak voices.
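To make the voice string format concrete, the sketch below splits a voice id into its TTS system, voice name, optional speaker id, and optional quality suffix. The `parse_voice_id` helper is hypothetical. This document shows the `#speaker_id` and `;quality` suffixes only separately, so the order assumed here when both are present (`#` before `;`) is a guess.

```python
from typing import NamedTuple, Optional


class VoiceId(NamedTuple):
    tts_system: str
    voice_name: str
    speaker_id: Optional[str]
    quality: Optional[str]


def parse_voice_id(voice_id):
    """Split '<TTS_SYSTEM>:<VOICE_NAME>[#speaker_id][;quality]' into parts."""
    # Peel off the optional suffixes first, then split system from voice name.
    body, _, quality = voice_id.partition(";")
    body, _, speaker = body.partition("#")
    tts_system, sep, voice_name = body.partition(":")
    if not sep or not tts_system or not voice_name:
        raise ValueError(f"expected <TTS_SYSTEM>:<VOICE_NAME>, got {voice_id!r}")
    if quality and quality not in ("high", "medium", "low"):
        raise ValueError(f"unknown quality level: {quality!r}")
    return VoiceId(tts_system, voice_name, speaker or None, quality or None)


print(parse_voice_id("larynx:harvard;low"))
print(parse_voice_id("coqui-tts:en_vctk#p228"))
```

A caller could use the parsed `tts_system` to warn when a quality suffix is attached to a system other than Larynx or Glow-Speak, since the suffix has no effect there.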