[Feature request] Plans to introduce support for SSML? #670

zubairahmed-ai · 2021-07-20T09:30:54Z

Do you have any plans to introduce support for Speech Synthesis Markup Language (SSML) so that the output can be customized?

Tanmay-V22315 · 2021-07-22T06:24:48Z

I second this. I also have a few recommendations and hacks of my own that I'd like to present on top of this, in case anybody wants to implement it.
Here is what I have come up with so far:

Introduce some zeroes into the list of floats produced by inference (after calling synthesizer.tts()), based on the position of a tag within the string (something like <pause: 10 ms>). Say we want the TTS model to pause for 2 seconds: we can get the sample rate from the model's config file, multiply it by the pause length in seconds, and generate a list of that many zeroes.
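A minimal sketch of the pause idea, assuming the hypothetical `<pause: 10 ms>` / `<pause: 2 s>` syntax and a `tts` callable that returns a list of float samples (for example, a thin wrapper around `synthesizer.tts()`). `tts_with_pauses` and `silence` are illustrative names, not part of any library. Instead of locating the tag inside the finished waveform, it splits the text at each tag, synthesizes each piece separately, and joins the pieces with `sample_rate * seconds` zeros. At 22050 Hz, a 2-second pause is 44,100 zeros.

```python
import re
from typing import Callable, List

# Hypothetical tag syntax from the comment: "<pause: 10 ms>" or "<pause: 2 s>"
PAUSE_TAG = re.compile(r"<pause:\s*(\d+(?:\.\d+)?)\s*(ms|s)>")


def silence(seconds: float, sample_rate: int) -> List[float]:
    """Return sample_rate * seconds zeros, i.e. `seconds` of silence."""
    return [0.0] * round(seconds * sample_rate)


def tts_with_pauses(
    text: str, tts: Callable[[str], List[float]], sample_rate: int
) -> List[float]:
    """Synthesize the text between pause tags and join the pieces with silence."""
    wav: List[float] = []
    last = 0
    for match in PAUSE_TAG.finditer(text):
        # Synthesize whatever text precedes this tag
        chunk = text[last:match.start()].strip()
        if chunk:
            wav += tts(chunk)
        # Convert the tag's value to seconds and append that much silence
        value, unit = float(match.group(1)), match.group(2)
        wav += silence(value / 1000 if unit == "ms" else value, sample_rate)
        last = match.end()
    # Synthesize any text after the last tag
    tail = text[last:].strip()
    if tail:
        wav += tts(tail)
    return wav
```

Splitting before synthesis avoids having to map a character position in the text to a sample position in the audio, which is not straightforward. The trade-off is that each chunk is synthesized without its neighbours as context, so the prosody on either side of a pause may sound slightly disconnected.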
To wedge an audio file in between, we can use something like pydub to "import" the audio file, convert it into a list of samples, and insert that list into the list of floats produced by inference, at the position indicated by a tag within the string.
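A sketch of the splicing step. To stay dependency-free it swaps pydub for Python's standard-library `wave` module and assumes the clip is a 16-bit mono PCM WAV already at the model's sample rate; pydub, as suggested above, would be the more convenient choice for other formats, channel counts, or resampling. `load_wav_as_floats` and `splice` are hypothetical helper names.

```python
import array
import sys
import wave
from typing import List


def load_wav_as_floats(path: str, expected_rate: int) -> List[float]:
    """Read a 16-bit mono PCM WAV file and scale its samples to floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        if wf.getframerate() != expected_rate:
            raise ValueError("resample the clip to the model's sample rate first")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]


def splice(wav: List[float], clip: List[float], position: int) -> List[float]:
    """Insert `clip` into `wav` before sample index `position`."""
    return wav[:position] + clip + wav[position:]
```

The sample-rate check matters: a clip recorded at 44.1 kHz and inserted unchanged into 22.05 kHz model output would play at half speed and an octave lower.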
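For the alias tag from SSML (SSML itself writes this as <sub alias="World Wide Web Consortium">W3C</sub>), we can have a tag like <alias["W3C", "World Wide Web Consortium"]> or maybe <alias{"W3C":"World Wide Web Consortium"}>. Once we identify the tag, we can isolate it, strip the tag markers from the isolated string, and convert the remainder into a list/dictionary, either with Python's built-in eval() or, more safely, with ast.literal_eval(), which only accepts literals and cannot execute arbitrary code.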
For example:

```python
import ast

# Perhaps we can have the alias declaration at the beginning of the string.
# The dictionary uses single quotes because the whole string is delimited by double quotes.
inputstring = " <alias>{'W3C':'World Wide Web Consortium', 'W3': 'World Wide Web', 'ISO': 'International standards organisation'}</alias> The W3C is the main ISO for the W3C. Founded in 1994 and currently led by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working together in the development of standards for the W3."

open_tag = "<alias>"
close_tag = "</alias>"

# Slice the string to obtain the dictionary text between the tags
start = inputstring.index(open_tag) + len(open_tag)
end = inputstring.index(close_tag)

# ast.literal_eval only accepts Python literals, so unlike eval() it cannot run arbitrary code
aliasdict = ast.literal_eval(inputstring[start:end])

# Get rid of the dictionary declaration (and the whitespace after it) in the input string
inputstring = inputstring[end + len(close_tag):].lstrip()

# Replace each abbreviation with its expansion.
# Order matters here: "W3C" must be replaced before "W3", or it would become "World Wide WebC".
for short, expansion in aliasdict.items():
    inputstring = inputstring.replace(short, expansion)

print(inputstring)

# Perhaps a much better and easier way to do this would be to take a dictionary
# as a parameter for the input text post-processing function
```

Edit it and chuck this stuff into a function and you're (hopefully) good to go.
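Following the comment's own suggestion of passing the dictionary in as a parameter, here is a sketch of that post-processing function (the name `expand_aliases` is hypothetical). It uses a single regular-expression pass with word boundaries, so abbreviations inside other words are left alone, longer keys such as "W3C" take priority over "W3" regardless of dictionary order, and expanded text is never re-scanned.

```python
import re
from typing import Dict


def expand_aliases(text: str, aliases: Dict[str, str]) -> str:
    """Replace each whole-word abbreviation in `text` with its expansion."""
    if not aliases:
        return text
    # Longest keys first, so "W3C" is tried before "W3" at the same position
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    # One pass over the text: replacements are never matched again
    return pattern.sub(lambda m: aliases[m.group(1)], text)
```

The `\b` word boundaries assume the abbreviations start and end with letters or digits; keys that begin or end with punctuation would need a different pattern.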

For times (like 4:30 PM), we can make use of strftime().
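A sketch of the strftime() idea, assuming a hypothetical `<time>16:30</time>` tag holding a 24-hour time: parse it with `datetime.strptime`, then format it on a 12-hour clock with `%I:%M %p`. Whether the model then reads "4:30 PM" naturally depends on its own text normalization, so it may still be necessary to spell the numbers out.

```python
import re
from datetime import datetime

# Hypothetical tag syntax: "<time>16:30</time>" with a 24-hour HH:MM value
TIME_TAG = re.compile(r"<time>\s*(\d{1,2}:\d{2})\s*</time>")


def spoken_time(hhmm: str) -> str:
    """Convert a 24-hour 'HH:MM' string to 12-hour form, e.g. '16:30' -> '4:30 PM'."""
    parsed = datetime.strptime(hhmm, "%H:%M")
    # %I is the zero-padded 12-hour clock; %p is the locale's AM/PM marker
    return parsed.strftime("%I:%M %p").lstrip("0")


def normalize_times(text: str) -> str:
    """Replace every time tag in `text` with its 12-hour spoken form."""
    return TIME_TAG.sub(lambda m: spoken_time(m.group(1)), text)
```

Because `%p` follows the current locale, the AM/PM marker will differ if the program has switched `LC_TIME` to a non-English locale.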
For spell-it, we can split the letters enclosed in a tag like <spell-it> abcdefgh </spell-it> so that it outputs a b c d e f g h (perhaps by using " ".join(list(inputstring)) after stripping the surrounding spaces).
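A sketch of the spell-it tag using a regular expression, with the tag syntax from the comment (`spell_out` is a hypothetical name). `" ".join()` accepts a string directly, so the `list()` call is not needed; whitespace is dropped first so that the spaces inside the tag do not turn into extra gaps.

```python
import re

# Hypothetical tag syntax from the comment: "<spell-it> abcdefgh </spell-it>"
SPELL_TAG = re.compile(r"<spell-it>(.*?)</spell-it>", re.DOTALL)


def spell_out(text: str) -> str:
    """Replace each spell-it tag with its characters separated by single spaces."""

    def letters(match):
        # Skip whitespace so only the characters to be spelled are joined
        return " ".join(ch for ch in match.group(1) if not ch.isspace())

    return SPELL_TAG.sub(letters, text)
```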
For speaking speed and emphasis, I'm not sure, but maybe we can temporarily mess with the sample rate or something like that. Note that changing the sample rate alone shifts the pitch along with the speed.
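To make the sample-rate trade-off concrete, here is a naive speed change by linear-interpolation resampling (`naive_speed_change` is an illustrative name). Its output, played at the original rate, sounds like the input played at `factor` times the sample rate: faster and higher-pitched. Changing speed without changing pitch needs a time-stretching algorithm (for example WSOLA or a phase vocoder) or, in models with an explicit duration predictor, scaling the predicted durations; neither is shown here.

```python
from typing import List


def naive_speed_change(wav: List[float], factor: float) -> List[float]:
    """Make `wav` play `factor` times faster by linear-interpolation resampling.

    Like raising the sample rate, this also raises the pitch by `factor`.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(wav) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor  # position in the input, in (fractional) samples
        j = int(pos)
        frac = pos - j
        nxt = wav[j + 1] if j + 1 < len(wav) else wav[j]
        # Blend the two neighbouring input samples
        out.append(wav[j] * (1 - frac) + nxt * frac)
    return out
```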
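The expletive tag is rather simple: we can split the sentence where the tag appears, remove the expletive word/phrase, and insert a run of samples in its place. A uniform list of values (the same number repeated) would just be silence, like the zeroes used for a pause, so for an audible bleep those samples should form a tone such as a sine wave instead.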
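A sketch of the two halves of the expletive idea, with a hypothetical `<expletive>...</expletive>` tag: `split_expletives` uses `re.split` with a capture group to separate the words to speak from the words to bleep, and `beep` generates a replacement sine tone (1 kHz by default, a typical choice) for a given duration and sample rate. Both names are illustrative.

```python
import math
import re
from typing import List, Tuple

# Hypothetical tag syntax: "<expletive>word</expletive>"
EXPLETIVE_TAG = re.compile(r"<expletive>(.*?)</expletive>")


def split_expletives(text: str) -> List[Tuple[str, str]]:
    """Split text into ("speak", ...) and ("bleep", ...) segments."""
    # With one capture group, re.split alternates: plain text, tagged text, plain text, ...
    parts = EXPLETIVE_TAG.split(text)
    return [
        ("bleep" if i % 2 else "speak", part.strip())
        for i, part in enumerate(parts)
        if part.strip()
    ]


def beep(seconds: float, sample_rate: int, freq: float = 1000.0,
         amplitude: float = 0.3) -> List[float]:
    """Return a sine tone of the given length to stand in for a censored word."""
    n = round(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
```

How long each bleep should last is left open; one option is to synthesize the hidden word anyway and use the length of that audio.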
Perhaps we could also have a repeat tag that repeats the phrase/words between <repeat=(an integer)> and </repeat>, where the integer is the number of times the phrase/word should be repeated.
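A sketch of the proposed repeat tag, assuming the literal syntax `<repeat=3>hello</repeat>`; `expand_repeats` is a hypothetical name. Because the non-greedy pattern stops at the first `</repeat>`, nested repeat tags are not supported.

```python
import re

# Hypothetical tag syntax: "<repeat=3>hello</repeat>"
REPEAT_TAG = re.compile(r"<repeat=(\d+)>(.*?)</repeat>", re.DOTALL)


def expand_repeats(text: str) -> str:
    """Replace each repeat tag with its content repeated N times, space-separated."""
    return REPEAT_TAG.sub(
        lambda m: " ".join([m.group(2).strip()] * int(m.group(1))), text
    )
```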

These are the ones I can think of for now.
I'm a novice/intermediate programmer, so take this "advice" with a grain of salt, but I thought the OP's feature request was pretty cool, so I decided to chime in.

stale · 2021-09-22T16:30:02Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

synesthesiam · 2021-09-22T17:29:17Z

See #752 for progress with SSML.

zubairahmed-ai added the feature request feature requests for making TTS better. label Jul 20, 2021

stale bot added the wontfix This will not be worked on but feel free to help. label Aug 21, 2021

erogol removed the wontfix This will not be worked on but feel free to help. label Aug 23, 2021

coqui-ai deleted a comment from stale bot Aug 23, 2021

stale bot added the wontfix This will not be worked on but feel free to help. label Sep 22, 2021

stale bot removed the wontfix This will not be worked on but feel free to help. label Sep 22, 2021

stale bot added the wontfix This will not be worked on but feel free to help. label Oct 22, 2021

coqui-ai deleted a comment from stale bot Oct 25, 2021

stale bot removed the wontfix This will not be worked on but feel free to help. label Oct 25, 2021

stale bot added the wontfix This will not be worked on but feel free to help. label Nov 24, 2021

coqui-ai deleted a comment from stale bot Nov 24, 2021

stale bot removed the wontfix This will not be worked on but feel free to help. label Nov 24, 2021

zubairahmed-ai closed this as completed Nov 25, 2021

Th3rdSergeevich mentioned this issue Oct 7, 2023

[Feature request] [SSML] Manual Stress Control #3039

Closed
