
[Feature request] Plans to introduce support for SSML? #670

Closed
zubairahmed-ai opened this issue Jul 20, 2021 · 3 comments
Labels
feature request feature requests for making TTS better.

Comments

@zubairahmed-ai

Do you have any plans to introduce support for Speech Synthesis Markup Language (SSML) so that the output can be customized?

@zubairahmed-ai zubairahmed-ai added the feature request feature requests for making TTS better. label Jul 20, 2021
@Tanmay-V22315

Tanmay-V22315 commented Jul 22, 2021

I second this. I also have a few recommendations and hacks of my own that I'd like to add on top of this, in case anybody wants to implement it.
These are what I have thought of so far:

  • Insert zeros into the list of floats produced by inference (after calling synthesizer.tts()) at the position of a tag in the string (like <pause: 10 ms> or something similar). Since a character position in the text doesn't map directly to a position in the audio, this probably means splitting the text at the tag, synthesizing each part separately, and joining the parts with the zeros in between. Let's say that we want the TTS model to pause for 2 seconds: we can get the sample rate from the model's config file, multiply it by the pause duration in seconds, and generate that many zeros (e.g. 22050 × 2 = 44100 zeros at a 22050 Hz sample rate).

  • To insert an audio file in between, we can use something like pydub to load the audio file and convert it into a list of samples, which we can then splice into the list of floats produced by inference at the position of a tag in the string. The clip would need the same sample rate (and sample format) as the model output, so it may have to be resampled first.

  • For the alias tag from SSML, we can have a tag like <alias["W3C", "World Wide Web Consortium"]> or maybe <alias{"W3C":"World Wide Web Consortium"}>. Once the tag is identified, we can isolate it, strip the alias markers from the isolated string, and convert the rest into a list/dictionary. Python's built-in eval() would do this, but ast.literal_eval() is safer because it only accepts literals instead of executing arbitrary code.
    For example:

```python
import ast
import re

# Perhaps we can have the alias declaration at the beginning of the string.
# The dictionary inside the string uses single quotes because the string itself is delimited by double quotes.
inputstring = " <alias>{'W3C':'World Wide Web Consortium', 'W3': 'World Wide Web', 'ISO': 'International standards organisation'}</alias> The W3C is the main ISO for the W3C. Founded in 1994 and currently led by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working together in the development of standards for the W3."

open_tag = "<alias>"
close_tag = "</alias>"
start = inputstring.index(open_tag) + len(open_tag)
end = inputstring.index(close_tag)

# Slice out the dictionary literal between the tags and parse it.
# ast.literal_eval() only accepts Python literals, unlike eval(), which would execute any code in the string.
aliasdict = ast.literal_eval(inputstring[start:end])

# Get rid of the alias declaration in the input string
inputstring = inputstring[end + len(close_tag):].strip()

# Replace whole words in a single pass, trying longer keys first so that 'W3' doesn't clobber part of 'W3C'
keys = sorted(aliasdict, key=len, reverse=True)
pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
inputstring = pattern.sub(lambda m: aliasdict[m.group(1)], inputstring)

print(inputstring)

# Perhaps a much better and easier way to do this would be to take a dictionary
# as a parameter for the input-text preprocessing function.
```

Edit it and chuck this stuff into a function and you're (hopefully) good to go.
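A minimal sketch of the pause and audio-file ideas, under a few assumptions: the tag syntax (<pause: 500 ms>, <pause: 2 s>, <audio: clip.wav>) is hypothetical, all function names are illustrative, and synthesize is a stand-in for something like synthesizer.tts() that returns a list of float samples at the model's sample rate. It splits the text at each tag and synthesizes the pieces separately rather than indexing into the audio by character position. For audio files it swaps pydub for the standard-library wave module, so it only handles 16-bit mono WAV files that already match the model's sample rate.

```python
import array
import re
import wave
from typing import Callable, List

# Hypothetical tag syntax: <pause: 500 ms>, <pause: 2 s>, <audio: clip.wav>
TAG = re.compile(r"<(pause|audio):\s*([^>]+?)\s*>")


def silence(seconds: float, sample_rate: int) -> List[float]:
    """Return sample_rate * seconds zeros, i.e. digital silence."""
    return [0.0] * int(round(sample_rate * seconds))


def parse_duration(spec: str) -> float:
    """Convert '500 ms' or '2 s' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(ms|s)", spec)
    if match is None:
        raise ValueError(f"unrecognised pause duration: {spec!r}")
    value = float(match.group(1))
    return value / 1000 if match.group(2) == "ms" else value


def load_wav(path: str, sample_rate: int) -> List[float]:
    """Read a 16-bit mono WAV file (stdlib wave, standing in for pydub) as floats in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        if wav.getframerate() != sample_rate:
            raise ValueError("resample the clip to the model's sample rate first")
        frames = wav.readframes(wav.getnframes())
    pcm = array.array("h", frames)  # signed 16-bit samples
    return [s / 32768 for s in pcm]


def render(text: str, synthesize: Callable[[str], List[float]], sample_rate: int) -> List[float]:
    """Synthesize the text between tags and splice silence or audio clips in at each tag."""
    samples: List[float] = []
    pos = 0
    for match in TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            samples.extend(synthesize(chunk))
        kind, arg = match.group(1), match.group(2)
        if kind == "pause":
            samples.extend(silence(parse_duration(arg), sample_rate))
        else:
            samples.extend(load_wav(arg, sample_rate))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        samples.extend(synthesize(tail))
    return samples
```

For example, with a 22050 Hz model, render("Hello <pause: 2 s> world", synthesizer.tts, 22050) would put 44100 zeros between the two phrases, assuming synthesizer.tts can be called with just the text.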

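  • For times (like 4:30 PM), we can make use of the datetime module: strptime() to parse the time and strftime() to format parts of it (such as AM/PM) before spelling it out in words.
  • For spell-it, we can split the word enclosed in a tag like <spell-it> abcdefgh </spell-it> into its letters so that it outputs a b c d e f g h (perhaps by using " ".join(word.strip()) on the text inside the tag).
  • For speaking speed and emphasis I'm not sure, but maybe we can temporarily change the sample rate or something like that. Note that simply playing the samples back at a different rate changes the pitch along with the speed, so a proper time-stretching method would sound better.
  • Expletives are rather simple: split the sentence where the expletive tag appears, remove the expletive word/phrase, and splice a bleep into the audio at that position. A list of identical values would only add a constant (DC) offset, which is essentially inaudible, so the bleep should be a short tone such as a ~1 kHz sine wave.
  • Perhaps we can have a repeat tag that repeats the phrase/words between <repeat=(an integer)> and </repeat>, where the integer is the number of times the phrase/word is to be repeated.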
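The text-level tags (spell-it, repeat, and times) can be expanded before the text ever reaches the model. Below is a minimal sketch using regular expressions; the exact tag spellings (<spell-it>abc</spell-it>, <repeat=3>hip hip</repeat>) are hypothetical, and number_to_words is a deliberately tiny illustrative helper that only covers 0 to 59, which is enough for hours and minutes. Times are parsed with strptime() and the AM/PM part is re-emitted with strftime(), which assumes the default C/English locale.

```python
import re
from datetime import datetime

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}

# Hypothetical tag spellings
SPELL = re.compile(r"<spell-it>\s*(.*?)\s*</spell-it>", re.S)
REPEAT = re.compile(r"<repeat=(\d+)>(.*?)</repeat>", re.S)
TIME = re.compile(r"\b(?:1[0-2]|0?[1-9]):[0-5]\d [AaPp][Mm]\b")


def number_to_words(n: int) -> str:
    """Spell out 0-59 (illustrative helper, just enough for clock times)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def speak_time(match: re.Match) -> str:
    """Turn '4:30 PM' into 'four thirty PM'."""
    t = datetime.strptime(match.group(0).upper(), "%I:%M %p")
    hour = number_to_words(t.hour % 12 or 12)
    if t.minute == 0:
        minutes = "o'clock"
    elif t.minute < 10:
        minutes = "oh " + number_to_words(t.minute)
    else:
        minutes = number_to_words(t.minute)
    return f"{hour} {minutes} {t.strftime('%p')}"


def preprocess(text: str) -> str:
    """Expand spell-it, repeat and time markup into plain text for the TTS model."""
    # <spell-it> abcdefgh </spell-it> -> "a b c d e f g h"
    text = SPELL.sub(lambda m: " ".join(m.group(1).replace(" ", "")), text)
    # <repeat=3>hey</repeat> -> "hey hey hey"
    text = REPEAT.sub(lambda m: " ".join([m.group(2).strip()] * int(m.group(1))), text)
    # 4:30 PM -> "four thirty PM"
    return TIME.sub(speak_time, text)
```

Because this runs purely on strings, it could sit in front of whatever function calls the synthesizer, and the audio-level tags (pauses, clips, bleeps) would then be handled afterwards.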

These are the ones that I can think of for now.
I am a novice/intermediate programmer, so take this "advice" with a grain of salt, but I thought the OP's feature request was pretty cool, so I decided to chime in.
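For the expletive bullet above, here is a minimal sketch that splits the text at a hypothetical <expletive>word</expletive> tag, drops the tagged word, and splices in a sine-tone bleep rather than a list of identical values. As in the other ideas, synthesize is a stand-in for something like synthesizer.tts(); the function names, the 1 kHz frequency, the 0.3 amplitude, and the fixed 0.5 s bleep length (instead of the length of the removed word) are all arbitrary choices for illustration.

```python
import math
import re
from typing import Callable, List

# Hypothetical tag: <expletive>word</expletive>
EXPLETIVE = re.compile(r"<expletive>.*?</expletive>", re.S)


def bleep(seconds: float, sample_rate: int, freq: float = 1000.0, amplitude: float = 0.3) -> List[float]:
    """A sine tone of the given length: the classic censor bleep."""
    n = int(round(seconds * sample_rate))
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


def render_with_bleeps(
    text: str,
    synthesize: Callable[[str], List[float]],
    sample_rate: int,
    bleep_seconds: float = 0.5,
) -> List[float]:
    """Synthesize the text around each expletive tag and replace the tagged words with a bleep."""
    samples: List[float] = []
    pos = 0
    for match in EXPLETIVE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            samples.extend(synthesize(chunk))
        # The tagged word itself is never synthesized.
        samples.extend(bleep(bleep_seconds, sample_rate))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        samples.extend(synthesize(tail))
    return samples
```

The frequency should stay well below half the sample rate; otherwise the tone aliases (at a 1000 Hz sample rate, a 1 kHz sine is sampled at the same phase every time and comes out as all zeros).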

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Aug 21, 2021
@erogol erogol removed the wontfix This will not be worked on but feel free to help. label Aug 23, 2021
@coqui-ai coqui-ai deleted a comment from stale bot Aug 23, 2021
@stale

stale bot commented Sep 22, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Sep 22, 2021
@synesthesiam
Contributor

See #752 for progress with SSML.

@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Sep 22, 2021
@stale stale bot added the wontfix This will not be worked on but feel free to help. label Oct 22, 2021
@coqui-ai coqui-ai deleted a comment from stale bot Oct 25, 2021
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Oct 25, 2021
@stale stale bot added the wontfix This will not be worked on but feel free to help. label Nov 24, 2021
@coqui-ai coqui-ai deleted a comment from stale bot Nov 24, 2021
@stale stale bot removed the wontfix This will not be worked on but feel free to help. label Nov 24, 2021
Development

No branches or pull requests

4 participants