Switch to microsoft-cognitiveservices-speech-sdk for SpeechSynthesis #173

sbiaudet opened this issue Feb 9, 2022 · 5 comments

sbiaudet commented Feb 9, 2022

Hello Compulim,

Through botframework-webchat, I use the web-speech-cognitive-services module. I have a fully customized UX with an animated character, and I need the onwordboundary event to synchronize the display of subtitles with the character animations.

Currently you use the REST API for SpeechSynthesis. If you used the SDK directly, which communicates over websocket, we would get the onwordboundary event, and we would also have access to the viseme event for lipsync.
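
For context, here is a rough sketch of what subscribing to those events looks like with the SDK directly (untested as written, with placeholder credentials):

```ts
import * as sdk from "microsoft-cognitiveservices-speech-sdk";

const speechConfig = sdk.SpeechConfig.fromSubscription("<key>", "<region>");
const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

// Fired for each word as audio is produced; offsets are in 100 ns ticks.
synthesizer.wordBoundary = (_sender, e) => {
  console.log(`word "${e.text}" at ${e.audioOffset / 10_000} ms`);
};

// Fired for each viseme, which is what we need for lipsync.
synthesizer.visemeReceived = (_sender, e) => {
  console.log(`viseme ${e.visemeId} at ${e.audioOffset / 10_000} ms`);
};

synthesizer.speakTextAsync(
  "Hello world",
  () => synthesizer.close(),
  (error) => {
    console.error(error);
    synthesizer.close();
  }
);
```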

Do you think you could use the SDK instead of the REST API?

sbiaudet commented

@compulim we've been working on integrating the websocket-based SDK for SpeechSynthesis in place of the REST calls.

Response times are much better and the onStart event is better synchronized. We've also added support for the onBoundary event, with both word and viseme types.
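
To give an idea, our integration forwards the SDK events through the utterance's onboundary handler, along these lines (a simplified sketch, not the exact code; the "viseme" type and visemeId field are our own non-standard extension to the Web Speech API event):

```ts
import * as sdk from "microsoft-cognitiveservices-speech-sdk";

// Hypothetical ponyfill internals: `synthesizer` is the SDK synthesizer,
// `utterance` the SpeechSynthesisUtterance currently being spoken.
function wireBoundaryEvents(
  synthesizer: sdk.SpeechSynthesizer,
  utterance: SpeechSynthesisUtterance
) {
  synthesizer.wordBoundary = (_sender, e) => {
    utterance.onboundary?.({
      name: "word",
      charIndex: e.textOffset,
      elapsedTime: e.audioOffset / 10_000, // 100 ns ticks -> ms
    } as any);
  };

  synthesizer.visemeReceived = (_sender, e) => {
    utterance.onboundary?.({
      name: "viseme",       // non-standard boundary type added by the fork
      visemeId: e.visemeId, // extra field consumers can use for lipsync
      elapsedTime: e.audioOffset / 10_000,
    } as any);
  };
}
```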

Are you open to us proposing a pull request?

sbiaudet commented Jul 3, 2023

@compulim a little bump to remind you of my request. We are ready to submit a pull request. Is that OK with you?

vladmaraev commented

@sbiaudet I would be very interested in this. Do you want to collaborate on this change?

sbiaudet commented

@vladmaraev I never got a response from @compulim. We've forked the repo and published a package here: https://www.npmjs.com/package/@davi-ai/web-speech-cognitive-services-davi.

I'm ready to merge here; it's silly to maintain a fork just for this. @compulim, is that OK with you?

vladmaraev commented

@sbiaudet That's really nice! I tried your package, but unfortunately it fails to synthesise SSML (even though I can see in the generated JS code that SSML is still supported)... Maybe there are some caveats? I would be happy to contribute either to a PR here or to your fork (is it public?). Many thanks for your work!
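
In case it helps narrow things down, this is roughly the direct-SDK call I would expect the ponyfill to make under the hood for SSML (a sketch only; the voice name is just an example):

```ts
import * as sdk from "microsoft-cognitiveservices-speech-sdk";

const speechConfig = sdk.SpeechConfig.fromSubscription("<key>", "<region>");
const synthesizer = new sdk.SpeechSynthesizer(speechConfig);

const ssml = `
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">Hello from SSML</voice>
</speak>`;

// speakSsmlAsync takes the raw SSML string instead of plain text.
synthesizer.speakSsmlAsync(
  ssml,
  (result) => {
    console.log(`synthesis finished, reason: ${result.reason}`);
    synthesizer.close();
  },
  (error) => {
    console.error(error);
    synthesizer.close();
  }
);
```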

vladmaraev added a commit to vladmaraev/speechstate that referenced this issue Jun 28, 2024

This change switches the TTS ponyfill to the @Davi-ai fork. See the reasons for creating the fork here: compulim/web-speech-cognitive-services#173 (comment)

This change enables sending VISEME events by SpeechState, to control external avatars.

In addition, the @Davi-ai fork was patched to remove excessive logging.
vladmaraev added a commit to vladmaraev/speechstate that referenced this issue Aug 13, 2024
vladmaraev added a commit to vladmaraev/speechstate that referenced this issue Oct 24, 2024

* This change switches the TTS ponyfill to the @Davi-ai fork. See the reasons for creating the fork here: compulim/web-speech-cognitive-services#173 (comment)

* The @Davi-ai fork was modified to adjust typing and ASR final results; additionally, excessive logging was removed.

* This change enables sending VISEME events inside SpeechState, to control external avatars. For now, the VISEME events get transformed into a stream of FURHAT_BLENDSHAPES events that controls Furhat lipsync.

* Extensive test coverage for ASR and TTS (including streaming). To test streaming, one needs to run an SSE server (~test/server.js~).
vladmaraev added a commit to vladmaraev/speechstate that referenced this issue Oct 24, 2024