Speech recognition privacy issues and solutions #99

Open
anssiko opened this issue Sep 7, 2020 · 0 comments
Labels
Discussion topic (Topic discussed at the workshop); User's Perspective (Machine Learning Experiences on the Web: A User's Perspective)

anssiko commented Sep 7, 2020

The talk "Wreck a Nice Beach in the Browser: Getting the Browser to Recognize Speech" by @kdavis-mozilla articulates the standardization struggle around the Web Speech API, with a focus on its speech recognition part.

My interpretation is that there are two broad categories of issues with this API's speech recognition part:

  1. API design issues, for example:

The current Web Speech API reflects the times in which it was originally written about 10 years ago.

In particular, it doesn't make use of the subsequent advances in, for example, the Web Audio API.

  2. Privacy issues:

Questions of privacy that were present in the original API and new ones that arose since the original was written nip at the heels of standardization.

If speech recognition happens server side, as it does in the vast majority of cases, and your speech is retained to help train future speech recognition engines, as is now a standard in the industry, how is the GDPR right of erasure implemented?

How does the Web Speech API handle the issues of consent that arise when speech data is stored and reused server side?

Slide 10 summarizes the pros and cons of placing the speech recognition engine on the client vs. the server.

It seems the industry at large is still undecided whether the speech recognition engine should sit on the client or on the server. The Web Speech API spec reflects that compromise. While the API design issues are generally easier to resolve, the privacy issues with their regulatory dimension are multifaceted and complex.

Questions:

The API itself is agnostic of the underlying speech recognition and synthesis implementation and can support both server-based and client-based/embedded recognition and synthesis.

What if users could set a preference that allows web sites to use the speech recognition feature only when the users can be confident their privacy is preserved? With advances in both DNN-based models and hardware accelerators for speech recognition embedded in modern clients, could a client-side engine be a pragmatic solution to the privacy issues?
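To make the preference idea above concrete, here is a minimal sketch of how a user agent might decide whether a site may use speech recognition. Everything in it is an assumption for discussion purposes: `EngineInfo`, `SpeechPrivacyPreference`, and `mayUseRecognition` are hypothetical names, not part of the Web Speech API or any browser.

```typescript
// Hypothetical sketch: none of these names exist in the Web Speech API.
type EngineLocation = "client" | "server";

interface EngineInfo {
  // Where recognition runs.
  location: EngineLocation;
  // Whether audio is kept after recognition, e.g. to train future engines.
  retainsAudio: boolean;
}

// Assumed user-facing preference levels, from most to least permissive.
type SpeechPrivacyPreference = "allow-any" | "no-retention" | "on-device-only";

// Returns true if the engine satisfies the user's preference.
function mayUseRecognition(
  pref: SpeechPrivacyPreference,
  engine: EngineInfo
): boolean {
  switch (pref) {
    case "allow-any":
      return true;
    case "no-retention":
      // Server-side recognition is acceptable as long as nothing is stored.
      return !engine.retainsAudio;
    case "on-device-only":
      // Audio must never leave the client.
      return engine.location === "client";
  }
}

const cloudEngine: EngineInfo = { location: "server", retainsAudio: true };
const embeddedEngine: EngineInfo = { location: "client", retainsAudio: false };

console.log(mayUseRecognition("on-device-only", cloudEngine));    // false
console.log(mayUseRecognition("on-device-only", embeddedEngine)); // true
```

The open design question this leaves untouched is how a user agent would learn `retainsAudio` for a server-based engine in a way the user can actually trust.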

How does a modern client-side engine perform on key UX metrics (latency, quality) compared with widely used server-based recognition solutions?
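One way to put numbers on that comparison is sketched below. It assumes that quality is measured as word error rate (WER), a common speech recognition metric that this issue does not itself name, and that latency is summarized as the median of per-utterance timings. The function names `wordErrorRate` and `median` are illustrative only.

```typescript
// Word error rate: word-level edit distance (substitutions, insertions,
// deletions) divided by the number of words in the reference transcript.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter((w) => w.length > 0);
  const hyp = hypothesis.trim().split(/\s+/).filter((w) => w.length > 0);
  if (ref.length === 0) {
    // Convention assumed here: an empty reference scores 0 only if the
    // hypothesis is empty too.
    return hyp.length === 0 ? 0 : 1;
  }
  // Row-by-row dynamic programming over the edit distance table.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Median of latency samples in milliseconds.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// The misrecognition from the talk title: 2 substitutions + 2 insertions
// against a 2-word reference.
console.log(wordErrorRate("recognize speech", "wreck a nice beach")); // 2
console.log(median([300, 100, 200])); // 200
```

Running both engines over the same recorded utterances and comparing WER and median latency side by side would give a first, rough answer; note that WER can exceed 1 when the hypothesis contains many inserted words.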

anssiko added the User's Perspective (Machine Learning Experiences on the Web: A User's Perspective) label on Sep 7, 2020
anssiko added this to the 2020-09-29 Live Session #4 milestone on Sep 28, 2020
dontcallmedom added the Discussion topic (Topic discussed at the workshop) label on Oct 9, 2020