Speech recognition privacy issues and solutions #99

anssiko · 2020-09-07T10:42:14Z

The talk "Wreck a Nice Beach in the Browser: Getting the Browser to Recognize Speech" by @kdavis-mozilla articulates the standardization struggle around the Web Speech API, with a focus on its speech recognition part.

My interpretation is that there are two broad categories of issues for this API with respect to speech recognition:

API design issues, for example:

The current Web Speech API reflects the times in which it was originally written about 10 years ago.

In particular, it doesn't make use of the subsequent advances in, for example, the Web Audio API.
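For context, here is a minimal TypeScript sketch of how a page typically drives speech recognition through the API. It uses the `SpeechRecognition` interface (exposed as `webkitSpeechRecognition` in Chromium-based browsers) and its `lang`, `interimResults`, `maxAlternatives`, `onresult` and `start()` members; the `RecognitionLike` type and the helpers `getRecognitionCtor` and `recognizeOnce` are hypothetical names introduced for illustration. In this usage the page hands no audio source, such as a Web Audio node, to `start()`: the browser captures the microphone itself and decides where recognition runs.

```typescript
// Minimal structural type for the parts of SpeechRecognition used here
// (hypothetical helper type, not the full interface).
interface RecognitionLike {
  lang: string;
  interimResults: boolean;
  maxAlternatives: number;
  onresult: ((event: any) => void) | null;
  start(): void;
}
type RecognitionCtor = new () => RecognitionLike;

// Hypothetical helper: feature-detect the unprefixed or prefixed constructor.
function getRecognitionCtor(scope: Record<string, unknown>): RecognitionCtor | null {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof ctor === "function" ? (ctor as RecognitionCtor) : null;
}

// Hypothetical helper: request a single final transcript.
function recognizeOnce(
  ctor: RecognitionCtor,
  lang: string,
  onText: (text: string) => void,
): RecognitionLike {
  const recognition = new ctor();
  recognition.lang = lang;
  recognition.interimResults = false; // only deliver final results
  recognition.maxAlternatives = 1;    // only the top hypothesis
  recognition.onresult = (event) => {
    // results is a list of results, each a list of alternatives.
    onText(event.results[0][0].transcript);
  };
  // No audio source is passed: the browser opens the microphone itself.
  recognition.start();
  return recognition;
}
```

In a page, `getRecognitionCtor(globalThis as unknown as Record<string, unknown>)` would return the browser's constructor if one is exposed, and `null` otherwise.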

Privacy issues:

Questions of privacy that were present in the original API and new ones that arose since the original was written nip at the heels of standardization.

If speech recognition happens server side, as it does in the vast majority of cases, and your speech is retained to help train future speech recognition engines, as is now a standard in the industry, how is the GDPR right of erasure implemented?

How does the Web Speech API handle the issues of consent that arise when speech data is stored and reused server side?

Slide 10 summarizes the pros and cons of placing the speech recognition engine on the client vs. on the server.

It seems the industry at large is still undecided on whether the speech recognition engine should sit on the client or on the server, and the Web Speech API spec reflects that compromise. While the API design issues are generally easier to resolve, the privacy issues, with their regulatory dimension, are multifaceted and complex.

Questions:

I'm wondering whether it'd be reasonable to revisit this API design consideration in the spec:

The API itself is agnostic of the underlying speech recognition and synthesis implementation and can support both server-based and client-based/embedded recognition and synthesis.

What if users could set a preference to allow web sites to use the speech recognition feature only when they can be confident their privacy is preserved? Given the advances in both DNN-based models and the hardware accelerators for speech recognition embedded in modern clients, could a client-side engine be a pragmatic solution to the privacy issues?
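One way to picture such a preference is as a check the browser performs before starting recognition. The sketch below is purely hypothetical: `EnginePlacement`, `RecognitionPreference`, `EngineInfo` and `selectEngine` are illustrative names, not part of the Web Speech API, which as quoted above is agnostic about where recognition runs. It assumes the browser knows which engines it has and where each one runs.

```typescript
// Hypothetical: where a recognition engine runs.
type EnginePlacement = "on-device" | "server";
// Hypothetical user-level privacy preference.
type RecognitionPreference = "on-device-only" | "allow-server";

interface EngineInfo {
  placement: EnginePlacement;
  available: boolean;
}

// Hypothetical browser-side policy: prefer an on-device engine whenever one is
// available, fall back to a server engine only if the user allows it, and
// otherwise refuse rather than silently sending audio off the device.
function selectEngine(
  pref: RecognitionPreference,
  engines: EngineInfo[],
): EnginePlacement | null {
  const usable = engines.filter((e) => e.available);
  if (usable.some((e) => e.placement === "on-device")) return "on-device";
  if (pref === "allow-server" && usable.some((e) => e.placement === "server")) {
    return "server";
  }
  return null; // recognition is not offered to the site
}
```

Returning `null` rather than falling back to a server is the point of the preference: under "on-device-only", a site would simply not get recognition on a client without a local engine.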

How does a modern client-side engine perform in key UX metrics (latency, quality) in comparison to widely used server-based recognition solutions?
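For such a comparison, quality is commonly reported as word error rate (WER), the word-level edit distance between a reference transcript and the recognizer's output divided by the number of reference words, and speed is often reported as the real-time factor (RTF), processing time divided by audio duration. The TypeScript sketch below computes both; the function names are illustrative, and a real benchmark would also need text normalization (case, punctuation) and a shared test set, neither of which is shown.

```typescript
// Word error rate = (substitutions + deletions + insertions) / reference word count,
// computed as a word-level Levenshtein distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/).filter((w) => w.length > 0);
  const hyp = hypothesis.trim().split(/\s+/).filter((w) => w.length > 0);
  if (ref.length === 0) throw new Error("reference transcript must be non-empty");

  // prev[j] = edit distance between the reference words processed so far
  // and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion of a reference word
        curr[j - 1] + 1,    // insertion of a hypothesis word
        prev[j - 1] + cost, // match or substitution
      ));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Real-time factor: values below 1 mean faster than real time.
function realTimeFactor(processingMs: number, audioMs: number): number {
  return processingMs / audioMs;
}
```

Note that WER is not capped at 1: mishearing "recognize speech" as "wreck a nice beach" costs two substitutions and two insertions against a two-word reference, giving a WER of 2.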

anssiko added the "User's Perspective" label (Machine Learning Experiences on the Web: A User's Perspective) Sep 7, 2020

anssiko added this to the 2020-09-29 Live Session #4 milestone Sep 28, 2020

dontcallmedom added the "Discussion topic" label (Topic discussed at the workshop) Oct 9, 2020
