Google Chrome Feedback on WebNN: aiming for broad device coverage and maintainability #453
Thank you @vsekhar and @inexorabletash. We discussed this feedback at WebML WG Teleconference – 24 August 2023. I encourage the WG to use this issue for general discussion and cross-link to this issue from topic-specific GH issues as appropriate.
I observe that a subset of these recommendations have been or are being discussed in topic-specific issues; e.g. #456 and #573 are currently open. Furthermore, the group has focused on the models and hardware targets mentioned in this high-level summary, in both specification and prototyping efforts. I would like to revisit this high-level issue in a future meeting to see what has been done, what remains to be done, and to discuss any new information that may have come up since, and to understand whether a revision to this high-level summary would be appropriate.
To follow up on https://www.w3.org/2024/05/16-webmachinelearning-minutes.html#t09, I'd ask @mwyrzykowski to file an issue in WebKit's standards position repo. Mike is an active WG participant, familiar with both this API and Apple platforms, and as such well positioned to file the issue in a way that includes the details important to Apple. @inexorabletash, I'd like to invite one or more Mozillians to our meeting to discuss this topic and familiarize them with the API. Perhaps someone from WebGPU-land might be interested?
As discussed at TPAC, the concerns raised in this issue now have more specific issues filed, such as #573 and #689. I expect discussions to continue on those issues. We see no need to keep this meta-issue open and would like to close this issue (someone with repo edit permissions needs to hit the button. Thanks!)
This issue is a high-level summary of the Chrome team's feedback on WebNN, posted here for further discussion with WG members.
--
Google strongly supports the work of the Web ML WG to bring on-device ML capabilities to the open Web, and we recognize the long-term contributions from many participants, in particular Intel, who spearheaded the WebNN effort.
Since browser vendors will need to keep the resulting API up to date over many years, Google feels the proposal warrants special scrutiny to ensure the API remains relevant while imposing a manageable long-term support and maintenance cost, including after the initial WG contributors have moved on to other projects.
To that end, several senior technical staff members on the Google Chrome team who are familiar with Web APIs, the Web standards process, and the technical implementation of various advanced browser APIs and capabilities have carefully reviewed the WebNN proposal. This document summarizes their feedback. While we draw on the expertise of other ML research and infrastructure teams at Google (e.g. those working on TensorFlow, TensorFlow Lite, JAX, OpenXLA), we do not aim to speak for them or their projects.
Our feedback on the WebNN proposal is informed by our observation that, for new or single-vendor OS APIs or hardware accelerators, we must assume that most Web users don't have them. While we too aim to create compelling and performant experiences for users of the latest hardware and OS platforms, we have an obligation to ensure a workable experience for other users as well.
Our goal for an ML API for the Web is not to demonstrate performance with specific accelerators or drivers. Instead, Chrome's goal is to achieve 80% of a device's theoretical hardware-accelerated ML runtime performance across 80% of devices on the Web, and to do so while imposing a manageable long-term support burden on browser vendors. Users of other devices, with hardware accelerators or architectures that differ significantly from the mainstream and are not integrated by browser or OS vendors, should still benefit from workable execution of ML models on the CPU and GPU.
The ML ecosystem is still rapidly evolving, making it difficult for any API to keep up. For example, the long short-term memory (LSTM) approach to ML has already been obsoleted by Transformers, and Softmax has been succeeded by various approximate and access-efficient versions and implementations. Accelerators and hardware architectures continue to evolve as well.
Consider what would be involved in adding a new high-level operator like FlashAttention to the current API. Implementers would need to connect it to each equivalent OS API operator (where one exists), implement it as a GPU shader (when a GPU is available), or emulate it in CPU code. Current plans across the ecosystem for new models, operations, and hardware may already present an intractable roadmap for WebNN implementers who prioritize broad reach.
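To make that maintenance burden concrete, here is a minimal TypeScript sketch of the dispatch an implementation ends up carrying for each such operator; the Backend interface, the 'flashAttention' op name, and the emulateAttention fallback are hypothetical illustrations for this discussion, not part of WebNN or any OS API.

```typescript
// Hypothetical tensor and backend types, for illustration only.
type Tensor = { shape: number[]; data: Float32Array };

interface Backend {
  // True if the OS / driver exposes a native kernel for the named operator.
  hasNativeOp(name: string): boolean;
  runNativeOp(name: string, inputs: Tensor[]): Tensor;
  // Fallback composed from lower-level primitives (matmuls, softmax, masking, ...).
  emulateAttention(query: Tensor, key: Tensor, value: Tensor): Tensor;
}

// Every new high-level operator adds another branch like this to every
// implementation, per backend and per OS, for as long as the op stays in the API.
function flashAttention(backend: Backend, q: Tensor, k: Tensor, v: Tensor): Tensor {
  if (backend.hasNativeOp('flashAttention')) {
    return backend.runNativeOp('flashAttention', [q, k, v]); // OS API path
  }
  // Otherwise: a hand-written GPU shader (when a GPU is available) or CPU emulation.
  return backend.emulateAttention(q, k, v);
}
```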
To address this issue, we favor adopting RISC-style tensor operations in the mathematical domain, drawing on the basics of tensor math that are unlikely to change in the near term. This contrasts with less stable, higher-level CISC-style operations like HardSwish or SoftMax, which are often obsoleted by new operations. The ML community is building consensus around certain low-level operator sets that work across frameworks and toolchains, and we believe this work could benefit WebNN, particularly those operator sets that specifically target long-term stability.
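As a rough illustration of what such decompositions could look like, the sketch below expresses hardSwish and softmax as compositions of a few elementwise and reduction primitives. The primitive names (addScalar, clampScalar, reduceMax, and so on) are hypothetical placeholders for whatever RISC-style operator set the WG might converge on, not the current WebNN surface, and the 1-D number arrays stand in for real tensors.

```typescript
// Hypothetical low-level primitives over 1-D tensors, for illustration only.
const mul = (a: number[], b: number[]) => a.map((x, i) => x * b[i]);
const addScalar = (a: number[], s: number) => a.map((x) => x + s);
const divScalar = (a: number[], s: number) => a.map((x) => x / s);
const clampScalar = (a: number[], lo: number, hi: number) =>
  a.map((x) => Math.min(Math.max(x, lo), hi));
const expAll = (a: number[]) => a.map(Math.exp);
const reduceMax = (a: number[]) => a.reduce((m, x) => Math.max(m, x), -Infinity);
const reduceSum = (a: number[]) => a.reduce((s, x) => s + x, 0);

// hardSwish(x) = x * clamp(x + 3, 0, 6) / 6 -- a composition, not a dedicated op.
const hardSwish = (x: number[]) =>
  divScalar(mul(x, clampScalar(addScalar(x, 3), 0, 6)), 6);

// softmax(x) = exp(x - max(x)) / sum(exp(x - max(x))) -- likewise a composition.
const softmax = (x: number[]) => {
  const shifted = addScalar(x, -reduceMax(x)); // subtract max for numerical stability
  const e = expAll(shifted);
  return divScalar(e, reduceSum(e));
};
```

In a design along these lines, only the primitives would need per-device tuning; the compositions could be fused, lowered, or swapped for a native kernel where one exists, without changing the API surface.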
We recognize that, in their current form, OS APIs for ML may not yet be conducive to RISC-style tensor operations. However, we hope the WebNN effort will produce an API design that is performant, portable, and stable, and that it will in turn have a positive influence on the evolution and long-term maintainability of OS APIs. We expect to evolve our own OS APIs in this way as well.
Based on the above, we recommend building on the WebNN proposal in the following ways:
- Reconsider high-level operations whose long-term relevance is uncertain, e.g. lstm and gru
- Favor lower-level primitives over operations that can be decomposed into them, e.g. hardSwish and softmax
With regard to OS- and hardware-specific optimizations, we further propose an engineering approach that clearly demonstrates their value to Web users, so that the ecosystem can justify adopting and maintaining them over the long term:
We look forward to continuing the discussion with WG participants to deploy powerful ML capabilities across the Web's many platforms and to benefit all of the Web's users.