Core operator set #573
One example we can compare with is the PyTorch Edge opset.
When we originally considered the design principles of the operator set for WebNN, even before the spec made it out of the community group, the question of whether the core operator set should include common high-level operations (commonly known as fusions) or only the rudimentary building-block operations came up in many discussions. Without getting into a philosophical debate over what exactly, besides basic arithmetic operations, should count as a rudimentary operation, we set out to look at the question objectively, in the context of what was already being developed in the industry at various software and hardware layers. We concluded that it is important, for both practicality and performance reasons, to also include common fusions known to be implemented widely in the framework and underlying platform (e.g. operating system) layers, with an important caveat: for each fusion defined in the spec, every decomposed operation of its equivalent subgraph must also be defined. The main objective of this rule is to let an implementation that does not yet support a specific fusion carry on without failing. By pushing operator decomposition downward, we allow the implementation to catch up later while simplifying the development of a framework's WebNN backend. Also note that the keyword here is "common" fusions, not arbitrary fusions with unverifiable availability in the platform layers.

For reference, we took the opportunity at that time to describe this rationale in this section of our explainer document. At the time of that writing, we used GRU and LSTM as de facto examples of such common fusions. With the emergence of generative AI in recent years, the better examples would be group normalization, layer normalization, and multi-headed attention -- operations that are widely used in both diffusion and transformer models today.
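To make the decomposition rule concrete, here is a rough, non-normative sketch of how a layer normalization fusion could be expressed as a subgraph of more primitive WebNN ops -- the fallback an implementation would use before it has specialized the fusion. The exact `MLGraphBuilder` method names and option fields below (`reduceMean`, `constant` descriptor shape, etc.) are illustrative assumptions and may not match the current spec text exactly.

```ts
// Sketch only: assumes a WebNN-like MLGraphBuilder exposing reduceMean, sub,
// mul, add, div, sqrt, and constant. Method/option names are illustrative.
function decomposedLayerNorm(
  builder: any,          // MLGraphBuilder (typed as any for the sketch)
  x: any,                // MLOperand, e.g. shape [batch, seq, hidden]
  scale: any,            // MLOperand broadcastable over the normalized axes
  bias: any,             // MLOperand broadcastable over the normalized axes
  axes: number[],        // axes to normalize over, e.g. [2]
  epsilon = 1e-5
) {
  // mean(x) over the normalized axes, kept so it broadcasts against x
  const mean = builder.reduceMean(x, { axes, keepDimensions: true });
  const centered = builder.sub(x, mean);
  // variance = mean((x - mean)^2) over the same axes
  const variance = builder.reduceMean(builder.mul(centered, centered),
                                      { axes, keepDimensions: true });
  // scalar epsilon constant; descriptor fields are illustrative
  const eps = builder.constant({ dataType: 'float32', shape: [] },
                               new Float32Array([epsilon]));
  // normalized = (x - mean) / sqrt(variance + epsilon)
  const normalized = builder.div(centered,
                                 builder.sqrt(builder.add(variance, eps)));
  // y = normalized * scale + bias
  return builder.add(builder.mul(normalized, scale), bias);
}
```

The point of the caveat is that every op used in a subgraph like this one must itself be in the spec, so a backend can always lower the fusion instead of failing.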
Thanks @philloooo for soliciting input from internal Google teams and @wchao1115 for your insights and various contributions in this space. The group has produced the following documentation on this topic:
In addition, the group has received related position statements from Google in #453 and #573 (this issue), and input from an ONNX project participant on that project's approach. There may be more, but these are the ones I recall. As @wchao1115 noted, this topic has been a long-term consideration, and it re-emerges from time to time when new participants join. This suggests to me that the group cares about this topic, and also that we could probably do better at documenting the group's current consensus position. To help turn this into action, I'd like to ask:

- Is the group happy with the current organization of the related guidelines and documentation, or should we consolidate them somehow? Fold (more of) them into the specification?
- Is there further investigation and research to be done to ensure we are well informed and data-driven? For example, a review of widely deployed frameworks (the expected users of this group's deliverables) with results shared in public?

Regardless of where this content lives in our repo, I expect this documentation to evolve and be maintained. Everyone's feedback is welcome.
Thanks @wchao1115 for elaborating on the current design philosophy! I overall agree with the current design. Thanks @anssiko for linking to the existing resources! This issue is not trying to exclude high-level operations from the spec. It's trying to bring up a couple of things related to this topic:
We'll revive this issue with a discussion on additional primitive ops informed by MLIR Linalg, PyTorch Prims IR, TOSA, and others. Pointers to relevant background research can be shared here to inform the discussion.
I don't know if we might also be interested in:
In talking with internal Google ML frameworks teams, one theme has come up repeatedly when discussing ML execution: the need for a predictable, core set of operators with precisely defined behavior. Without this, frameworks can't provide predictable behavior, and can't reliably express higher-level concepts when those are missing from a given execution runtime. We've seen work by internal and external ML frameworks towards defining these core operator sets, and believe this concept is important for WebNN to adopt, ideally aligning with any emerging standards for core op sets.
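As a hypothetical illustration of why a guaranteed core set matters to a framework backend (the feature-detection pattern and all names below are assumptions for the sketch, not anything the WebNN spec prescribes): a backend would like to emit a single high-level op when the implementation supports it, and otherwise lower it to core element-wise ops it can rely on.

```ts
// Hypothetical fallback pattern: if the fused op is present on the builder,
// use it; otherwise lower GELU to core element-wise ops.
// gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
function geluOrDecompose(builder: any, x: any) {
  if (typeof builder.gelu === 'function') {
    return builder.gelu(x);            // fused path, if available
  }
  // Scalar constants; descriptor fields are illustrative.
  const half = builder.constant({ dataType: 'float32', shape: [] },
                                new Float32Array([0.5]));
  const one = builder.constant({ dataType: 'float32', shape: [] },
                               new Float32Array([1]));
  const invSqrt2 = builder.constant({ dataType: 'float32', shape: [] },
                                    new Float32Array([Math.SQRT1_2]));
  const erfTerm = builder.erf(builder.mul(x, invSqrt2));
  return builder.mul(builder.mul(half, x), builder.add(one, erfTerm));
}
```

The decomposed path only works if the primitives it relies on (here, hypothetically, `erf`, `mul`, `add`, `constant`) have precisely defined behavior in every implementation, which is the predictability this issue is asking the group to guarantee.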
We'd like to build consensus on the following:
Follow-up work:
Related questions, but maybe out of scope for this issue:
There are some high level questions that need to be hashed out:
See also: