
Core operator set #573

Open · philloooo opened this issue Feb 16, 2024 · 6 comments

@philloooo (Contributor)

In talking with internal Google ML frameworks teams, one theme has come up repeatedly when discussing ML execution: the need for a predictable, core set of operators with precisely defined behavior. Without this, frameworks can't provide predictable behavior, and can't reliably express higher level concepts if they are missing from a given execution runtime. We've seen work by internal and external ML frameworks towards defining these core operator sets, and believe this concept is important for WebNN to adopt, and ideally align with any emerging standards for core op sets.

We'd like to build consensus on the following:

  • WebNN should define a core op set, focused on the low-level, indecomposable ops that together make the API functionally complete.
  • Implementations of WebNN MUST (in the RFC 2119 sense) implement this core op set.
  • The behavior of these core ops will be specified precisely with conformance tests in WPT.
  • We must validate with multiple ML frameworks that the identified core op set meets their needs.

Follow-up work:

  • Actually define the core op set - both the list of ops and their behavior
  • Have at least two implementations, to make sure the interface (including any specified constraints) can be supported by multiple platforms
  • Come up with a rubric for how rigorously the core op set is limited
    • E.g. Would we include both sin() and cos(), even though you can define one in terms of the other? Do we only need nand(), since and/or/not/xor can be built from it? (A small sketch follows after this list.)
  • Determine if a subset of a "standard" core op set is acceptable for v1 (i.e. do we need control flow (#559) and bitwise operators (#496)?)
  • Define core op set standardization / evolution over time (e.g. in conjunction with frameworks)
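
For instance, if only sin() were in the core set, cos() could still be recovered via the identity cos(x) = sin(x + π/2). Below is a minimal sketch of that decomposition, assuming an MLGraphBuilder-style interface; the exact constant()/add() signatures are illustrative rather than normative, and the WebNN IDL types are assumed to be available as ambient declarations.

```typescript
// Illustrative sketch only: assumes WebNN IDL types (MLGraphBuilder, MLOperand)
// are available, and that constant()/add()/sin() have roughly the shapes shown.
function cosViaSin(builder: MLGraphBuilder, x: MLOperand): MLOperand {
  // A scalar constant holding π/2.
  const halfPi = builder.constant(
    { dataType: 'float32', shape: [1] },
    new Float32Array([Math.PI / 2])
  );
  // cos(x) == sin(x + π/2), so a runtime without cos() could decompose it.
  return builder.sin(builder.add(x, halfPi));
}
```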


Related questions, but maybe out of scope for this issue:

  • What do we call non-core ops? (Composite? High-level? …)
  • Should all non-core ops be defined in terms of these core ops?
  • How should we structure the spec to make core vs non-core ops clear?
  • How precisely should the behavior of non-core ops be constrained?

There are some high level questions that need to be hashed out:

  • Similar to GPUSupportedLimits, whether and how to expose backend-specific limits. This probably needs more implementation experience before it can be answered.
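
As a purely hypothetical illustration of the GPUSupportedLimits analogy (none of the names below are in the spec; they are invented for discussion only), a backend-specific limits surface might look roughly like this:

```typescript
// Hypothetical sketch only: MLSupportedLimits and its members are invented
// names, mirroring the shape of WebGPU's GPUSupportedLimits for discussion.
interface MLSupportedLimits {
  readonly maxTensorRank: number;        // e.g. a backend that caps rank at 8
  readonly maxTensorByteLength: number;  // largest operand the backend accepts
}

// A framework could branch on such limits before attempting to build a graph.
function fitsBackend(limits: MLSupportedLimits, rank: number, byteLength: number): boolean {
  return rank <= limits.maxTensorRank && byteLength <= limits.maxTensorByteLength;
}
```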

See also:

@philloooo (Contributor, Author)

One example we can compare with is the PyTorch Edge op set.

@wchao1115 (Collaborator)

wchao1115 commented Feb 27, 2024

When we originally considered the design principles of the WebNN operator set, even before the spec made it out of the community group, one topic came up in many discussions: whether the core operator set should include common high-level operations (commonly known as fusions) or only the rudimentary building-block operations.

Without getting into a philosophical debate about what exactly, besides basic arithmetic operations, should be considered rudimentary operations, we set out to look at the question objectively, in the context of what is already being developed in the industry at various software and hardware layers. We concluded that, for both practicality and performance reasons, it is important to also include common fusions known to be implemented widely in the framework and underlying platform (e.g. the operating system) layers, with an important caveat: for each fusion defined in the spec, the decomposed operations of its equivalent subgraph must also be defined.

The main objective of this rule is to let an implementation that does not yet fully support a specific fusion carry on without failing. By pushing operator decomposition downward, we allow the implementation to catch up later while simplifying the development of a framework's WebNN backend. Also note that the keyword here is "common" fusions, not any fusion with unverifiable availability in the platform layers.

For reference, we took the opportunity at that time to describe this rationale in this section of our explainer document.

At the time of that writing, we used GRU and LSTM as de facto examples of such common fusions. With the emergence of generative AI in recent years, the better examples would be group-norm, layer-norm, and multi-headed attention: operations that are widely used in today's diffusion and transformer models.
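
To make the "define the decomposition for every fusion" rule concrete with one of those examples, here is a rough sketch of layer normalization expressed as a subgraph of lower-level ops. It assumes MLGraphBuilder-style reduction and element-wise ops (reduceMean, sub, mul, div, sqrt, add); scale/bias broadcasting and the exact option names are simplified for illustration, not taken from the spec text.

```typescript
// Rough sketch of layer normalization as a subgraph of lower-level ops.
// Assumes WebNN IDL types are available; option names are illustrative.
function layerNormDecomposed(
  builder: MLGraphBuilder,
  x: MLOperand,
  scale: MLOperand,
  bias: MLOperand,
  axes: number[],
  epsilon: MLOperand  // scalar constant, e.g. 1e-5
): MLOperand {
  // Mean and variance over the normalization axes.
  const mean = builder.reduceMean(x, { axes, keepDimensions: true });
  const centered = builder.sub(x, mean);
  const variance = builder.reduceMean(
    builder.mul(centered, centered), { axes, keepDimensions: true });
  // Normalize, then apply the learned scale and bias.
  const normalized = builder.div(
    centered, builder.sqrt(builder.add(variance, epsilon)));
  return builder.add(builder.mul(normalized, scale), bias);
}
```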

@anssiko (Member)

anssiko commented Feb 27, 2024

Thanks @philloooo for soliciting input from internal Google teams and @wchao1115 for your insights and various contributions in this space.

The group has produced the following documentation on this topic:

In addition, the group has received related position statements from Google in #453 and #573 (this issue), and input from an ONNX project participant on that project's approach. There may be more, but I was able to recall these.

As @wchao1115 noted, this topic has been a long-term consideration and I note the topic re-emerges from time to time when new participants join. This suggests to me the group cares about this topic, and also that we could probably do better in documenting the group's current consensus position.

To help transform this into action, I'd like to ask whether the group is happy with the current organization of the related guidelines and documentation, or whether we should try to consolidate them somehow. Fold (more of) them into the specification? Is there further investigation and research to be done to ensure we are well informed and data-driven? A review of widely deployed frameworks (the expected users of this group's deliverables), with results shared in public?

Regardless of where this content lives in our repo, I expect this documentation to evolve and be maintained. Everyone's feedback is welcome.

@philloooo (Contributor, Author)

philloooo commented Mar 7, 2024

Thanks @wchao1115 for elaborating on the current design philosophy! I overall agree with the current design. Thanks @anssiko for linking to the existing resources!

This issue is not trying to exclude high-level operations from the spec. It's trying to bring up a couple of things related to this topic:

  • Comparing with an op set like the PyTorch Edge set, WebNN is still missing some core ops. We are also trying to work with internal teams to see if we can share something similar for StableHLO. So I created this issue to track how "complete" our core op set is.
  • How we present the core op set alongside the other high-level/composite ops in the spec is open for discussion. We can keep the current presentation, where they are differentiated by whether a decomposition is defined, or put them in different sections, or simply add annotations for core ops.
  • We should ensure that the core op set behaves the same across backends. The high-level/composite ops are more likely to diverge between backends, though. Take lstm as an example: the supported activations differ across backends. We need a solution to better support composite ops. One possible fallback pattern is sketched below.
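
One way a framework backend could cope with that divergence is to prefer the composite op when the backend supports the requested configuration, and otherwise fall back to a decomposition built from core ops. A sketch of that pattern follows; the capability-probing helper and the decomposition function are hypothetical stand-ins, and the lstm() signature shown is approximate.

```typescript
// Sketch of a prefer-fused-op-with-fallback pattern for a composite op.
// backendSupportsLstmActivation() is a hypothetical helper standing in for
// whatever capability query or probing a real framework backend would use.
declare function backendSupportsLstmActivation(activation: string): boolean;

// The fallback decomposition (matmul, add, element-wise activations, slice,
// concat, ...) is out of scope here; declared only to keep the sketch typed.
declare function lstmFromCoreOps(
  builder: MLGraphBuilder, input: MLOperand, weights: MLOperand,
  recurrentWeights: MLOperand, steps: number, hiddenSize: number,
  activation: string): MLOperand[];

function buildRecurrentLayer(
  builder: MLGraphBuilder,
  input: MLOperand,
  weights: MLOperand,
  recurrentWeights: MLOperand,
  steps: number,
  hiddenSize: number,
  activation: string
): MLOperand[] {
  if (backendSupportsLstmActivation(activation)) {
    // Use the composite lstm op when this configuration is supported.
    return builder.lstm(input, weights, recurrentWeights, steps, hiddenSize);
  }
  // Otherwise fall back to a decomposition built only from core ops.
  return lstmFromCoreOps(
    builder, input, weights, recurrentWeights, steps, hiddenSize, activation);
}
```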

@anssiko (Member)

anssiko commented Nov 7, 2024

We'll revive this issue with a discussion on additional primitive ops informed by MLIR Linalg, PyTorch Prims IR, TOSA, and others. Pointers to relevant background research can be shared here to inform the discussion.

@bhack

bhack commented Nov 7, 2024
