Consider removing `MLActivation` parameters used for op fusion (#658)
A referenced change notes: "Note that ops are still being fused, unbeknownst to the web platform layer! Spec discussions are happening here: webmachinelearning/webnn#658" (Bug: TODO, Change-Id: I8a4f2ab708e34087eeb10421596243b17bc8d7e2)
🤔 Reviewing so many code reviews over the past year, these fields have indeed shown themselves to be problematic in a few respects.

This added complexity is for performance, and that performance is worthwhile. However, of the currently implemented backends, the one that benefits most from these fusions is the DirectML backend in Chromium, and that backend already does a one-operator look-ahead for simple adjacent fusions (like Conv+Relu). This means any WebNN caller can call either {conv, relu} or {conv+relu} and get the same performance benefit, with no fusion logic required from the caller at the WebNN API level. So, I support removing these parameters. FYI @mingmingtasd. (I also want to check with Chai and Rafael for their opinions...)
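For illustration, here is a minimal sketch of the two caller-side spellings being compared, assuming the pre-change WebNN API shape in which `builder.relu()` called with no arguments returns an `MLActivation`; the tensor shapes, filter data, and the `dimensions` descriptor key are illustrative only and may differ across spec revisions.

```js
// Sketch only: compares the two spellings on a hypothetical graph.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);

const input = builder.input('input',
  { dataType: 'float32', dimensions: [1, 3, 224, 224] });
const filter = builder.constant(
  { dataType: 'float32', dimensions: [16, 3, 3, 3] },
  new Float32Array(16 * 3 * 3 * 3));

// Spelling 1: {conv, relu} as two separate operands.
const unfused = builder.relu(builder.conv2d(input, filter));

// Spelling 2: {conv+relu} via the MLActivation parameter under discussion.
const fused = builder.conv2d(input, filter, { activation: builder.relu() });

// A backend with a one-operator look-ahead (e.g. DirectML in Chromium)
// can compile both spellings down to the same fused native operator.
const graph = await builder.build({ unfused, fused });
```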
Thanks for the proposal @a-sully and the discussion @fdwr! I think it is good to leave activation-function fusion to the underlying graph optimizer (either in a Chromium backend, such as the DirectML backend, or in a native ML framework, such as XNNPACK's xnn_subgraph_fusion()). That would make the WebNN API clearer. Previously, when WebNN was used as a target backend of frameworks such as ONNX Runtime Web or the TFLite Wasm runtime, WebNN was likely to be given a convolution with an already-fused activation function thanks to the framework's graph optimizer. A WebNN backend could leverage that and map it directly to a native convolution operator that supports a fused activation function. Now the WebNN backend should check whether the native framework fuses activation functions; if it doesn't, the backend should add an optimization pass and fuse them itself in order to avoid a performance regression. Would it be worth adding an implementation note?
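For illustration only, here is a hedged sketch of the kind of peephole pass a WebNN backend could run over its internal op list when the native framework does not fuse for it; the op-record shape and helper names are hypothetical and do not come from any real backend.

```js
// Hypothetical one-operator look-ahead fusion pass over a flat op list.
// Each op record ({ kind, input, output, fusedActivation }) is invented
// for illustration; a real backend IR looks different.
const FUSABLE_ACTIVATIONS = new Set(['relu', 'sigmoid', 'tanh']);

function fuseAdjacentActivations(ops) {
  const result = [];
  for (let i = 0; i < ops.length; i++) {
    const op = ops[i];
    const next = ops[i + 1];
    // Fuse conv2d immediately followed by a fusable activation that
    // consumes the conv2d output. (A real pass would also verify that
    // the intermediate result has no other consumers.)
    if (op.kind === 'conv2d' && next &&
        FUSABLE_ACTIVATIONS.has(next.kind) && next.input === op.output) {
      result.push({ ...op, fusedActivation: next.kind, output: next.output });
      i++; // Skip the activation; it is folded into the conv2d.
    } else {
      result.push(op);
    }
  }
  return result;
}
```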
Thanks, @huningxin. I didn't mention anything about op fusion in #664 because there's already this text in the spec:
If you have more suggestions, please comment on that PR! :)
That one is good, but it is more about the graph-compilation-time optimization that is usually done by native ML APIs. I think we won't want to miss the graph-construction-time optimization opportunity when it is natively supported, as the spec text mentioned for the previous fused-activation design. For example, the DirectML backend in Chromium exploits this op fusion opportunity when constructing the DirectML graph.
Will do!
@fdwr Correct! For example, with softmax along an axis: if the base operator's output shape is unknown, we can't fuse them.
`MLActivation` currently has two uses:

1. `batchNormalization` and `conv2d`, as a hint that this activation may be fusable with the given operator
2. `lstm` and `gru`, as activations applied to gates within the given recurrent unit

I have some thoughts on (2), but I'll leave that to another issue. Let's focus on (1) here :)
My claim is that there's no benefit to passing `MLActivation`s as parameters to `MLOperand`s for the purpose of op fusion. Here's why:

- A backend still has to be able to un-fuse `MLActivation`s from their respective `MLOperand` if the combo is not supported for the given version of DML
- Even if an operator is not an `MLActivation` in the WebNN spec, it may in fact be fusible with its input or output; for example, `DML_OPERATOR_ELEMENT_WISE_ADD1` and `DML_OPERATOR_GEMM` (which ~map to WebNN's `add` and `gemm` operators) support fused activations

What this means in practice is:
- If the backend can't fuse a given combination, the user agent has to un-fuse the `MLOperand` and its `MLActivation` into what's effectively just two `MLOperand`s connected to each other (as Chromium's CoreML backend currently does; see the sketch after this comment)
- If the backend can fuse more than the spec expresses, the user agent fuses operators regardless of the `MLActivation` parameters in WebNN (as Chromium's DML backend currently does)

Whether a given operator can be fused with a given activation is a very backend-specific quirk. Presumably we don't want to plumb a new `MLActivation` through to the web for every operator which any backend decides it can now fuse with some activation! This seems best left as an implementation detail, handled either by the user agent (as described above) or by the framework (who knows how much op fusion is happening in Core ML under the hood?! 🤔)

Thoughts?
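As a purely illustrative sketch of the un-fuse path in the first bullet above, here is how a user agent might lower an operator that carries an activation its backend cannot fuse into two plain operations; the node shape and function names are hypothetical and are not Chromium's actual CoreML backend code.

```js
// Hypothetical lowering step: if the native API cannot fuse this
// (op, activation) pair, emit two plain ops joined by an intermediate.
function lowerNode(node, canFuse, emit) {
  if (node.activation && !canFuse(node.kind, node.activation)) {
    const intermediate = `${node.output}__prefused`;
    emit({ kind: node.kind, inputs: node.inputs, output: intermediate });
    emit({ kind: node.activation, inputs: [intermediate], output: node.output });
  } else {
    emit(node);
  }
}

// Example: a conv2d carrying a relu the backend cannot fuse becomes
// conv2d -> relu, i.e. two operands connected to each other.
const emitted = [];
lowerNode(
  { kind: 'conv2d', inputs: ['x', 'w'], output: 'y', activation: 'relu' },
  () => false,              // pretend this combo is unsupported
  (n) => emitted.push(n));
```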