
Softmax axis absent #466

Closed
fdwr opened this issue Sep 29, 2023 · 2 comments · Fixed by #649
fdwr commented Sep 29, 2023

(raised by @Honry in review https://github.com/microsoft/onnxruntime/pull/17665/files)

TensorFlow, PyTorch, and ONNX softmax all take an axis parameter:

...but WebNN's softmax does not, making it challenging to implement a caller's softmax in terms of WebNN's function of the same name. It is possible (see here) by bracketing the call with transposes and reshapes, but those contortions are unfortunate, and they could be implemented more efficiently in the backend rather than in each framework.

  • ✅ Apple Metal Performance Shaders softMax has an axis.
  • ✅ Apple MIL activation.softmax supports an axis.
  • ✅ DirectML's DML_ACTIVATION_SOFTMAX1_OPERATOR_DESC supports an arbitrary axis list and dimensions, just like reduce. The older DML_ACTIVATION_SOFTMAX_OPERATOR_DESC can achieve it via reshapes/transpose/strides.
  • ☑ XNNPack - limited to 2D input currently. This can be achieved by updating XNNPack to accept an axis or by using the existing XNNPack operator plus a reshape (in the simple case when the axis is the last dimension) or transpose (if the axis comes before the last dimension).

So it's achievable in each backend, even without any changes to the DML/XNNPack APIs, but it would move the pain from the caller down to where it can be handled efficiently.
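To make the caller-side contortion concrete, here is a hedged plain-JavaScript sketch (ordinary nested arrays, not WebNN API calls; all function names are illustrative) of what a framework must do today for a 2D input: transpose the target axis to the last position, apply a last-axis softmax, and transpose back.

```javascript
// Plain-JS illustration of the transpose workaround a framework needs
// today when the desired softmax axis is not the last dimension.
// Tensors are modeled as nested arrays; names are illustrative only.

function transpose2d(m) {
  // Swap rows and columns of a 2D array.
  return m[0].map((_, j) => m.map(row => row[j]));
}

function softmaxLastAxis(m) {
  // Numerically stable softmax applied along the last axis of a 2D array.
  return m.map(row => {
    const max = Math.max(...row);
    const exps = row.map(v => Math.exp(v - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / sum);
  });
}

function softmax2d(m, axis) {
  // axis 1 (last): apply directly; axis 0: transpose, apply, transpose back.
  if (axis === 1) return softmaxLastAxis(m);
  return transpose2d(softmaxLastAxis(transpose2d(m)));
}
```

With a built-in axis parameter, the two transposes disappear and the backend can handle the whole computation in one fused pass.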


https://www.w3.org/TR/webnn/#api-mlgraphbuilder-softmax-method

partial interface MLGraphBuilder {
-  MLOperand softmax(MLOperand input);
+  MLOperand softmax(MLOperand input, unsigned long axis);
-  MLActivation softmax();
+  MLActivation softmax(unsigned long axis);
};
The behavior of this operation can be generically emulated using other operations, as follows. However, user agents typically have a more efficient implementation, so direct usage of softmax is encouraged for performance.
// This sample uses a well-known implementation trick [1]: it computes the
// exponentials of the distances to the max value, instead of the exponentials
// of the input values themselves, in order to increase the numerical stability of
// the result.
// [1]: https://cs231n.github.io/linear-classify/#softmax
- const maxX = builder.reduceMax(x, { axes: [1], keepDimensions: true });
+ const maxX = builder.reduceMax(x, { axes: [axis], keepDimensions: true });
const expX = builder.exp(builder.sub(x, maxX));
- return builder.div(expX, builder.reduceSum(expX, { axes: [1], keepDimensions: true }));
+ return builder.div(expX, builder.reduceSum(expX, { axes: [axis], keepDimensions: true }));
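The emulation above maps directly onto ordinary array code. As a sanity check on why the max-subtraction trick matters, here is a hedged plain-JavaScript sketch (illustrative helper names, not WebNN API) comparing a naive softmax with the stable form used in the emulation:

```javascript
// Demonstrates the numerical-stability trick from the emulation above:
// subtracting the row max before exponentiating avoids overflow to Infinity.

function naiveSoftmax(row) {
  const exps = row.map(v => Math.exp(v));         // overflows for large v
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function stableSoftmax(row) {
  const max = Math.max(...row);                   // the reduceMax step
  const exps = row.map(v => Math.exp(v - max));   // exp(sub(x, maxX))
  const sum = exps.reduce((a, b) => a + b, 0);    // the reduceSum step
  return exps.map(e => e / sum);                  // the div step
}

const big = [1000, 1001];
// naiveSoftmax(big) yields [NaN, NaN]: Math.exp(1000) === Infinity,
// and Infinity / Infinity is NaN.
// stableSoftmax(big) yields finite probabilities summing to 1.
```

Since exp(x - max) never exceeds 1, the stable form cannot overflow regardless of the input magnitudes, which is why the spec's emulation subtracts maxX first.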
@huningxin (Contributor) commented:
Should it be:

-  MLOperand softmax(MLOperand input);
+  MLOperand softmax(MLOperand input, unsigned long axis);

fdwr commented Nov 2, 2023

Ningxin: Indeed, I fixed my typo right before you wrote your comment 😅.

@inexorabletash inexorabletash self-assigned this Apr 18, 2024
inexorabletash added a commit to inexorabletash/webnn that referenced this issue Apr 18, 2024
Frameworks (TensorFlow, PyTorch, ONNX) all accept an axis parameter.

Most backends also support an axis, or it can be emulated with a
reshape. As @fdwr wrote: So it's achievable in each backend... but it
would move the pain from the caller down to where it can be handled
efficiently.

Fixes webmachinelearning#466
@fdwr fdwr closed this as completed in #649 Apr 25, 2024
fdwr added a commit that referenced this issue Apr 25, 2024
* Add axis argument to softmax()

Frameworks (TensorFlow, PyTorch, ONNX) all accept an axis parameter.

Most backends also support an axis, or it can be emulated with a
reshape. As @fdwr wrote: So it's achievable in each backend... but it
would move the pain from the caller down to where it can be handled
efficiently.

Fixes #466

* revert activation example to softmax

* validate softmax axis against inputs rank

* update TOC headers

* Update index.bs

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* camelCase not snake_case

* Remove unnecessary condition

* Update index.bs

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update index.bs

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Update index.bs

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>

* Sketch of validation for activations

* For gru() and lstm(), calculate gate descriptor, validate activations with it

* fix some copy/pasta

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>