Should softmax be a fused activation? #471
Comments
It looks like a documentation issue, because the actual code accepts it. Will follow up with the team and doc writer... [update] It was added in DML 1.12. So for older DML targets, we could always issue a separate Softmax call upon seeing that activation.
Given the above, implementations can either (1) only support backends that can fuse softmax (e.g. by bundling a backend, refusing to run on older backends, etc.) or (2) implement support in user-land. I think we can close this out. Do you agree @fdwr?
@inexorabletash Yes, in any case this is closeable, because the backend can choose to execute it separately. I don't think it's good to croak on the model (your option 1); backends should instead gracefully execute the convolution and the softmax as separate calls.
Should we add a note to the spec mentioning this fallback path?
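For concreteness, a rough sketch of the two forms under discussion, assuming an `MLGraphBuilder` named `builder` and already-built `input`/`filter` operands. The fused-activation option shape reflects the pre-#664 API surface and is illustrative only, not normative:

```js
// Fused form: softmax passed as the conv2d activation, which a backend with
// fused-softmax support (e.g. DML 1.12+) could lower to a single kernel:
//   const output = builder.conv2d(input, filter, { activation: builder.softmax() });

// Fallback form: the decomposition any backend can execute, emitting the
// convolution and the softmax as two separate operations:
function convThenSoftmax(builder, input, filter) {
  const conv = builder.conv2d(input, filter);
  return builder.softmax(conv);
}
```

Either way the resulting graph computes the same values; the fused form is only a hint that a capable backend may merge the two ops.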
Purely as a thought experiment: softmax requires summing over a dimension, whereas the other activations defined in the spec are element-wise.

Just curious: how could a fused softmax improve performance (or reduce memory consumption)? How is a fused softmax activation different from syntactic sugar for "Op + Softmax"?

Intuitively, the entire pre-softmax output (i.e. the intermediate result) needs to be computed before softmax can produce any result, because of the dependency on the sum across the reduction dimension. So it seems the intermediate result is unavoidable (as opposed to being inlined for activations such as ReLU), negating the benefit of a fused activation.
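For reference, the standard softmax definition makes that cross-element dependency explicit: every output element depends on a sum over the whole reduction axis.

```math
\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
```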
@wacky6: Softmax is also normalization. I tried stuffing ML ops into a few tidy categories here of my own arbitrary grouping, but then I realized they overlap (e.g. softmax in the table).
Related: #658
@wacky6: I haven't seen the algorithm firsthand, but I recall there's a clever optimization where GPU/NPU drivers can factor out the reduction of softmax's denominator directly into the convolution pass, without needing a separate resummation.
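For illustration only (a known "online softmax" style of accumulation, not a claim about what any particular driver does): the max/denominator reduction can be folded into the same pass that produces the pre-softmax values, so the only remaining work is a normalization pass rather than a separate re-summation. A minimal JavaScript sketch, with a hypothetical `producer(i)` standing in for computing one pre-softmax element (e.g. one convolution output value):

```js
// Online accumulation of softmax's running max and denominator, fused into
// the loop that produces the pre-softmax values.
function producerFusedSoftmax(producer, n) {
  const logits = new Array(n);
  let runningMax = -Infinity;
  let runningSum = 0;
  for (let i = 0; i < n; i++) {
    const x = producer(i);
    logits[i] = x;                       // the intermediate is still stored...
    const newMax = Math.max(runningMax, x);
    // ...but the max/denominator reduction is folded into this same pass:
    runningSum = runningSum * Math.exp(runningMax - newMax) + Math.exp(x - newMax);
    runningMax = newMax;
  }
  // One normalization pass; no separate re-summation over `logits` is needed.
  return logits.map((x) => Math.exp(x - runningMax) / runningSum);
}
```

This is consistent with the observation above: the pre-softmax values are still materialized, and what a fusion like this saves is the extra reduction pass over them (and the associated memory traffic), not the intermediate itself.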
@huningxin: Alternatively, if we delete the 3 …
Agreed, I propose we close this issue once #664 merges.
SGTM!
#664 merged 🎉 |
Closing, as discussed above, now that #664 has merged.
(Raised by @junwei in Chromium CL review: https://chromium-review.googlesource.com/c/chromium/src/+/4920337/comment/6ed907ff_2f86a723/)
WebNN's softmax can be an MLActivation, while DirectML does not support fusing softmax according to the doc: https://learn.microsoft.com/en-us/windows/ai/directml/dml-fused-activations
/cc @fdwr @huningxin