Should softmax be a fused activation ? #471

Closed

mingmingtasd opened this issue Oct 19, 2023 · 11 comments
@mingmingtasd
Contributor

(Raised by @junwei in Chromium CL review https://chromium-review.googlesource.com/c/chromium/src/+/4920337/comment/6ed907ff_2f86a723/)

WebNN's softmax can be an MLActivation.

However, DirectML does not support fusing softmax, according to the doc: https://learn.microsoft.com/en-us/windows/ai/directml/dml-fused-activations

/cc @fdwr @huningxin

mingmingtasd changed the title from "Should softmax be an fused activation ?" to "Should softmax be a fused activation ?" on Oct 19, 2023
@fdwr
Collaborator

fdwr commented Oct 19, 2023

However, DirectML does not support fusing softmax, according to the doc

It looks like a documentation issue, because the actual code accepts it. Will follow up with team and doc writer...

[update] It was added in DML 1.12. So for older DML targets, we could always issue a separate Softmax call upon seeing that activation.
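
For illustration, a minimal sketch of the fallback decision described above (all names are hypothetical; this is not actual DirectML or Chromium code):

```ts
// Hypothetical backend lowering step: fuse softmax into the convolution only
// when the runtime supports it (softmax fusion was added in DML 1.12 per the
// comment above); otherwise dispatch the convolution and softmax separately.
interface BackendCaps {
  supportsSoftmaxFusion: boolean; // e.g. derived from the DML feature level
}

function lowerConvWithSoftmaxActivation(
  caps: BackendCaps,
  emit: (op: string) => void
): void {
  if (caps.supportsSoftmaxFusion) {
    emit("CONV2D(activation=SOFTMAX)"); // single fused dispatch
  } else {
    emit("CONV2D");  // fallback: two separate dispatches,
    emit("SOFTMAX"); // functionally equivalent to the fused form
  }
}
```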

@inexorabletash
Member

Given the above, implementations can either (1) only support backends that can fuse softmax (e.g. by bundling a backend, refusing to run on older backends, etc.) or (2) implement support in user-land, so I think we can close this out.

Do you agree, @fdwr?

@fdwr
Collaborator

fdwr commented Apr 19, 2024

Given the above, implementations can either (1) only support backends that can fuse softmax (e.g. by bundling a backend, refusing to run on older backends, etc.) or (2) implement support in user-land, so I think we can close this out.

Do you agree, @fdwr?

@inexorabletash Yes, in any case, this is closeable because the backend can choose to execute it separately. I don't think it's good to croak on the model (your option 1), as backends should instead gracefully execute the convolution and the MLConv2dOptions::activation separately. @mingmingtasd: Concur?
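
For concreteness, a minimal sketch of the two graph shapes being discussed, written against the WebNN API as it looked while conv2d still took an activation option (later removed by #664); the builder calls are illustrative rather than normative:

```ts
// Assumes a WebNN MLGraphBuilder `builder` and operands `input` and `filter`
// already exist; typed loosely since this is a sketch, not spec IDL.
declare const builder: any;
declare const input: unknown;
declare const filter: unknown;

// (a) Fused form: softmax attached as the convolution's activation. The
// zero-argument builder.softmax() activation is the hypothetical MLActivation
// form this issue debates; it is only expressible where the backend can fuse.
const fused = builder.conv2d(input, filter, { activation: builder.softmax() });

// (b) Decomposed form: the graceful fallback a backend can always use,
// running the convolution and then softmax as a separate op.
const conv = builder.conv2d(input, filter);
const decomposed = builder.softmax(conv);
```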

@huningxin
Contributor

as backends should instead gracefully execute the convolution and the MLConv2dOptions::activation separately.

Should we add a note to the spec mentioning this fallback path?

@wacky6

wacky6 commented Apr 22, 2024

Purely as a thought experiment:

Softmax requires summing over a dimension (out[i] = exp(input[i]) / sum(exp(input))). In this sense, softmax feels more like a normalization than an activation.

Other activations defined in the spec are element-wise (out[i] = fn(input[i])), so they can be trivially applied when filling the output buffer.

Just curious, how could a fused softmax improve performance (or reduce memory consumption)? How is a fused softmax activation different from syntactic sugar for "Op + Softmax"?

Intuitively, the entire pre-softmax output (i.e. the intermediate result) needs to be computed before softmax can produce the result (because of the dependency on sum(exp(input))).

So it seems the intermediate result is unavoidable (as opposed to being inlined for activations such as ReLU), negating the benefit of a fused activation.
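
To make the contrast concrete, a small sketch (plain TypeScript over number arrays standing in for tensors): an element-wise activation can be applied as each output element is written, while softmax needs the full row before any output element can be finalized.

```ts
// Element-wise activation: out[i] depends only on input[i], so it can be
// folded into the producing op's store of each element.
const relu = (x: number): number => Math.max(0, x);

// Softmax over one dimension: every output element depends on the whole row
// (its max for numerical stability, and the sum of exponentials), so the
// complete pre-softmax row must exist first.
function softmaxRow(row: number[]): number[] {
  const max = Math.max(...row);
  const exps = row.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```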

@fdwr
Collaborator

fdwr commented Apr 23, 2024

In this sense, softmax feels more like a normalization than an activation.

@wacky6: Softmax is also normalization. I tried stuffing ML ops into a few tidy categories here of my own arbitrary grouping, but then I realized they overlap (e.g. softmax in the table).

@fdwr
Collaborator

fdwr commented May 2, 2024

Related: #658

how could a fused softmax improve performance (or reduce memory consumption)? How is a fused softmax activation different from syntactic sugar for "Op + Softmax"?

@wacky6: I haven't seen the algorithm firsthand, but I recall there's a clever optimization where GPU/NPU drivers can fold the reduction for softmax's denominator directly into the convolution pass, without needing a separate re-summation.
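
One published technique in this spirit is "online softmax" (Milakov & Gimelshein, 2018), where a running max and a rescaled running sum are maintained as each pre-softmax value is produced, so the denominator is ready as soon as the producing pass finishes. The sketch below is illustrative and not a claim about what any particular driver does:

```ts
// Online softmax: accumulate the normalizer in the same pass that produces the
// values, instead of a separate max pass plus a separate sum pass afterwards.
function onlineSoftmax(values: number[]): number[] {
  let runningMax = -Infinity;
  let runningSum = 0;
  for (const v of values) {
    const newMax = Math.max(runningMax, v);
    // Rescale the accumulated sum whenever the running max increases.
    runningSum = runningSum * Math.exp(runningMax - newMax) + Math.exp(v - newMax);
    runningMax = newMax;
  }
  // Final element-wise normalization, analogous to an ordinary fused activation.
  return values.map((v) => Math.exp(v - runningMax) / runningSum);
}
```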

Should we add a note to the spec mentioning this fallback path?

@huningxin: Alternatively, if we delete the 3 activation fields per issue 658 from @a-sully, then this issue becomes moot, as we'd instead add a note about backend fusions rather than a note about backend unfusions.

@a-sully
Contributor

a-sully commented May 2, 2024

@huningxin: Alternatively, if we delete the 3 activation fields per issue 658 from @a-sully, then this issue becomes moot, as we'd instead add a note about backend fusions rather than a note about backend unfusions.

Agreed. I propose we close this issue once #664 merges.

@huningxin
Contributor

@huningxin: Alternatively, if we delete the 3 activation fields per issue 658 from @a-sully, then this issue becomes moot, as we'd instead add a note about backend fusions rather than a note about backend unfusions.

Agreed. I propose we close this issue once #664 merges.

SGTM!

@a-sully
Contributor

a-sully commented May 6, 2024

#664 merged 🎉

@fdwr
Collaborator

fdwr commented May 6, 2024

Closing, as the activation field is deleted now, and the DML backend will decide whether to fuse based on the DML feature level (a tangentially related enabling CR: https://chromium-review.googlesource.com/c/chromium/src/+/5501114).

fdwr closed this as completed on May 6, 2024