[ARM] support new udot/sdot patterns #7800
Conversation
LGTM pending green.
We should figure out how/where to document this; the work on apps/hannk showed that using the dotprod ops on ARM could be a huge win, but they were tricky to generate (i.e., the IR had to be just right) -- it would be good to document what to do to get these.
Additionally: maybe we should add a dot_prod() intrinsic to IROperator.h, along with widening_mul and friends? Recognizing the patterns is highly desirable of course, but having something that means "always use the best ops for this specifically, regardless of architecture" seems like it could be useful in pathological cases.
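To make that suggestion concrete, here is a purely hypothetical sketch of what such a declaration might look like. Nothing like this exists in IROperator.h today; the name, signature, and `factor` parameter are all invented for illustration:

```cpp
// HYPOTHETICAL -- not part of Halide today. One possible shape for the
// suggested helper, sitting next to widening_mul and friends in IROperator.h.
namespace Halide {
// Accumulate into `init` the sums of pairwise products of the lanes of
// `a` and `b`, reduced in groups of `factor` (e.g. 4 for ARM udot/sdot,
// 2 for x86-style pmaddwd). Widening behavior would mirror widening_mul.
Expr dot_product(Expr init, Expr a, Expr b, int factor);
}  // namespace Halide
```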
@steven-johnson I completely agree - unfortunately, it's often quite difficult to express exactly the pattern that will make it through the simplifier and trigger these instructions (that's why I've included so many pattern variants). @abadams and I have discussed having a strong normalization pass to ease the job of the pattern matcher (and of the programmer aiming for the dot product instruction). I think this is something we need to do (but I don't currently have bandwidth for it). With regard to a new intrinsic - the main difficulty there is that we need a variadic dot product, and there isn't good consistency across backends (e.g., ARM has a 4-way matching-sign dot product, x86 has a 2-way mixed-sign dot product, HVX has many). I worry that an intrinsic like that might be too hard to handle across backends.
Yeah, I hear you, but telling people that they have to look at the generated assembly code to verify they got it right is an unreasonably onerous burden.
That's completely fair! I think the best solution is a powerful normalizer, but I could be convinced about the intrinsic.
Dot product instructions reduce a vector horizontally, and our front-end language doesn't have vectors, so there's no intrinsic we could add that would be guaranteed to hit them in the way you want. We do have vectors in the scheduling language, so the way to get a dot product instruction for sure is by using atomic().vectorize(some_rvar). The kind of dot product AJ is targeting here is an opportunistic instruction selection trick where you save a few instructions by interleaving four different widening multiply-adds and using udot instead of running them separately. Whether or not it's a win is very, very architecture- and type-dependent.
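For reference, a minimal sketch of that scheduling approach, assuming an AArch64 target with the arm_dot_prod feature. The function, buffers, and sizes are illustrative, and whether udot is actually selected still depends on the pattern surviving lowering:

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    // Illustrative names only, not taken from this PR.
    ImageParam a(UInt(8), 2, "a"), b(UInt(8), 2, "b");
    Var x("x");
    RDom r(0, 16, "r");

    Func f("f");
    f(x) = cast<uint32_t>(0);
    // Widen to 32 bits before multiplying so the accumulator is u32.
    f(x) += cast<uint32_t>(a(r, x)) * cast<uint32_t>(b(r, x));

    // atomic() asserts the reduction can be reordered, which allows
    // vectorizing across the reduction variable itself; that produces the
    // horizontal VectorReduce that can lower to a dot product instruction.
    // In practice you would typically also vectorize the pure var x.
    f.update().atomic().vectorize(r, 4);

    // Inspect the generated assembly to confirm udot was actually used.
    f.compile_to_assembly("f.s", {a, b}, "f",
                          Target("arm-64-linux-arm_dot_prod"));
    return 0;
}
```

Grepping the emitted f.s for udot is the quickest way to confirm it worked, which ties back to the point above about having to inspect assembly.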
Failures appear unrelated.
udot and sdot can still be used even if we need to interleave the arguments (and this is faster than a string of smulls/saddws). This PR adds those patterns + tests (and a few fly-by FindIntrinsics fixes).