
Support WASM Relaxed SIMD instructions #7312

Open
shoaibkamil opened this issue Jan 31, 2023 · 4 comments

Comments

@shoaibkamil (Contributor) commented Jan 31, 2023

Halide should support the WASM Relaxed SIMD instructions. The current proposal is nearing final acceptance and is implemented in the emscripten/LLVM toolchain, so it should be testable. Certain use cases (neural network inference, for example) show speedups of 30–40% on desktop CPUs and 2–3× on ARM phones.

Design-wise, there are a couple of options for how we could support these instructions. Given a target with explicit support for relaxed SIMD, we could:

  1. add user-callable intrinsics for the relaxed instructions; or
  2. automatically rewrite sequences into relaxed instructions when appropriate

(or both). Implementing automatic rewrites may be possible since we can, in many cases, infer ranges on values and thus prove that no non-determinism occurs. An initial prototype, however, should probably just add the intrinsics. AFAIK, LLVM also currently generates these instructions only from explicit intrinsics.

@steven-johnson (Contributor)
I haven't looked at the spec for Relaxed SIMD closely enough to get a feel for whether it fits into the "fast-math" mold (i.e., could we just assume it's OK to use these instructions when strict-float isn't in use)... I defer to your judgement here. That said, it makes me nervous / unhappy if they have added instructions that don't reasonably fit into any existing mold.

As a short-term stopgap, adding explicit intrinsics for these seems like a pragmatic expedient, but in the long run I think we'd prefer to avoid doing that, as it seems like a road to writing Halide code that starts to diverge wildly (in the non-schedule section) over time.

(FYI, I realize I've done most of the work on the wasm backend to date, but at present I don't know if my work priorities include making this happen in a timely manner; do you have any resources in mind for implementing this?)

@shoaibkamil (Contributor, Author)

> I haven't looked at the spec for Relaxed SIMD closely enough to get a feel for whether it fits into the "fast-math" mold (i.e., could we just assume it's OK to use these instructions when strict-float isn't in use)... I defer to your judgement here. That said, it makes me nervous / unhappy if they have added instructions that don't reasonably fit into any existing mold.

The proposal has three categories of instructions:

  1. Integer instructions whose inputs are interpreted differently (e.g. swizzle, 4-D dot product)
  2. Floating-point instructions whose behavior for out-of-range values and NaNs differs (e.g. float-to-int conversions, float min/max)
  3. Floating-point instructions whose precision or order of operations differs (e.g. FMA, reciprocal instructions, sum reduction)

Categories 2 and 3 are pretty much fast-math-like. The first category allows swizzles and laneselects to have implementation-defined behavior if e.g. the swizzle indices are out-of-range; these seem reasonable to me as well.

It's true that adding intrinsics risks making the algorithm less agnostic to the backend; perhaps this warrants a discussion in the dev meetings. In terms of implementation, I'd probably be the person doing the work for this, with help from WASM SIMD standards folks.

@rootjalex (Member)

> I haven't looked at the spec for Relaxed SIMD closely enough to get a feel for whether it fits into the "fast-math" mold (i.e., could we just assume it's OK to use these instructions when strict-float isn't in use)...

This may be reasonable for the floating-point ops (I don't know enough about those to really comment), but definitely not for the integer ones (e.g. Q-format multiplication and integer dot product).

We could add an intrinsic for the Q-format multiplication, but I agree with Shoaib that the longer-term solution seems to be bounds inference to detect when the usage is legal. We cannot add an intrinsic for the integer dot product, as we don't support dimension-changing intrinsics.

I am currently working on a PR that uses bounds inference for instruction selection on x86 and HVX. It's a bit far down my backlog, but I hope to have it done by ~end of March (paper deadlines might prevent me, we'll see). That machinery could be very useful for instruction selection of the relaxed integer instructions, and I'm happy to help with that once the first PR is done, time permitting. I will probably have much more time to work on this after mid-April; I don't know if that's too far off to be useful.

@rootjalex (Member)

I think there was a race condition; I didn't see Shoaib's response. Sorry for repeating a bit!
