
Support WASM Relaxed SIMD instructions #7312

Open
shoaibkamil opened this issue Jan 31, 2023 · 4 comments

Comments

@shoaibkamil (Contributor) commented Jan 31, 2023

Halide should support the WASM Relaxed SIMD instructions. The current proposal is nearing final acceptance and is implemented in the emscripten/LLVM toolchain, so it should be testable. Certain use cases (neural network inference, for example) show speedups of 30–40% on desktop CPUs and 2–3× on ARM phones.

Design-wise, there are a couple of options for how we could support these instructions. Given a target with explicit support for relaxed SIMD, we could:

  1. add user-callable intrinsics for the relaxed instructions; or
  2. automatically rewrite sequences into relaxed instructions when appropriate

(or both). Implementing automatic rewrites may be possible since we can, in many cases, infer ranges on values and thus prove that no non-determinism occurs. An initial prototype, however, should probably just add the intrinsics. AFAIK, LLVM also currently generates these instructions only from explicit intrinsics.

@steven-johnson (Contributor)
I haven't looked at the spec for Relaxed SIMD closely enough to get a feel for whether it fits into the "fast-math" mold (i.e., could we just assume it's OK to use these instructions when strict-float isn't in use)... I defer to your judgement here. That said, it makes me nervous / unhappy if they have added instructions that don't reasonably fit into any existing mold.

As a short-term stopgap, adding explicit intrinsics for these seems like a pragmatic expedient, but in the long run I think we'd prefer to avoid doing that, as it seems like a road to writing Halide code that starts to diverge wildly (in the non-schedule section) over time.

(FYI, I realize I've done most of the work on the wasm backend to date, but at present I don't know if my work priorities include making this happen in a timely manner; do you have any resources in mind for implementing this?)

@shoaibkamil (Contributor, Author)

> I haven't looked at the spec for Relaxed SIMD closely enough to get a feel for whether it fits into the "fast-math" mold (i.e., could we just assume it's OK to use these instructions when strict-float isn't in use)... I defer to your judgement here. That said, it makes me nervous / unhappy if they have added instructions that don't reasonably fit into any existing mold.

The proposal has three categories of instructions:

  1. Integer instructions whose inputs are interpreted differently (e.g. swizzle, 4-D dot product)
  2. Floating-point instructions whose behavior for out-of-range values and NaNs differs (e.g. float-to-int conversions, float min/max)
  3. Floating-point instructions whose precision or order of operations differs (e.g. FMA, reciprocal instructions, sum reduction)

Categories 2 and 3 are pretty much fast-math-like. The first category allows swizzles and laneselects to have implementation-defined behavior if e.g. the swizzle indices are out-of-range; these seem reasonable to me as well.

It's true that adding intrinsics risks making the algorithm less agnostic to the backend; perhaps this warrants a discussion in the dev meetings. In terms of implementation, I'd probably be the person doing the work for this, with help from WASM SIMD standards folks.

@rootjalex (Member)

> I haven't looked at the spec for Relaxed SIMD closely enough to get a feel for whether it fits into the "fast-math" mold (i.e., could we just assume it's OK to use these instructions when strict-float isn't in use)...

This may be reasonable for the floating-point ops (I don't know enough about those to really comment), but definitely not for the integer ones (e.g. Q-format multiplication and integer dot product).

We could add an intrinsic for the Q-format multiplication, but I agree with Shoaib that the longer-term solution seems to be bounds inference to detect when the usage is legal. We cannot add an intrinsic for the integer dot product, as we don't support dimension-changing intrinsics.

I am currently working on a PR that uses bounds inference for instruction selection on x86 and HVX. It's a bit far down my backlog, but I hope to have it done by ~end of March (paper deadlines might prevent me, we'll see). That machinery could be very useful for instruction selection of the relaxed integer instructions, and I'm happy to help with that once the first PR is done, time permitting. I will probably have much more time to work on this after mid-April; I don't know if that's too far off to be useful.

@rootjalex (Member)

I think there was a race condition; I didn't see Shoaib's response. Sorry for repeating a bit!
