Support WASM Relaxed SIMD instructions #7312
Comments
I haven't looked at the spec for Relaxed SIMD closely enough to get a feel for whether it fits into the "fast-math" mold (i.e., could we just assume it's OK to use these instructions when strict-float isn't in use)... I defer to your judgement here. That said, it makes me nervous / unhappy if they have added instructions that don't reasonably fit into any existing mold. As a short-term stopgap, adding explicit intrinsics for these seems like a pragmatic expedient, but in the long run I think we'd prefer to avoid doing that, as it seems like a road to writing Halide code that starts to diverge wildly (in the non-schedule section) over time. (FYI, I realize I've done most of the work on the wasm backend to date, but at present I don't know if my work priorities include making this happen in a timely manner; do you have any resources in mind for implementing this?)
The proposal has 3 categories of instructions:
Categories 2 and 3 are pretty much fast-math-like. The first category allows swizzles and laneselects to have implementation-defined behavior if e.g. the swizzle indices are out-of-range; these seem reasonable to me as well. It's true that adding intrinsics has the danger of making the algorithm less agnostic to the backend; this perhaps warrants a discussion in the dev meetings. In terms of implementation, I'd probably be the person doing the work for this with help from WASM SIMD standards folks.
This may be reasonable for the floating point ops (I don't know about these, I can't really comment on them), but definitely not for the integer ones (e.g. Q-format multiplication and integer dot product). We could add an intrinsic for the Q-format multiplication, but I agree with Shoaib that the longer-term solution seems to be bounds inference to detect when the usage is legal. We cannot add an intrinsic for the integer dot product, as we don't support dimension-changing intrinsics. I am currently working on a PR that uses bounds inference for instruction selection on x86 and HVX. It's a bit far down my backlog, but I hope to have it done by ~end of March (paper deadlines might prevent me, we'll see). That machinery could be very useful for instruction selection on the relaxed integer instructions, and I'm happy to help with that once the first PR is done, time permitting. I will probably have much more time to work on this after mid-April; I don't know if that's too far off to be useful.
I think there was a race condition, I didn't see Shoaib's response, sorry for repeating a bit!
Halide should support the WASM Relaxed SIMD instructions. The current proposal is nearing final acceptance and is implemented in the emscripten/LLVM toolchain, so it should be testable. Certain use cases (neural network inference, for example) show speedups of 30-40% on desktop CPUs and 2-3x on ARM phones.
Design-wise, there are a couple of options for how we should support these instructions. Given a target with explicit support for relaxed SIMD, we could:
(or both). Implementing automatic rewrites may be possible since we can, in many cases, infer ranges on values and thus prove that no non-determinism occurs. An initial prototype, however, should probably just add the intrinsics. AFAIK, LLVM currently also only generates these instructions from explicit intrinsics.
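To make the range-inference idea concrete, here is a minimal sketch for the Q15 multiplication case. It assumes, as I read the proposal, that `i16x8.relaxed_q15mulr_s` is only implementation-defined when both inputs are INT16_MIN (the result may be INT16_MIN or INT16_MAX). `Interval16`, `excludes_int16_min` and `can_use_relaxed_q15mulr` are hypothetical names for illustration, not Halide's actual bounds-inference API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed interval [lo, hi] bounding one 16-bit signed operand,
// e.g. as a bounds-inference pass might report it. Not a Halide type.
struct Interval16 {
    int32_t lo;
    int32_t hi;
};

// Deterministic reference for one lane of i16x8.q15mulr_sat_s:
// rounding Q15 multiply with saturation. The only case that saturates is
// INT16_MIN * INT16_MIN. Assumes >> on a negative int32_t is an arithmetic
// shift (guaranteed since C++20, universal in practice before that).
constexpr int16_t q15mulr_sat(int16_t a, int16_t b) {
    int32_t r = (int32_t(a) * int32_t(b) + 0x4000) >> 15;
    return r > INT16_MAX ? INT16_MAX : int16_t(r);
}

// True if a value within these bounds can never equal INT16_MIN.
bool excludes_int16_min(Interval16 v) {
    assert(v.lo <= v.hi);  // precondition: well-formed interval
    return v.lo > INT16_MIN;
}

// Under the assumption stated above, the relaxed instruction agrees with the
// saturating one unless BOTH operands are INT16_MIN, so ruling that value out
// for either operand is enough to make the rewrite deterministic.
bool can_use_relaxed_q15mulr(Interval16 a, Interval16 b) {
    return excludes_int16_min(a) || excludes_int16_min(b);
}
```

For example, if one operand is known to lie in [-32767, 32767] (say, a filter coefficient that was clamped), the check passes regardless of the other operand's range; if both operands may span the full int16 range, it fails and the deterministic instruction would have to be kept.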