-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[x64] Coalesce loads/stores when paired with an insert/extract lane #2716
Conversation
The new Wasm SIMD instructions `load[8|16|32|64]_lane` and `store[8|16|32|64]_lane` were designed specifically for lowering to a single instruction in the Wasm runtimes. In the Cranelift backend, we pattern match to perform the following conversions: - `load + insertlane` becomes a single `PINSR*` - `extractlane + store` becomes a single `PEXTR*`
Now that the x64 backend is migrated to ISLE, is it time to re-visit this optimization? cc: @elliottt |
This is only allowed for aligned pointers, right? Can you check the aligned memflag? |
@abrown are you interested in pursuing this further? (Going through old PRs and cleaning up...) I agree with @bjorn3 that the alignment issue is the critical question here, and so I suspect there won't be major opportunity coming from Wasm-SIMD (given that loads/stores only have alignment hints, not hard-enforced requirements), but we can still think about it further if there's some other aspect where it could help... |
I had to refresh my mental cache for this issue quite a bit (it's been a while for this issue!). I don't know why I didn't originally respond, but as I dug into this, I didn't immediately find any requirement for these instructions to use aligned addresses. [searches more...] In fact, I do see the following in section 12.10.7 of the Intel manuals:
I think we could proceed with adding these tests? |
Going through old PRs I stumbled across bytecodealliance#2716 which is quite old at this point. Upon adding the tests to `main` I see that most of it is actually implemented except for load-lane-from-memory where the lane size is 8 or 16 bits. That requires explicitly opting-in with `sinkable_load_exact` so this PR now subsumes the tests of bytecodealliance#2716 in addition to implementing this missing hole in lowerings. This refactoring shuffles around where definitions are located to more easily have access to `Value` to perform the relevant match. The generic `vec_insert_lane` helper is now gone as well in lieu of direct matches on `insertlane` lowerings. Closes bytecodealliance#2716
* x64: Refactor lowerings for `insertlane` Going through old PRs I stumbled across #2716 which is quite old at this point. Upon adding the tests to `main` I see that most of it is actually implemented except for load-lane-from-memory where the lane size is 8 or 16 bits. That requires explicitly opting-in with `sinkable_load_exact` so this PR now subsumes the tests of #2716 in addition to implementing this missing hole in lowerings. This refactoring shuffles around where definitions are located to more easily have access to `Value` to perform the relevant match. The generic `vec_insert_lane` helper is now gone as well in lieu of direct matches on `insertlane` lowerings. Closes #2716 * Remove a no-longer-needed helper function
The new Wasm SIMD instructions
load[8|16|32|64]_lane
andstore[8|16|32|64]_lane
were designed specifically for lowering to a single instruction in the Wasm runtimes. In the Cranelift backend, we pattern match to perform the following conversions:load + insertlane
becomes a singlePINSR*
extractlane + store
becomes a singlePEXTR*
This change adds CLIF tests that should pass once the necessary pattern-matching issues are fixed.