[x64] Coalesce loads/stores when paired with an insert/extract lane #2716

abrown · 2021-03-09T23:24:29Z

The new Wasm SIMD instructions `load[8|16|32|64]_lane` and `store[8|16|32|64]_lane` were designed specifically for lowering to a single instruction in the Wasm runtimes. In the Cranelift backend, we pattern match to perform the following conversions:

- `load + insertlane` becomes a single `PINSR*`
- `extractlane + store` becomes a single `PEXTR*`

This change adds CLIF tests that should pass once the necessary pattern-matching issues are fixed.
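
For readers unfamiliar with the two patterns, here is a minimal scalar sketch in Rust of what each fused instruction has to compute for 8-bit lanes. It is only a reference model: the function names `load8_lane` and `store8_lane`, the `[u8; 16]` vector representation, and the slice standing in for linear memory are all assumptions made for illustration, not Cranelift code or its API.

```rust
/// Hypothetical scalar model of `load.i8` followed by `insertlane`:
/// read one byte from memory and replace lane `LANE` of an i8x16 vector,
/// leaving every other lane untouched (the job a single `PINSRB` with a
/// memory operand would do).
fn load8_lane<const LANE: usize>(vector: [u8; 16], memory: &[u8], addr: usize) -> [u8; 16] {
    let mut result = vector;
    result[LANE] = memory[addr];
    result
}

/// Hypothetical scalar model of `extractlane` followed by `store.i8`:
/// take lane `LANE` of the vector and write it to memory (the job a single
/// `PEXTRB` with a memory operand would do).
fn store8_lane<const LANE: usize>(vector: [u8; 16], memory: &mut [u8], addr: usize) {
    memory[addr] = vector[LANE];
}
```

In both cases the memory access and the lane operation form one unit of work, which is why a backend can hope to emit one instruction instead of two.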


jameysharp · 2022-09-01T01:11:15Z

Now that the x64 backend is migrated to ISLE, is it time to re-visit this optimization? cc: @elliottt

bjorn3 · 2022-09-01T07:31:16Z

This is only allowed for aligned pointers, right? Can you check the aligned memflag?

cfallin · 2023-02-09T00:28:34Z

@abrown are you interested in pursuing this further? (Going through old PRs and cleaning up...) I agree with @bjorn3 that the alignment issue is the critical question here, and so I suspect there won't be major opportunity coming from Wasm-SIMD (given that loads/stores only have alignment hints, not hard-enforced requirements), but we can still think about it further if there's some other aspect where it could help...

abrown · 2023-02-10T18:49:42Z

I had to refresh my mental cache for this issue quite a bit (it's been a while for this issue!). I don't know why I didn't originally respond, but as I dug into this, I didn't immediately find any requirement for these instructions to use aligned addresses. [searches more...] In fact, I do see the following in section 12.10.7 of the Intel manuals:

> SSE4.1 adds 7 instructions (corresponding to 9 assembly instruction mnemonics) that simplify data insertion and extraction between general-purpose register (GPR) and XMM registers (EXTRACTPS, INSERTPS, PINSRB, PINSRD, PINSRQ, PEXTRB, PEXTRW, PEXTRD, and PEXTRQ). When accessing memory, no alignment is required for any of these instructions (unless alignment checking is enabled).

I think we could proceed with adding these tests?
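
To make the alignment point concrete, the following is a small portable sketch of the behavior a lane load needs when the address is not a multiple of the lane size. The function `load32_lane` and its little-endian byte-slice memory are hypothetical stand-ins chosen for illustration; it models the required result only and says nothing about how Cranelift emits the instruction.

```rust
/// Hypothetical model of a 32-bit lane load from an arbitrary byte address.
/// The four bytes are copied out individually, so `addr` does not need to be
/// 4-byte aligned; the value is interpreted as little-endian.
fn load32_lane(vector: [u32; 4], memory: &[u8], addr: usize, lane: usize) -> [u32; 4] {
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&memory[addr..addr + 4]);
    let mut result = vector;
    result[lane] = u32::from_le_bytes(bytes);
    result
}
```

Calling it with `addr = 1` exercises the unaligned case: the other lanes are preserved and the selected lane receives the four bytes starting at that odd offset.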

Going through old PRs I stumbled across bytecodealliance#2716 which is quite old at this point. Upon adding the tests to `main` I see that most of it is actually implemented except for load-lane-from-memory where the lane size is 8 or 16 bits. That requires explicitly opting-in with `sinkable_load_exact` so this PR now subsumes the tests of bytecodealliance#2716 in addition to implementing this missing hole in lowerings.

This refactoring shuffles around where definitions are located to more easily have access to `Value` to perform the relevant match. The generic `vec_insert_lane` helper is now gone as well in lieu of direct matches on `insertlane` lowerings.

Closes bytecodealliance#2716


abrown requested a review from cfallin March 9, 2021 23:25

github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 9, 2021

alexcrichton mentioned this pull request Mar 18, 2024

x64: Refactor lowerings for insertlane #8167

Merged

alexcrichton closed this in #8167 Mar 18, 2024
