Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[x64] Coalesce loads/stores when paired with an insert/extract lane #2716

Closed
wants to merge 1 commit into from

Conversation

abrown
Copy link
Contributor

@abrown abrown commented Mar 9, 2021

The new Wasm SIMD instructions load[8|16|32|64]_lane and store[8|16|32|64]_lane were designed specifically for lowering to a single instruction in the Wasm runtimes. In the Cranelift backend, we pattern match to perform the following conversions:

  • load + insertlane becomes a single PINSR*
  • extractlane + store becomes a single PEXTR*

This change adds CLIF tests that should pass once the necessary pattern-matching issues are fixed.

The new Wasm SIMD instructions `load[8|16|32|64]_lane` and `store[8|16|32|64]_lane` were designed specifically for lowering to a single instruction in the Wasm runtimes. In the Cranelift backend, we pattern match to perform the following conversions:
 - `load + insertlane` becomes a single `PINSR*`
 - `extractlane + store` becomes a single `PEXTR*`
@abrown abrown requested a review from cfallin March 9, 2021 23:25
@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Mar 9, 2021
@jameysharp
Copy link
Contributor

Now that the x64 backend is migrated to ISLE, is it time to re-visit this optimization? cc: @elliottt

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 1, 2022

This is only allowed for aligned pointers, right? Can you check the aligned memflag?

@cfallin
Copy link
Member

cfallin commented Feb 9, 2023

@abrown are you interested in pursuing this further? (Going through old PRs and cleaning up...) I agree with @bjorn3 that the alignment issue is the critical question here, and so I suspect there won't be major opportunity coming from Wasm-SIMD (given that loads/stores only have alignment hints, not hard-enforced requirements), but we can still think about it further if there's some other aspect where it could help...

@abrown
Copy link
Contributor Author

abrown commented Feb 10, 2023

I had to refresh my mental cache for this issue quite a bit (it's been a while for this issue!). I don't know why I didn't originally respond, but as I dug into this, I didn't immediately find any requirement for these instructions to use aligned addresses. [searches more...] In fact, I do see the following in section 12.10.7 of the Intel manuals:

SSE4.1 adds 7 instructions (corresponding to 9 assembly instruction mnemonics) that simplify data insertion and extraction between general-purpose register (GPR) and XMM registers (EXTRACTPS, INSERTPS, PINSRB, PINSRD, PINSRQ, PEXTRB, PEXTRW, PEXTRD, and PEXTRQ). When accessing memory, no alignment is required for any of these instructions (unless alignment checking is enabled).

I think we could proceed with adding these tests?

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Mar 18, 2024
Going through old PRs I stumbled across bytecodealliance#2716 which is quite old at this
point. Upon adding the tests to `main` I see that most of it is actually
implemented except for load-lane-from-memory where the lane size is 8 or
16 bits. That requires explicitly opting-in with `sinkable_load_exact`
so this PR now subsumes the tests of bytecodealliance#2716 in addition to implementing
this missing hole in lowerings.

This refactoring shuffles around where definitions are located to more
easily have access to `Value` to perform the relevant match. The generic
`vec_insert_lane` helper is now gone as well in lieu of direct matches
on `insertlane` lowerings.

Closes bytecodealliance#2716
github-merge-queue bot pushed a commit that referenced this pull request Mar 18, 2024
* x64: Refactor lowerings for `insertlane`

Going through old PRs I stumbled across #2716 which is quite old at this
point. Upon adding the tests to `main` I see that most of it is actually
implemented except for load-lane-from-memory where the lane size is 8 or
16 bits. That requires explicitly opting-in with `sinkable_load_exact`
so this PR now subsumes the tests of #2716 in addition to implementing
this missing hole in lowerings.

This refactoring shuffles around where definitions are located to more
easily have access to `Value` to perform the relevant match. The generic
`vec_insert_lane` helper is now gone as well in lieu of direct matches
on `insertlane` lowerings.

Closes #2716

* Remove a no-longer-needed helper function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants