x64: Optimize store-of-extract-lane-0 #5924

Merged

alexcrichton
Member

The movss and movsd instructions can be used to store the 0th lane of a t32x4 or a t64x2 vector into memory, enabling fusing a store and an extractlane instruction.
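As a rough illustration of the instruction behavior this relies on (not the Cranelift lowering itself), the sketch below uses the Rust `std::arch` intrinsics `_mm_store_ss` and `_mm_store_sd`, which correspond to the store forms of `movss` and `movsd`. The helper names `store_lane0_f32` and `store_lane0_f64` are hypothetical and exist only for this example; on non-x86_64 targets they fall back to a plain scalar store.

```rust
// Illustrative helpers only; these names are hypothetical and not part of Cranelift.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_loadu_pd, _mm_loadu_ps, _mm_store_sd, _mm_store_ss};

/// Writes lane 0 of a four-lane f32 vector to `out` with a MOVSS-style store.
#[cfg(target_arch = "x86_64")]
pub fn store_lane0_f32(v: [f32; 4], out: &mut f32) {
    // SAFETY: SSE is baseline on x86_64. The unaligned load reads 16 valid
    // bytes from `v`, and `_mm_store_ss` writes only the low 4 bytes to `out`.
    unsafe {
        let x = _mm_loadu_ps(v.as_ptr());
        _mm_store_ss(out, x);
    }
}

/// Writes lane 0 of a two-lane f64 vector to `out` with a MOVSD-style store.
#[cfg(target_arch = "x86_64")]
pub fn store_lane0_f64(v: [f64; 2], out: &mut f64) {
    // SAFETY: SSE2 is baseline on x86_64. The unaligned load reads 16 valid
    // bytes from `v`, and `_mm_store_sd` writes only the low 8 bytes to `out`.
    unsafe {
        let x = _mm_loadu_pd(v.as_ptr());
        _mm_store_sd(out, x);
    }
}

/// Scalar fallback so the example still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn store_lane0_f32(v: [f32; 4], out: &mut f32) {
    *out = v[0];
}

/// Scalar fallback so the example still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn store_lane0_f64(v: [f64; 2], out: &mut f64) {
    *out = v[0];
}
```

In both cases only the low lane reaches memory, which is why a separate extract into a register followed by a store can be collapsed into a single instruction.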

@github-actions github-actions bot added the cranelift and cranelift:area:x64 labels Mar 3, 2023
@alexcrichton alexcrichton requested a review from abrown March 8, 2023 20:49
Contributor

@abrown abrown left a comment

Makes sense to me. I think the discussion about alignment in #2716 may be relevant to this, if you're interested; essentially, what we're doing here for lane 0 can be done for all lanes with PINSR* and PEXTR* for unaligned memory on both the load and store side. I wonder if we should add the AVX versions of MOVSS and MOVSD here for completeness, though?
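To make the PEXTR* idea concrete, here is a minimal sketch using the Rust `std::arch` intrinsic `_mm_extract_epi32` (the `pextrd` instruction, SSE4.1) to pull out a non-zero lane and write it to memory. The function names are hypothetical, the lane index is fixed at 2 for simplicity, and whether the compiler folds the extract and the store into a single `pextrd` with a memory destination is not guaranteed; this only shows the semantics being discussed.

```rust
// Illustrative helpers only; these names are hypothetical and not part of Cranelift.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_extract_epi32, _mm_loadu_si128};

/// Extracts lane 2 of an i32x4 with PEXTRD and writes it to `out`.
/// Callers must ensure SSE4.1 is available before calling this.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn store_lane2_sse41(v: &[i32; 4], out: &mut i32) {
    unsafe {
        // Unaligned 16-byte load; lane 0 is the element at the lowest address.
        let x = _mm_loadu_si128(v.as_ptr() as *const __m128i);
        *out = _mm_extract_epi32::<2>(x);
    }
}

/// Stores lane 2 of `v` to `out`, using PEXTRD when the CPU supports SSE4.1
/// and a plain scalar store otherwise.
pub fn store_lane2_i32(v: [i32; 4], out: &mut i32) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.1") {
            // SAFETY: SSE4.1 support was checked at runtime just above.
            unsafe { store_lane2_sse41(&v, out) };
            return;
        }
    }
    *out = v[2];
}
```

Neither the load nor the write here needs 16-byte alignment, which is the property that makes this shape attractive for memory operands.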

@alexcrichton
Member Author

Agreed! I hope to get to that next with the special casing, so that extracting from an i32x4 and storing would use pextrd instead of movss.

For AVX support this PR sort of conflicts with #5931, so I'll update this once that goes in and the helper will automatically use vmovs{s,d} for AVX.

@alexcrichton alexcrichton force-pushed the optimize-store-extractlane branch from adcbc52 to 60ddf86 Compare March 10, 2023 00:42
@alexcrichton alexcrichton enabled auto-merge March 10, 2023 00:43
@alexcrichton alexcrichton added this pull request to the merge queue Mar 10, 2023
Merged via the queue into bytecodealliance:main with commit 0ec7b87 Mar 10, 2023
@alexcrichton alexcrichton deleted the optimize-store-extractlane branch March 10, 2023 01:44