riscv64: Implement SIMD `swizzle` and `shuffle` #6515
LGTM with the following little nitpicks. Thanks!
@@ -1573,6 +1573,12 @@

;; UImm5 Helpers

;; Extractor that matches a `Value` equivalent to a replicated UImm5 on all lanes.
;; TODO: Try matching vconst here as well
Can you either resolve this TODO in this PR or turn it into `TODO(#1234)` with a reference to a follow-up issue?
(mem_flags_trusted)
(unmasked)
ty))
Nitpick: missing trailing newline
👋 Hey,

This PR implements the `swizzle` and `shuffle` instructions in the RISC-V backend.

`swizzle` maps directly onto `vrgather` with a SEW of 8, so that's a fairly simple implementation. For `shuffle` we have to do two `vrgather`s, one for values in the range of the first register and a second for values in the range of the second register, and then merge them together.

I double checked the `shuffle` implementation, and it seems to match what v8 does.

`vrgather` is a somewhat special instruction in that it forbids the destination register from being the same as any of the source registers (including the mask register). I've modeled this as an `early_def`, which seems to be correct based on what I've read from the regalloc2 docs, but I'm not 100% sure.

There are a few other instructions like this, but none that we have implemented yet.
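
For readers who want to see the two-gather idea spelled out, here is a minimal scalar sketch in Rust. It is not the ISLE lowering from this PR: the function names (`vrgather`, `swizzle`, `shuffle`) are hypothetical, the model assumes exactly 16 byte lanes with VLMAX equal to 16, and merging the two partial results with a bitwise OR is an assumption about how the merge could be done. The only property it relies on is that `vrgather.vv` writes 0 for any index that is out of range.

```rust
/// Scalar model of `vrgather.vv` with SEW=8 and 16 lanes (assumed VLMAX = 16):
/// each output lane reads `src[idx[i]]`, or 0 when the index is out of range.
fn vrgather(src: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let j = idx[i] as usize;
        out[i] = if j < 16 { src[j] } else { 0 };
    }
    out
}

/// Wasm `i8x16.swizzle` has the same out-of-range-is-zero behaviour,
/// so in this model it is a single gather.
fn swizzle(a: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    vrgather(a, idx)
}

/// `shuffle` selects from the 32-byte concatenation of `a` and `b`.
/// Mask values 0..=15 pick from `a`, values 16..=31 pick from `b`.
fn shuffle(a: [u8; 16], b: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    // First gather: lanes whose mask points into `b` come out as 0.
    let from_a = vrgather(a, mask);

    // Second gather: rebase the mask by 16. Mask values below 16 wrap
    // around to 240..=255, which are out of range and therefore yield 0.
    let mut rebased = [0u8; 16];
    for i in 0..16 {
        rebased[i] = mask[i].wrapping_sub(16);
    }
    let from_b = vrgather(b, rebased);

    // Merge (assumption: bitwise OR). Each lane is non-zero in at most one
    // of the two partial results, so OR combines them without conflicts.
    let mut out = [0u8; 16];
    for i in 0..16 {
        out[i] = from_a[i] | from_b[i];
    }
    out
}
```

In this model, a mask that alternates `0, 16, 1, 17, ...` interleaves the low bytes of `a` and `b`, because every lane is zero in exactly one of the two gathers.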