-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
aarch64: Specialize constant vector shifts #5976
aarch64: Specialize constant vector shifts #5976
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thanks, this codegen is much improved!
; ret | ||
; | ||
; Disassembled: | ||
; block0: ; offset 0x0 | ||
; ldr q5, #8 | ||
; ldr q1, #8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orthogonal to your changes here but I'm surprised we haven't moved from inline constants we branch around to the VCodeConstant
infrastructure; I guess I am confusing this with the x64 backend, where we do have that. Using the end-of-func constant pool here as well might be a nice improvement for a followup PR, especially if the data lands in the middle of an inner loop and pushes the body into another icache line or fetch cycle...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! The fallback for shuffle
does an inline load like this as well, and I'm curious enough to hopefully try to give this a stab in the near future
Mind re-reviewing the latest commit when you get a chance @cfallin? Turns out I was only running Cranelift's test suite and the wasm test suite exercised a case where a right-shift happened at the full bit-width of the type triggering assertions in the backend for emission of these instructions with a 0-shift. I've special-cased the ISLE lowerings to not emit an instruction with 0-shifts but I'm not sure if the ISLE is correct since it relies on rule priorities to be correct, which I don't think is how we'd like to rely on ISLE, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with or without below addition. Thanks for fixing; I hadn't realized that about aarch64 vector shifts!
|
||
; VCode: | ||
; block0: | ||
; shl v0.16b, v0.16b, #0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something obvious but is there any reason we can't match this left-shift-by-0 (when masked) in the same way and pass the value straight through?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to me like something ideally left to an egraph-style optimization which removes the shift before codegen, and I ideally wanted to do the same thing for the other lowerings but it was just the restrictions on arm64 instruction incoded which prevented ignoring the shift-right-by-zero case. That being said I may as well add it to be consistent with the other rules!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that's true -- it's a reasonable algebraic simplification that we should include in the mid-end as well (might increase reach of other opts etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused. We do have egraph rules for removing shifts and rotates by 0. Are they not firing here? Are these filetests running with optimization disabled, maybe?
If you need to match on shifts by zero to avoid generating illegal instructions in a particular backend, that's important since we don't necessarily run the egraph pass. But I would be inclined not to bother special-casing anything in the backend that isn't necessary for correctness if it can be done in the egraph pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shifts here are by the full bit width, e.g. a shift by 8 for i8x16
, which I don't think is accounted for in the egraph rules (but I just checked to see the checks for 0-shifts, so this would "just" be an extension of that rule). I don't know whether filetests run with or without optimizations, but I would guess without.
Avoiding the 0-case is required for the shift-right cases for correctness since it looks like right-shifts with zero-width can't be encoded as such an instruction (I think this has to do with how right-shift is a negative left-shift on AArch64 or something like that, so maybe this is a too-strict assertion somewhere, not sure, I didn't dig deeply enough to go that far).
I went ahead and added the case here as requested for shift-lefts even though it isn't strictly necessary as required by the backend. In some sense though it does sort of feel like if someone reads that rule it depends on the time of day as to whether they would judge whether or not the rule should be included, so in that sense I don't know whether to leave it in or not as it's not obviously correct nor obviously incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jameysharp in general that we shouldn't pull too many opts into lowering, because of cost in non-optimizing mode and because they can't interact with the main rewriting fixpoint when we're already here. I don't feel too strongly about this case; I guess I would note that it's a nice symmetry with the right-shift case here (which is why it popped out at me originally). On the other hand, we don't special-case x + 0
in the backends, nor should we. So... I'll happily hand the baton to @jameysharp for final judgment here and only request that we extend the mid-end rule to understand masking and cover this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if I have the baton…
- I think our general principle should be that algebraic simplifications should not be in backend rules. We should save our backend complexity budget for special cases where we can select a better instruction sequence due to target-specific features.
- I don't feel strongly enough about this principle to block merging this PR. I'd lean toward removing the special case for left-shifts but you're welcome to merge as-is if you want to on the basis of @cfallin's r+.
We'll have to figure out how best to handle the wrapping semantics of shifts in the mid-end rules. We should review any existing mid-end rules that match on constant shift amounts to make sure they're handling this correctly, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me! I've removed the left-shift case and rebased as well 👍
828c2ec
to
ea8077f
Compare
This commit adds special lowering rules for vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values.
ea8077f
to
803e188
Compare
This commit adds special lowering rules for
vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values.