[AArch64] does not use rev32/rev64 instructions, resulting in redundant shift operations #130469
@llvm/issue-subscribers-backend-aarch64 Author: None (k-arrows)
Here is the code from the GCC testsuite.
https://godbolt.org/z/jzdcsfxx4
```c
typedef char __attribute__ ((vector_size (16))) v16qi;
typedef unsigned short __attribute__ ((vector_size (16))) v8hi;
typedef unsigned int __attribute__ ((vector_size (16))) v4si;
typedef unsigned long long __attribute__ ((vector_size (16))) v2di;
typedef unsigned short __attribute__ ((vector_size (8))) v4hi;
typedef unsigned int __attribute__ ((vector_size (8))) v2si;
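/* Five functions follow, returning v2di, v4si, v8hi, v2si and v4hi.
   Their bodies are not preserved here; see the Godbolt link above. */
```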
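The function bodies are missing from this copy of the issue. Judging from the `shl`/`usra` pairs in the old output quoted in the commit message, each function is presumably a per-lane rotate by half the lane width. The following is a hedged reconstruction under that assumption, not the original testsuite source: the names `G1` to `G5` are taken from the commit output, and the bodies are guesses that happen to match the quoted shift amounts.

```c
/* Assumed reconstruction: rotate every lane by half its width using
   GCC/Clang vector extensions. The bodies are NOT from the original issue. */
typedef unsigned short __attribute__ ((vector_size (16))) v8hi;
typedef unsigned int __attribute__ ((vector_size (16))) v4si;
typedef unsigned long long __attribute__ ((vector_size (16))) v2di;
typedef unsigned short __attribute__ ((vector_size (8))) v4hi;
typedef unsigned int __attribute__ ((vector_size (8))) v2si;

/* 64-bit lanes rotated by 32: swaps the two 32-bit halves of each lane. */
v2di G1 (v2di r) { return (r >> 32) | (r << 32); }

/* 32-bit lanes rotated by 16: swaps the two 16-bit halves of each lane. */
v4si G2 (v4si r) { return (r >> 16) | (r << 16); }

/* 16-bit lanes rotated by 8: swaps the two bytes of each lane. */
v8hi G3 (v8hi r) { return (r >> 8) | (r << 8); }

/* Same as G2, on a 64-bit vector of two 32-bit lanes. */
v2si G4 (v2si r) { return (r >> 16) | (r << 16); }

/* Same as G3, on a 64-bit vector of four 16-bit lanes. */
v4hi G5 (v4hi r) { return (r >> 8) | (r << 8); }
```

Because the lanes are unsigned, the right shifts are logical, so each expression is a pure rotate that a backend can match to a single element-reversal instruction.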
Hi, I'm looking into this right now. Could I please be assigned to the issue?
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this issue May 6, 2025

Fixes llvm#130469

Now uses REV32/REV64 instructions to complete operation.

New Output:
```
G1:
        rev64   v0.4s, v0.4s
        ret
G2:
        rev32   v0.8h, v0.8h
        ret
G3:
        rev16   v0.16b, v0.16b
        ret
G4:
        rev32   v0.4h, v0.4h
        ret
G5:
        rev16   v0.8b, v0.8b
        ret
```

Old Output:
```
G1:
        shl     v1.2d, v0.2d, #32
        usra    v1.2d, v0.2d, #32
        mov     v0.16b, v1.16b
        ret
G2:
        shl     v1.4s, v0.4s, #16
        usra    v1.4s, v0.4s, #16
        mov     v0.16b, v1.16b
        ret
G3:
        rev16   v0.16b, v0.16b
        ret
G4:
        shl     v1.2s, v0.2s, #16
        usra    v1.2s, v0.2s, #16
        fmov    d0, d1
        ret
G5:
        rev16   v0.8b, v0.8b
        ret
```
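To see why the old `shl` + `usra` pair and the new `rev32` produce the same result, here is a minimal scalar sketch of one 32-bit lane of `G2`. The helper names `lane_shl_usra` and `lane_rev32_h` are hypothetical and exist only for illustration; this models the arithmetic, not the actual LLVM lowering.

```c
#include <stdint.h>

/* Hypothetical model of the old sequence on one 32-bit lane:
   shl #16 writes the low half into the high half, then usra #16
   (unsigned shift right and accumulate) adds the high half into
   the now-empty low half. The two parts share no set bits, so the
   add behaves like an OR and the result is a rotate by 16. */
static uint32_t lane_shl_usra(uint32_t x)
{
    uint32_t acc = x << 16;   /* shl:  acc = x << 16   */
    acc += x >> 16;           /* usra: acc += x >> 16  */
    return acc;
}

/* Hypothetical model of rev32 with 16-bit elements on one 32-bit
   container: the two halfwords trade places. */
static uint32_t lane_rev32_h(uint32_t x)
{
    uint16_t lo = (uint16_t)x;
    uint16_t hi = (uint16_t)(x >> 16);
    return ((uint32_t)lo << 16) | hi;
}
```

Both helpers return the same value for every input, which is the equivalence that lets the three-instruction sequence collapse into one `rev32`.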
GCC efficiently uses rev32 or rev64 to complete the operation in a single instruction.