[AArch64] Fold `and` and `cmp` into `tst` #102703
Comments
@llvm/issue-subscribers-backend-aarch64 Author: Karl Meakin (Kmeakin)
https://godbolt.org/z/Mh7q8TY99
InstCombine is able to fold `(x & 0xFF) < C` into `(x & -C) == 0`, eg: `bool ult32_u32(u32 x) { return (x & 0xFF) < 32; }` produces

    ult32_u32:
        tst w0, #0xe0
        cset w0, eq
        ret

But the same transform is not done in later stages, so if the `and` is introduced due to eg passing a u8 in a u32 register, the fold is not performed: `bool ult32_u8(u8 x) { return x < 32; }` produces

    ult32_u8:
        and w8, w0, #0xff
        cmp w8, #32
        cset w0, lo
        ret
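As a sanity check on the example above, the following sketch compares the `and`/`cmp` form with the `tst` form for the constant 32. The mask `0xE0` is `0xFF & -32`, matching the immediate in the `tst` output. The function names are made up for illustration and are not part of LLVM.

```cpp
#include <cstdint>

// Compare form: what the and/cmp sequence computes.
bool ult32_cmp(std::uint32_t x) { return (x & 0xFFu) < 32u; }

// Test form: what the tst sequence computes (0xE0 == 0xFF & -32).
bool ult32_tst(std::uint32_t x) { return (x & 0xE0u) == 0u; }

// Checks every low-byte value under a few arbitrary high-bit patterns,
// since only the low 8 bits of the register are meaningful for a u8.
bool ult32_forms_agree() {
    const std::uint32_t highs[] = {0x00000000u, 0x00000100u,
                                   0xABCDEF00u, 0xFFFFFF00u};
    for (std::uint32_t hi : highs) {
        for (std::uint32_t lo = 0; lo < 256; ++lo) {
            if (ult32_cmp(hi | lo) != ult32_tst(hi | lo)) {
                return false;
            }
        }
    }
    return true;
}
```

Calling `ult32_forms_agree()` should return `true`, which is why the single `tst` is a valid replacement for the `and` plus `cmp` pair in this case.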
Hi @Kmeakin Because, as per my understanding,
I'd like to work on this issue, if possible? Please assign me?
Yes, that was a typo, thanks
Fixes #102703.

https://godbolt.org/z/nfj8xsb1Y

The following pattern:

```
%2 = and i32 %0, 254
%3 = icmp eq i32 %2, 0
```

is optimised by instcombine into:

```
%3 = icmp ult i32 %0, 2
```

However, post instcombine leads to worse aarch64 than the unoptimised version.

Pre instcombine:

```
tst w0, #0xfe
cset w0, eq
ret
```

Post instcombine:

```
and w8, w0, #0xff
cmp w8, #2
cset w0, lo
ret
```

In the unoptimised version, SelectionDAG converts `SETCC (AND X 254) 0 EQ` into `CSEL 0 1 1 (ANDS X 254)`, which gets emitted as a `tst`.

In the optimised version, SelectionDAG converts `SETCC (AND X 255) 2 ULT` into `CSEL 0 1 2 (SUBS (AND X 255) 2)`, which gets emitted as an `and`/`cmp`.

This PR adds an optimisation to `AArch64ISelLowering`, converting `SETCC (AND X Y) Z ULT` into `SETCC (AND X (Y & ~(Z - 1))) 0 EQ` when `Z` is a power of two. This makes SelectionDAG/Codegen produce the same optimised code for both examples.
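The rewrite described in the PR text can be modelled on plain integers to see why the power-of-two condition matters. This is only a sketch of the arithmetic, not the code in `AArch64ISelLowering`; the helper names are hypothetical. For a power of two `Z`, a value is below `Z` exactly when none of its bits at or above `Z`'s bit position are set, and `~(Z - 1)` selects those bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when z has exactly one bit set.
bool is_power_of_two(std::uint32_t z) { return z != 0 && (z & (z - 1)) == 0; }

// Models SETCC (AND X Y) Z ULT.
bool setcc_ult(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return (x & y) < z;
}

// Models SETCC (AND X (Y & ~(Z - 1))) 0 EQ; only valid for power-of-two z.
bool setcc_folded(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    assert(is_power_of_two(z));
    return (x & (y & ~(z - 1))) == 0;
}

// Compares both forms over a small range of inputs, a few arbitrary masks,
// and every power-of-two bound from 1 to 4096.
bool fold_is_sound() {
    const std::uint32_t masks[] = {0xFFu, 0xFEu, 0x55u, 0x3FFu, 0xFFFFFFFFu};
    for (std::uint32_t y : masks) {
        for (unsigned k = 0; k <= 12; ++k) {
            const std::uint32_t z = 1u << k;
            for (std::uint32_t x = 0; x < 8192; ++x) {
                if (setcc_ult(x, y, z) != setcc_folded(x, y, z)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

With `Y = 0xFF` and `Z = 2` the folded mask is `0xFF & ~1 = 0xFE`, the immediate in the `tst w0, #0xfe` output above. If `Z` were not a power of two, the two forms could disagree: for `x = 1`, `Y = 0xFF`, `Z = 3`, the compare `1 < 3` is true while `1 & (0xFF & ~2)` is nonzero.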
https://godbolt.org/z/Mh7q8TY99
InstCombine is able to fold `(x & 0xFF) < C` into `(x & -C) == 0`, but the same transform is not done in later stages, so if the `and` is introduced due to eg passing a u8 in a u32 register, the fold is not performed.