[AArch64] Fold `and` and `cmp` into `tst` #102703

Kmeakin · 2024-08-09T23:20:39Z

InstCombine is able to fold ((x & 0xFF) < C) into ((x & -C) == 0)

e.g.:

bool ult32_u32(u32 x) { return (x & 0xFF) < 32; }

produces

ult32_u32:
        tst     w0, #0xe0
        cset    w0, eq
        ret

But the same transform is not done in later stages, so if the `and` is only introduced later (e.g. by passing a u8 in a u32 register), the fold is not performed:

bool ult32_u8(u8 x) { return x < 32; }

ult32_u8:
        and     w8, w0, #0xff
        cmp     w8, #32
        cset    w0, lo
        ret
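
The equivalence behind the fold can be checked exhaustively over the low byte. The following is a minimal sketch, not LLVM code: the names `cmp_form`, `tst_form` and `forms_agree` are hypothetical, and it assumes C is a power of two. Note that the mask actually tested is `0xFF & -C` (`0xe0` for C = 32), which is what the `tst` above encodes.

```cpp
#include <cstdint>

// Original form: mask to the low byte, then unsigned compare against c.
constexpr bool cmp_form(std::uint32_t x, std::uint32_t c) {
    return (x & 0xFFu) < c;
}

// Folded form: test the bits of the low byte at or above log2(c).
// Only equivalent when c is a power of two (assumption of this sketch).
constexpr bool tst_form(std::uint32_t x, std::uint32_t c) {
    return (x & (0xFFu & (0u - c))) == 0;
}

// Compare both forms for every low-byte value, with bit 8 both clear and set
// so that bits outside the low byte are shown not to matter.
constexpr bool forms_agree(std::uint32_t c) {
    for (std::uint32_t x = 0; x < 0x200u; ++x) {
        if (cmp_form(x, c) != tst_form(x, c)) return false;
    }
    return true;
}

static_assert(forms_agree(32), "C = 32 corresponds to the mask 0xE0");
```

For a C that is not a power of two (for example 3) the two forms disagree, which is why the fold is restricted to power-of-two constants.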

llvmbot · 2024-08-09T23:20:56Z

@llvm/issue-subscribers-backend-aarch64

Author: Karl Meakin (Kmeakin)

https://godbolt.org/z/Mh7q8TY99

InstCombine is able to fold (x & 0xFF) < C) into (x & -C) != 0

eg:

bool ult32_u32(u32 x) { return (x & 0xFF) < 32; }

produces

ult32_u32:
        tst     w0, #0xe0
        cset    w0, eq
        ret

But the same transform is not done in later stages, so if the and is introduced due to eg passing a u8 in a u32 register, the fold is not performed:

bool ult32_u8(u8 x) { return x < 32; }

ult32_u8:
        and     w8, w0, #0xff
        cmp     w8, #32
        cset    w0, lo
        ret

pvimal816 · 2024-08-12T04:33:40Z

Hi @Kmeakin
Orthogonal to the root issue, shouldn't (x & -C) != 0 be (x & -C) == 0 instead?

Because, as per my understanding, tst a, b will set Z flag only if a&b is zero, and unless Z flag is set cset a, eq will not set a.
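
A small model of the two instructions may make the point concrete. This is a sketch under stated assumptions, not an emulator: `tst_sets_z`, `cset_eq` and `ult32` are hypothetical names, and only the Z flag is modelled.

```cpp
#include <cstdint>

// TST a, b computes a & b, discards the result, and sets Z when it is zero.
constexpr bool tst_sets_z(std::uint32_t a, std::uint32_t b) {
    return (a & b) == 0;
}

// CSET rd, eq writes 1 when Z is set and 0 otherwise.
constexpr std::uint32_t cset_eq(bool z) {
    return z ? 1u : 0u;
}

// "tst w0, #0xe0; cset w0, eq" from the issue, as a function of w0.
constexpr std::uint32_t ult32(std::uint32_t w0) {
    return cset_eq(tst_sets_z(w0, 0xE0u));
}

static_assert(ult32(31) == 1, "(31 & 0xE0) == 0, so the result is 1");
static_assert(ult32(32) == 0, "(32 & 0xE0) != 0, so the result is 0");
```

The result is 1 exactly when the masked value is zero, so the folded condition reads `== 0`.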

jf-botto · 2024-08-16T09:48:09Z

I'd like to work on this issue, if possible. Could you please assign it to me?

Kmeakin · 2024-09-20T17:13:48Z

> Hi @Kmeakin
> Orthogonal to the root issue, shouldn't (x & -C) != 0 be (x & -C) == 0 instead?
>
> Because, as per my understanding, tst a, b will set Z flag only if a&b is zero, and unless Z flag is set cset a, eq will not set a.

Yes, that was a typo, thanks

Fixes #102703.

https://godbolt.org/z/nfj8xsb1Y

The following pattern:

```
%2 = and i32 %0, 254
%3 = icmp eq i32 %2, 0
```

is optimised by instcombine into:

```
%3 = icmp ult i32 %0, 2
```

However, post instcombine leads to worse aarch64 than the unoptimised version.

Pre instcombine:

```
tst w0, #0xfe
cset w0, eq
ret
```

Post instcombine:

```
and w8, w0, #0xff
cmp w8, #2
cset w0, lo
ret
```

In the unoptimised version, SelectionDAG converts `SETCC (AND X 254) 0 EQ` into `CSEL 0 1 1 (ANDS X 254)`, which gets emitted as a `tst`.

In the optimised version, SelectionDAG converts `SETCC (AND X 255) 2 ULT` into `CSEL 0 1 2 (SUBS (AND X 255) 2)`, which gets emitted as an `and`/`cmp`.

This PR adds an optimisation to `AArch64ISelLowering`, converting `SETCC (AND X Y) Z ULT` into `SETCC (AND X (Y & ~(Z - 1))) 0 EQ` when `Z` is a power of two. This makes SelectionDAG/Codegen produce the same optimised code for both examples.
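
The mask arithmetic in that rewrite can be illustrated in isolation. The following is a minimal sketch of the rule as described above, not the code added to `AArch64ISelLowering`; `FoldResult` and `fold_ult` are hypothetical names, and constants are modelled as plain 64-bit integers.

```cpp
#include <cstdint>

struct FoldResult {
    bool applies;        // true when z is a power of two
    std::uint64_t mask;  // y & ~(z - 1); meaningful only when applies is true
};

// Given SETCC (AND X y) z ULT, compute the mask m for the equivalent
// SETCC (AND X m) 0 EQ, or report that the rewrite does not apply.
constexpr FoldResult fold_ult(std::uint64_t y, std::uint64_t z) {
    return (z != 0 && (z & (z - 1)) == 0)
               ? FoldResult{true, y & ~(z - 1)}
               : FoldResult{false, 0};
}

// The example from the description: (X & 255) < 2 becomes (X & 254) == 0.
static_assert(fold_ult(255, 2).applies && fold_ult(255, 2).mask == 254,
              "255 & ~(2 - 1) == 254");
```

With `y = 255` and `z = 32` the same rule gives the mask `0xe0` from the original issue, and a non-power-of-two `z` such as 3 is rejected.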

Kmeakin added backend:AArch64 missed-optimization labels Aug 9, 2024

davemgreen assigned jf-botto Aug 16, 2024

jf-botto mentioned this issue Sep 28, 2024

[AArch64] - Fold and and cmp into tst #110347

Merged

davemgreen closed this as completed in #110347 Oct 3, 2024
