[arm][neon] llvm.experimental.reduce.{and, any} don't lower properly for boolean vectors #40981

Open
@RKSimon

Description

Bugzilla Link 41636
Version trunk
OS Windows NT
CC @alexey-bataev,@gnzlbg,@smithp35

Extended Description

Split off from [Bug #36702]. armv7a generates poor code for boolean reductions from generic IR, whether they are written with the llvm.experimental.vector.reduce intrinsics (which expand to a shuffle reduction chain) or with bitcasts of the comparison result mask:

https://godbolt.org/z/U7C4n4

e.g.

ARMv7+NEON

LLVM6:

all_8x8:
vmov.i8 d0, #0x1
vldr d1, [r0]
vtst.8 d0, d1, d0
vext.8 d1, d0, d0, #4
vand d0, d0, d1
vext.8 d1, d0, d0, #2
vand d0, d0, d1
vdup.8 d1, d0[1]
vand d0, d0, d1
vmov.u8 r0, d0[0]
and r0, r0, #1
bx lr

any_8x8:
vmov.i8 d0, #0x1
vldr d1, [r0]
vtst.8 d0, d1, d0
vext.8 d1, d0, d0, #4
vorr d0, d0, d1
vext.8 d1, d0, d0, #2
vorr d0, d0, d1
vdup.8 d1, d0[1]
vorr d0, d0, d1
vmov.u8 r0, d0[0]
and r0, r0, #1
bx lr
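
For readers who want to trace the shuffle reduction chain lane by lane, here is a minimal sketch in portable C of what the LLVM6 all_8x8 sequence computes. It is only a scalar model, not NEON code: the names rotate_lanes and all_8x8_shuffle_model are hypothetical, and the model assumes little-endian lane numbering, where vext.8 with two identical source registers acts as a lane rotation.

```c
#include <assert.h>
#include <stdint.h>

/* Models "vext.8 dD, dN, dN, #k" with identical sources:
   lane i of the result is lane (i + k) mod 8 of the input. */
static void rotate_lanes(uint8_t out[8], const uint8_t in[8], int k) {
    uint8_t tmp[8];
    for (int i = 0; i < 8; ++i) tmp[i] = in[(i + k) % 8];
    for (int i = 0; i < 8; ++i) out[i] = tmp[i];
}

/* Scalar model of the LLVM6 all_8x8 listing (hypothetical name). */
int all_8x8_shuffle_model(const uint8_t v[8]) {
    uint8_t m[8], s[8];

    /* vtst.8 against #0x1: a lane becomes 0xFF if bit 0 is set, else 0x00. */
    for (int i = 0; i < 8; ++i) m[i] = (v[i] & 1) ? 0xFF : 0x00;

    /* vext #4 + vand: lane i now covers inputs i and i+4. */
    rotate_lanes(s, m, 4);
    for (int i = 0; i < 8; ++i) m[i] &= s[i];

    /* vext #2 + vand: lane 0 covers the even inputs, lane 1 the odd ones. */
    rotate_lanes(s, m, 2);
    for (int i = 0; i < 8; ++i) m[i] &= s[i];

    /* vdup.8 d1, d0[1] + vand: lane 0 now covers all eight inputs. */
    for (int i = 0; i < 8; ++i) s[i] = m[1];
    for (int i = 0; i < 8; ++i) m[i] &= s[i];

    /* vmov.u8 r0, d0[0]; and r0, r0, #1 */
    return m[0] & 1;
}
```

The any_8x8 listing has the same shape with vorr in place of vand, so a model of it would replace each `&=` with `|=`. The sketch shows why the chain costs three shuffles and three logic operations on top of the compare and the final mask.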

Manually generated:

all_8x8:
vldr d0, [r0]
vpmin.u8 d16, d0, d16
vpmin.u8 d16, d16, d16
vpmin.u8 d0, d16, d16
vmov.u8 r0, d0[0]
bx lr

any_8x8:
vldr d0, [r0]
vpmax.u8 d16, d0, d16
vpmax.u8 d16, d16, d16
vpmax.u8 d0, d16, d16
vmov.u8 r0, d0[0]
bx lr
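
The pairwise reductions above can be checked with a similar scalar model. This is a sketch under stated assumptions, not the proposed lowering itself: vpmin_u8, vpmax_u8, all_8x8_model and any_8x8_model are hypothetical names, and the model passes the input as both source operands, whereas the listing uses d16 for the second one. That difference does not affect lane 0, which after three steps depends only on the first operand of the first step.

```c
#include <assert.h>
#include <stdint.h>

/* Models "vpmin.u8 dD, dN, dM": result lanes 0..3 are the minima of
   adjacent lane pairs of n, lanes 4..7 the minima of adjacent pairs of m. */
static void vpmin_u8(uint8_t out[8], const uint8_t n[8], const uint8_t m[8]) {
    uint8_t tmp[8];
    for (int i = 0; i < 4; ++i) {
        tmp[i]     = n[2 * i] < n[2 * i + 1] ? n[2 * i] : n[2 * i + 1];
        tmp[i + 4] = m[2 * i] < m[2 * i + 1] ? m[2 * i] : m[2 * i + 1];
    }
    for (int i = 0; i < 8; ++i) out[i] = tmp[i];
}

/* Same shape as vpmin_u8, taking pairwise maxima ("vpmax.u8"). */
static void vpmax_u8(uint8_t out[8], const uint8_t n[8], const uint8_t m[8]) {
    uint8_t tmp[8];
    for (int i = 0; i < 4; ++i) {
        tmp[i]     = n[2 * i] > n[2 * i + 1] ? n[2 * i] : n[2 * i + 1];
        tmp[i + 4] = m[2 * i] > m[2 * i + 1] ? m[2 * i] : m[2 * i + 1];
    }
    for (int i = 0; i < 8; ++i) out[i] = tmp[i];
}

/* Three pairwise-min steps leave the minimum of all 8 lanes in lane 0. */
int all_8x8_model(const uint8_t v[8]) {
    uint8_t r[8];
    vpmin_u8(r, v, v);   /* lanes 0..3: minima of (0,1) (2,3) (4,5) (6,7) */
    vpmin_u8(r, r, r);   /* lane 0: min of inputs 0..3, lane 1: min of 4..7 */
    vpmin_u8(r, r, r);   /* lane 0: min of all eight inputs */
    return r[0];
}

/* Three pairwise-max steps leave the maximum of all 8 lanes in lane 0. */
int any_8x8_model(const uint8_t v[8]) {
    uint8_t r[8];
    vpmax_u8(r, v, v);
    vpmax_u8(r, r, r);
    vpmax_u8(r, r, r);
    return r[0];
}
```

For lanes that hold only 0 or 1, the minimum equals the logical AND and the maximum equals the logical OR, which is why three pairwise operations are enough for an 8-lane boolean reduction.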
