This patch adds more efficient lowering for vecreduce.min/max under NEON,
using sequences of pairwise vpmin/vpmax to reduce to a single value.
This nearly resolves issues such as #50466, #40981 and #38190.
Differential Revision: https://reviews.llvm.org/D146404
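The pairwise idea can be illustrated with a small scalar model. The sketch below is Python, not NEON code: `pairwise` and `reduce_pairwise` are hypothetical helpers that model the lane arithmetic of a pairwise instruction such as vpmin.u8/vpmax.u8 on a 64-bit D register, assuming unsigned lane values and a power-of-two lane count. Feeding the vector to itself as both operands halves the number of distinct partial results per step, so 8 lanes need 3 steps and 4 lanes need 2.

```python
def pairwise(op, n, m):
    """Scalar model of a NEON pairwise op (vpmin/vpmax) on two D registers.

    n and m hold the same number of lanes (e.g. 8 x u8, 4 x u16, 2 x u32).
    The low half of the result combines adjacent pairs of n, the high half
    combines adjacent pairs of m.
    """
    assert len(n) == len(m) and len(n) % 2 == 0
    return ([op(n[i], n[i + 1]) for i in range(0, len(n), 2)] +
            [op(m[i], m[i + 1]) for i in range(0, len(m), 2)])


def reduce_pairwise(op, v):
    """Reduce a D register to one value with log2(lanes) pairwise steps.

    Passing the vector as both operands halves the number of distinct
    partial results on each step, so lane 0 ends up holding op over all lanes.
    """
    steps = len(v).bit_length() - 1  # log2 for power-of-two lane counts
    for _ in range(steps):
        v = pairwise(op, v, v)
    return v[0]


u8x8 = [7, 3, 9, 1, 5, 8, 2, 6]
u16x4 = [400, 12, 65535, 300]
print(reduce_pairwise(min, u8x8), reduce_pairwise(max, u8x8))    # 1 9
print(reduce_pairwise(min, u16x4), reduce_pairwise(max, u16x4))  # 12 65535
```

Each step here corresponds to a single pairwise instruction; signed reductions would use the .s variants of vpmin/vpmax rather than the unsigned ones modelled above.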
Extended Description
Split off from [Bug #36702]: armv7a generates poor code for boolean reductions from generic IR, either with the llvm.experimental.vector.reduce intrinsics (which expand to a shuffle reduction chain) or with bitcasts of the comparison result mask:
https://godbolt.org/z/U7C4n4
e.g.
ARMv7+NEON
LLVM6:
all_8x8:
vmov.i8 d0, #0x1
vldr d1, [r0]
vtst.8 d0, d1, d0
vext.8 d1, d0, d0, #4
vand d0, d0, d1
vext.8 d1, d0, d0, #2
vand d0, d0, d1
vdup.8 d1, d0[1]
vand d0, d0, d1
vmov.u8 r0, d0[0]
and r0, r0, #1
bx lr
any_8x8:
vmov.i8 d0, #0x1
vldr d1, [r0]
vtst.8 d0, d1, d0
vext.8 d1, d0, d0, #4
vorr d0, d0, d1
vext.8 d1, d0, d0, #2
vorr d0, d0, d1
vdup.8 d1, d0[1]
vorr d0, d0, d1
vmov.u8 r0, d0[0]
and r0, r0, #1
bx lr
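The LLVM6 listings above can be stepped through in a scalar Python model to see what each instruction contributes. The helper names (`vtst_u8`, `vext_u8`, `shuffle_chain`) are hypothetical and model only lane values, assuming 8 x u8 lanes in a D register. The chain folds the vector in half three times, but each fold costs a shuffle (vext or vdup) plus a vand/vorr, and the sequence starts with a vtst that turns bit 0 of each loaded byte into a full 0x00/0xFF lane.

```python
from operator import and_, or_


def vtst_u8(n, m):
    """Model of vtst.8: a lane becomes 0xFF if (n & m) != 0, else 0x00."""
    return [0xFF if a & b else 0x00 for a, b in zip(n, m)]


def vext_u8(n, m, imm):
    """Model of vext.8 Dd, Dn, Dm, #imm: bytes imm..7 of n, then bytes 0..imm-1 of m."""
    return (list(n) + list(m))[imm:imm + 8]


def shuffle_chain(op, mem):
    """Step through the LLVM6 all_8x8 (op=and_) / any_8x8 (op=or_) listing."""
    d0 = vtst_u8(mem, [0x01] * 8)                            # vmov.i8 #0x1; vtst.8
    d0 = [op(a, b) for a, b in zip(d0, vext_u8(d0, d0, 4))]  # vext.8 #4; vand/vorr
    d0 = [op(a, b) for a, b in zip(d0, vext_u8(d0, d0, 2))]  # vext.8 #2; vand/vorr
    d0 = [op(a, b) for a, b in zip(d0, [d0[1]] * 8)]         # vdup.8 d0[1]; vand/vorr
    return d0[0] & 1                                         # vmov.u8 r0, d0[0]; and #1


mem = [1, 1, 0, 1, 1, 1, 1, 1]
print(shuffle_chain(and_, mem), shuffle_chain(or_, mem))  # 0 1
```

With vext and the same register as both sources, `#4` and `#2` act as byte rotations, so after the three folds lane 0 holds the AND (or OR) of all eight vtst results.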
Manually generated:
all_8x8:
vldr d0, [r0]
vpmin.u8 d16, d0, d16
vpmin.u8 d16, d16, d16
vpmin.u8 d0, d16, d16
vmov.u8 r0, d0[0]
bx lr
any_8x8:
vldr d0, [r0]
vpmax.u8 d16, d0, d16
vpmax.u8 d16, d16, d16
vpmax.u8 d0, d16, d16
vmov.u8 r0, d0[0]
bx lr
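One way to check the hand-written sequence is to step through it in a scalar model. The Python sketch below uses hypothetical helpers `vpmin_u8`, `vpmax_u8` and `hand_written`, and assumes each loaded byte is exactly 0 or 1, so "all" is the unsigned minimum and "any" the unsigned maximum over the lanes (unlike the LLVM6 code, there is no vtst to isolate bit 0). It also shows that the first instruction reading d16, whose contents on entry are unspecified, is harmless: only the upper lanes pick up d16's bytes, and lane 0 depends solely on the loaded data.

```python
import itertools
import random


def vpmin_u8(n, m):
    """Model of vpmin.u8 Dd, Dn, Dm: pairwise minima of n, then of m."""
    return ([min(n[i], n[i + 1]) for i in range(0, 8, 2)] +
            [min(m[i], m[i + 1]) for i in range(0, 8, 2)])


def vpmax_u8(n, m):
    """Model of vpmax.u8 Dd, Dn, Dm: pairwise maxima of n, then of m."""
    return ([max(n[i], n[i + 1]) for i in range(0, 8, 2)] +
            [max(m[i], m[i + 1]) for i in range(0, 8, 2)])


def hand_written(op, mem, d16):
    """Step through the hand-written all_8x8 (vpmin) / any_8x8 (vpmax) body.

    mem is the 8 bytes at [r0]; d16 is whatever the register held on entry.
    """
    d0 = list(mem)         # vldr d0, [r0]
    d16 = op(d0, d16)      # vpmin/vpmax.u8 d16, d0, d16
    d16 = op(d16, d16)     # vpmin/vpmax.u8 d16, d16, d16
    d0 = op(d16, d16)      # vpmin/vpmax.u8 d0, d16, d16
    return d0[0]           # vmov.u8 r0, d0[0]


# Exhaustive check over all 256 boolean inputs, with random junk in d16.
rng = random.Random(0)
for bits in itertools.product((0, 1), repeat=8):
    junk = [rng.randrange(256) for _ in range(8)]
    assert hand_written(vpmin_u8, bits, junk) == int(all(bits))
    assert hand_written(vpmax_u8, bits, junk) == int(any(bits))
print("lane 0 matches all()/any() for every 0/1 input")
```

Using `d0, d0` as the source operands of the first vpmin/vpmax would give the same lane 0 without reading d16 at all.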