[arm][neon] llvm.experimental.reduce.{and, any} don't lower properly for boolean vectors #40981

Open
RKSimon opened this issue Apr 28, 2019 · 1 comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla

Comments

@RKSimon
Collaborator

RKSimon commented Apr 28, 2019

Bugzilla Link: 41636
Version: trunk
OS: Windows NT
CC: @alexey-bataev, @gnzlbg, @smithp35

Extended Description

Split off from [Bug #36702]: ARMv7-A generates poor code for boolean reductions from generic IR, whether they are written with the llvm.experimental.vector.reduce intrinsics (which expand to a shuffle reduction chain) or with bitcasts of the comparison result mask:

https://godbolt.org/z/U7C4n4

e.g.

ARMv7+NEON

LLVM6:

all_8x8:
vmov.i8 d0, #0x1
vldr d1, [r0]
vtst.8 d0, d1, d0
vext.8 d1, d0, d0, #4
vand d0, d0, d1
vext.8 d1, d0, d0, #2
vand d0, d0, d1
vdup.8 d1, d0[1]
vand d0, d0, d1
vmov.u8 r0, d0[0]
and r0, r0, #1
bx lr
any_8x8:
vmov.i8 d0, #0x1
vldr d1, [r0]
vtst.8 d0, d1, d0
vext.8 d1, d0, d0, #4
vorr d0, d0, d1
vext.8 d1, d0, d0, #2
vorr d0, d0, d1
vdup.8 d1, d0[1]
vorr d0, d0, d1
vmov.u8 r0, d0[0]
and r0, r0, #1
bx lr
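
For readers who want to trace the LLVM6 output lane by lane, here is a rough Python model of the shuffle reduction chain above. It is only a sketch: the helper names (vtst8, vext8, vdup8, all_8x8_shuffle, any_8x8_shuffle) are made up for illustration, each D register is modelled as a list of 8 byte values, and the instruction semantics are my reading of the listing rather than anything taken from the compiler.

```python
# Hypothetical lane-level model of the LLVM6 sequence; a D register is a list of 8 bytes.

def vtst8(a, b):
    # vtst.8: a lane becomes 0xFF if (a & b) is non-zero, else 0x00
    return [0xFF if (x & y) else 0x00 for x, y in zip(a, b)]

def vext8(n, m, imm):
    # vext.8: take 8 consecutive bytes starting at index imm from n followed by m
    return (list(n) + list(m))[imm:imm + 8]

def vdup8(a, lane):
    # vdup.8: broadcast one lane to all 8 lanes
    return [a[lane]] * 8

def _shuffle_reduce(v, op):
    d0 = vtst8(v, [0x01] * 8)                  # test bit 0 of every input byte
    for d1 in (lambda d: vext8(d, d, 4),       # fold upper half onto lower half
               lambda d: vext8(d, d, 2),       # fold lanes 2..3 onto lanes 0..1
               lambda d: vdup8(d, 1)):         # fold lane 1 onto lane 0
        d0 = [op(x, y) for x, y in zip(d0, d1(d0))]
    return d0[0] & 1                           # vmov.u8 r0, d0[0]; and r0, r0, #1

def all_8x8_shuffle(v):
    return _shuffle_reduce(v, lambda x, y: x & y)   # vand chain

def any_8x8_shuffle(v):
    return _shuffle_reduce(v, lambda x, y: x | y)   # vorr chain
```

Each fold costs two instructions (a shuffle plus a logical op), which is where most of the length of the listing comes from.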

Manually generated:

all_8x8:
vldr d0, [r0]
vpmin.u8 d16, d0, d16
vpmin.u8 d16, d16, d16
vpmin.u8 d0, d16, d16
vmov.u8 r0, d0[0]
bx lr

any_8x8:
vldr d0, [r0]
vpmax.u8 d16, d0, d16
vpmax.u8 d16, d16, d16
vpmax.u8 d0, d16, d16
vmov.u8 r0, d0[0]
bx lr
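
The manual sequences can be modelled the same way. The sketch below is a rough Python model of the three pairwise steps, with invented helper names (vpmin_u8, vpmax_u8, all_8x8_pairwise, any_8x8_pairwise) and a D register represented as a list of 8 bytes. It assumes the loaded bytes are already 0 or 1, since the manual listing has no vtst and no final mask; that is my reading of the listing, not something stated in the report. The d16 argument stands in for whatever d16 happens to hold on entry.

```python
# Hypothetical lane-level model of the manual vpmin/vpmax sequences.

def _pairwise(n, m, op):
    # vpmin/vpmax.u8 Dd, Dn, Dm: lanes 0..3 combine adjacent pairs of Dn,
    # lanes 4..7 combine adjacent pairs of Dm
    src = list(n) + list(m)
    return [op(src[2 * i], src[2 * i + 1]) for i in range(8)]

def vpmin_u8(n, m):
    return _pairwise(n, m, min)

def vpmax_u8(n, m):
    return _pairwise(n, m, max)

def _reduce(v, step, d16):
    d16 = step(v, d16)      # 8 input lanes -> 4 partial results in lanes 0..3
    d16 = step(d16, d16)    # 4 -> 2 partial results in lanes 0..1
    d0 = step(d16, d16)     # 2 -> 1 result in lane 0
    return d0[0]            # vmov.u8 r0, d0[0]

def all_8x8_pairwise(v, d16=(0,) * 8):
    return _reduce(v, vpmin_u8, d16)

def any_8x8_pairwise(v, d16=(0,) * 8):
    return _reduce(v, vpmax_u8, d16)
```

In this model lane 0 only ever depends on the loaded bytes, so the initial contents of d16 do not affect the result. For 0/1 (or 0x00/0xFF) lanes the unsigned minimum equals the AND reduction and the unsigned maximum equals the OR reduction, which is why three pairwise instructions can stand in for the six-instruction shuffle chain.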

@alexey-bataev
Member

Ping. There is https://reviews.llvm.org/D97961, which changes the default cost for logical reductions.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
davemgreen pushed a commit that referenced this issue Mar 22, 2023
This patch adds some more efficient lowering for vecreduce.min/max under NEON,
using sequences of pairwise vpmin/vpmax to reduce to a single value.

This nearly resolves issues such as #50466, #40981, #38190.

Differential Revision: https://reviews.llvm.org/D146404