Suboptimal codegen for llvm.vector.reduce of <N x i1> #50466

Open
calebzulawski opened this issue Jul 17, 2021 · 0 comments
Labels
backend:AArch64 bugzilla Issues migrated from bugzilla

calebzulawski commented Jul 17, 2021

| | |
| --- | --- |
| Bugzilla Link | 51122 |
| Version | 12.0 |
| OS | All |
| CC | @Arnaud-de-Grandmaison-ARM,@DMG862,@RKSimon,@smithp35 |

Extended Description

The bitwise reduction intrinsics (llvm.vector.reduce.or and llvm.vector.reduce.and) on AArch64 (and ARM) generate suboptimal code for vectors of i1. This issue is similar to #38188.

declare i1 @llvm.vector.reduce.or.v8i1(<8 x i1>)

define i1 @mask_reduce_or(<8 x i8> %mask) {
    %mask1 = trunc <8 x i8> %mask to <8 x i1>
    %reduced = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %mask1)
    ret i1 %reduced
}

produces

mask_reduce_or:                         // @mask_reduce_or
        umov    w14, v0.b[1]
        umov    w15, v0.b[0]
        umov    w13, v0.b[2]
        orr     w14, w15, w14
        umov    w12, v0.b[3]
        orr     w13, w14, w13
        umov    w11, v0.b[4]
        orr     w12, w13, w12
        umov    w10, v0.b[5]
        orr     w11, w12, w11
        umov    w9, v0.b[6]
        orr     w10, w11, w10
        umov    w8, v0.b[7]
        orr     w9, w10, w9
        orr     w8, w9, w8
        and     w0, w8, #0x1
        ret

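when it could instead use vmaxvq (or vpmax on ARM): once each lane holds only 0 or 1, the OR of all lanes equals their unsigned maximum.

The same goes for vector.reduce.and with vminvq (or vpmin on ARM), since the AND of 0/1 lanes equals their unsigned minimum.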
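For illustration only, here is a rough C sketch of the lowering suggested above. It is not what LLVM emits. The function names and the 16-lane byte layout are assumptions made for the example. On AArch64 it uses the ACLE intrinsics vmaxvq_u8 and vminvq_u8; elsewhere it falls back to a scalar loop with the same semantics, so the two paths can be compared.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: OR-reduce bit 0 of 16 byte lanes.
 * Only bit 0 of each lane is significant, mirroring trunc to i1. */
static bool mask_reduce_or16(const uint8_t lanes[16]) {
#if defined(__aarch64__)
    /* Keep bit 0 of every lane so each lane is exactly 0 or 1. */
    uint8x16_t bits = vandq_u8(vld1q_u8(lanes), vdupq_n_u8(1));
    /* OR of 0/1 lanes == unsigned maximum across the vector. */
    return vmaxvq_u8(bits) != 0;
#else
    uint8_t acc = 0;
    for (int i = 0; i < 16; ++i) {
        acc |= (uint8_t)(lanes[i] & 1u);
    }
    return acc != 0;
#endif
}

/* Hypothetical helper: AND-reduce bit 0 of 16 byte lanes. */
static bool mask_reduce_and16(const uint8_t lanes[16]) {
#if defined(__aarch64__)
    uint8x16_t bits = vandq_u8(vld1q_u8(lanes), vdupq_n_u8(1));
    /* AND of 0/1 lanes == unsigned minimum across the vector. */
    return vminvq_u8(bits) != 0;
#else
    uint8_t acc = 1;
    for (int i = 0; i < 16; ++i) {
        acc &= (uint8_t)(lanes[i] & 1u);
    }
    return acc != 0;
#endif
}
```

For the 8-lane case in the IR above, the 64-bit forms (vmaxv_u8 and vminv_u8) would play the same role. Either way the whole umov/orr chain collapses into one mask, one across-vector reduction, and one move to a general-purpose register.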

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
davemgreen pushed a commit that referenced this issue Mar 22, 2023
This patch adds some more efficient lowering for vecreduce.min/max under NEON,
using sequences of pairwise vpmin/vpmax to reduce to a single value.

This nearly resolves issues such as #50466, #40981, #38190.

Differential Revision: https://reviews.llvm.org/D146404