Suboptimal codegen for llvm.vector.reduce of <N x i1> #50466

calebzulawski · 2021-07-17T04:29:52Z


Bugzilla Link	51122
Version	12.0
OS	All
CC	@Arnaud-de-Grandmaison-ARM,@DMG862,@RKSimon,@smithp35

Extended Description

The binary reduction intrinsics on AArch64 (and ARM) produce suboptimal code for vectors of i1. This issue is similar to #38188.

declare i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %a);

define i1 @mask_reduce_or(<8 x i8> %mask) {
    %mask1 = trunc <8 x i8> %mask to <8 x i1>
    %reduced = call i1 @llvm.vector.reduce.or.v8i1(<8 x i1> %mask1)
    ret i1 %reduced
}

produces

mask_reduce_or:                         // @mask_reduce_or
        umov    w14, v0.b[1]
        umov    w15, v0.b[0]
        umov    w13, v0.b[2]
        orr     w14, w15, w14
        umov    w12, v0.b[3]
        orr     w13, w14, w13
        umov    w11, v0.b[4]
        orr     w12, w13, w12
        umov    w10, v0.b[5]
        orr     w11, w12, w11
        umov    w9, v0.b[6]
        orr     w10, w11, w10
        umov    w8, v0.b[7]
        orr     w9, w10, w9
        orr     w8, w9, w8
        and     w0, w8, #0x1
        ret

when it could instead use vmaxvq (or vpmax on ARM).

The same goes for vector.reduce.and with vminvq (or vpmin on ARM).
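
To illustrate why a horizontal max/min can stand in for the OR/AND chain, here is a minimal scalar sketch in portable C. It is only a model of the idea, not the backend's actual lowering: the function names are hypothetical, and the inner loops stand in for what a single across-lanes unsigned max or min (such as the instruction behind vmaxvq/vminvq) would compute. It assumes each lane is first masked to its low bit, since the trunc to i1 keeps only bit 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the current codegen: OR every lane together, then keep bit 0. */
static bool reduce_or_chain(const uint8_t *lanes, size_t n) {
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= lanes[i];
    return (acc & 1u) != 0;
}

/* Model of the current codegen for AND: AND every lane, then keep bit 0. */
static bool reduce_and_chain(const uint8_t *lanes, size_t n) {
    uint8_t acc = 0xFF;
    for (size_t i = 0; i < n; i++)
        acc &= lanes[i];
    return (acc & 1u) != 0;
}

/* Hypothetical alternative: mask each lane to bit 0, then take the
   unsigned maximum across lanes. With lanes restricted to 0 or 1,
   max(lanes) == OR(lanes). */
static bool reduce_or_via_max(const uint8_t *lanes, size_t n) {
    uint8_t m = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = lanes[i] & 1u;
        if (b > m)
            m = b;
    }
    return m != 0;
}

/* Same idea for AND: with lanes restricted to 0 or 1,
   min(lanes) == AND(lanes). */
static bool reduce_and_via_min(const uint8_t *lanes, size_t n) {
    uint8_t m = 1;
    for (size_t i = 0; i < n; i++) {
        uint8_t b = lanes[i] & 1u;
        if (b < m)
            m = b;
    }
    return m != 0;
}
```

On real hardware the masking and the across-lanes max/min would each be a single vector instruction, replacing the per-lane umov/orr sequence shown above.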


This patch adds some more efficient lowering for vecreduce.min/max under NEON, using sequences of pairwise vpmin/vpmax to reduce to a single value. This nearly resolves issues such as #50466, #40981, #38190. Differential Revision: https://reviews.llvm.org/D146404

llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021

workingjubilee mentioned this issue Dec 24, 2021

Bad codegen for boolean reductions on thumbv7neon rust-lang/portable-simd#146

Open

calebzulawski mentioned this issue Sep 22, 2022

Bad codegen for bitwise OR/AND masks rust-lang/portable-simd#303

Open

thomcc mentioned this issue Oct 31, 2022

x86_64 SSE2 fast-path for str.contains(&str) and short needles rust-lang/rust#103779

Merged

calebzulawski mentioned this issue Feb 11, 2023

Simd-using functions sometimes scalarize after inlining, even if they use vector ops on their own rust-lang/portable-simd#321

Open
