[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A510 and A520 #132246

nasherm · 2025-03-20T16:01:04Z

Inefficient SVE codegen occurs on at least two in-order cores,
those being Cortex-A510 and Cortex-A520. For example a simple vector
add

void foo(float a, float b, float dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

Vectorizes the inner loop into the following interleaved sequence
of instructions.

        add     x12, x1, x10
        ld1b    { z0.b }, p0/z, [x1, x10]
        add     x13, x2, x10
        ld1b    { z1.b }, p0/z, [x2, x10]
        ldr     z2, [x12, #1, mul vl]
        ldr     z3, [x13, #1, mul vl]
        dech    x11
        add     x12, x0, x10
        fadd    z0.s, z1.s, z0.s
        fadd    z1.s, z3.s, z2.s
        st1b    { z0.b }, p0, [x0, x10]
        addvl   x10, x10, #2
        str     z1, [x12, #1, mul vl]

By adjusting the target features to prefer fixed over scalable if the
cost is equal we get the following vectorized loop.

         ldp q0, q3, [x11, #-16]
         subs    x13, x13, #8
         ldp q1, q2, [x10, #-16]
         add x10, x10, #32
         add x11, x11, #32
         fadd    v0.4s, v1.4s, v0.4s
         fadd    v1.4s, v2.4s, v3.4s
         stp q0, q1, [x12, #-16]
         add x12, x12, #32

Which is more efficient.

llvmbot · 2025-03-20T16:01:39Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Nashe Mncube (nasherm)

Changes

The default MaxInterleaveFactor for AArch64 targets is 2. This produces inefficient codegen on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add

void foo(float a, float b, float dst, unsigned n) {
    for (unsigned i = 0; i &lt; n; ++i)
        dst[i] = a[i] + b[i];
}

Vectorizes the inner loop into the following interleaved sequence of instructions

        add     x12, x1, x10
        ld1b    { z0.b }, p0/z, [x1, x10]
        add     x13, x2, x10
        ld1b    { z1.b }, p0/z, [x2, x10]
        ldr     z2, [x12, #<!-- -->1, mul vl]
        ldr     z3, [x13, #<!-- -->1, mul vl]
        dech    x11
        add     x12, x0, x10
        fadd    z0.s, z1.s, z0.s
        fadd    z1.s, z3.s, z2.s
        st1b    { z0.b }, p0, [x0, x10]
        addvl   x10, x10, #<!-- -->2
        str     z1, [x12, #<!-- -->1, mul vl]

while when we reduce MaxInterleaveFactor to 1 we get the following

.LBB0_13:                               // %vector.body
                                        // =&gt;This Inner Loop Header: Depth=1
    ld1w    { z0.s }, p0/z, [x1, x10, lsl #<!-- -->2]
    ld1w    { z1.s }, p0/z, [x2, x10, lsl #<!-- -->2]
    fadd    z0.s, z1.s, z0.s
    st1w    { z0.s }, p0, [x0, x10, lsl #<!-- -->2]
    incw    x10

This patch also introduces a test

Patch is 30.69 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/132246.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+1)
(added) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleave-inorder-core.ll (+360)

diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
index bb36af8fce5cc..57ae4dfb71c36 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.cpp
@@ -181,6 +181,7 @@ void AArch64Subtarget::initializeProperties(bool HasMinSize) {
     VScaleForTuning = 1;
     PrefLoopAlignment = Align(16);
     MaxBytesForLoopAlignment = 8;
+    MaxInterleaveFactor = 1;
     break;
   case CortexA710:
   case CortexA715:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleave-inorder-core.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleave-inorder-core.ll
new file mode 100644
index 0000000000000..a3bf37726943f
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleave-inorder-core.ll
@@ -0,0 +1,360 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | FileCheck %s --check-prefix=CHECK-CA510-NOINTERLEAVE
+; RUN: opt < %s -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -force-target-max-vector-interleave=2 -S | FileCheck %s --check-prefix=CHECK-CA510-INTERLEAVE
+; RUN: opt < %s -mtriple=aarch64-none-elf -mcpu=cortex-a520 -mattr=+sve -passes=loop-vectorize -S | FileCheck %s --check-prefix=CHECK-CA520-NOINTERLEAVE
+; RUN: opt < %s -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -force-target-max-vector-interleave=2 -S | FileCheck %s --check-prefix=CHECK-CA520-INTERLEAVE
+
+define void @sve_add(ptr  %dst, ptr  %a, ptr  %b, i64 %n) {
+; CHECK-CA510-NOINTERLEAVE-LABEL: define void @sve_add(
+; CHECK-CA510-NOINTERLEAVE-SAME: ptr [[DST:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-CA510-NOINTERLEAVE-NEXT:  [[ENTRY:.*:]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[B3:%.*]] = ptrtoint ptr [[B]] to i64
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[A2:%.*]] = ptrtoint ptr [[A]] to i64
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[DST1:%.*]] = ptrtoint ptr [[DST]] to i64
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[CMP9_NOT:%.*]] = icmp eq i64 [[N]], 0
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br i1 [[CMP9_NOT]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY_PREHEADER:.*]]
+; CHECK-CA510-NOINTERLEAVE:       [[FOR_BODY_PREHEADER]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.umax.i64(i64 12, i64 [[TMP1]])
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], [[TMP2]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; CHECK-CA510-NOINTERLEAVE:       [[VECTOR_MEMCHECK]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP6:%.*]] = sub i64 [[DST1]], [[A2]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP6]], [[TMP5]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP4]], 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP8:%.*]] = sub i64 [[DST1]], [[B3]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP8]], [[TMP7]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br i1 [[CONFLICT_RDX]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; CHECK-CA510-NOINTERLEAVE:       [[VECTOR_PH]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP10]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP12:%.*]] = mul i64 [[TMP11]], 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK-CA510-NOINTERLEAVE:       [[VECTOR_BODY]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP13:%.*]] = add i64 [[INDEX]], 0
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP14:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[TMP13]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP15:%.*]] = getelementptr inbounds nuw float, ptr [[TMP14]], i32 0
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, ptr [[TMP15]], align 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP16:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[TMP13]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP17:%.*]] = getelementptr inbounds nuw float, ptr [[TMP16]], i32 0
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[WIDE_LOAD5:%.*]] = load <vscale x 4 x float>, ptr [[TMP17]], align 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP18:%.*]] = fadd fast <vscale x 4 x float> [[WIDE_LOAD5]], [[WIDE_LOAD]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP19:%.*]] = getelementptr inbounds nuw float, ptr [[DST]], i64 [[TMP13]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP20:%.*]] = getelementptr inbounds nuw float, ptr [[TMP19]], i32 0
+; CHECK-CA510-NOINTERLEAVE-NEXT:    store <vscale x 4 x float> [[TMP18]], ptr [[TMP20]], align 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP12]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br i1 [[TMP21]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-CA510-NOINTERLEAVE:       [[MIDDLE_BLOCK]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-CA510-NOINTERLEAVE:       [[SCALAR_PH]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK-CA510-NOINTERLEAVE:       [[FOR_BODY]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ], [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDVARS_IV]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP22:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[INDVARS_IV]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[TMP23:%.*]] = load float, ptr [[ARRAYIDX2]], align 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[ADD:%.*]] = fadd fast float [[TMP23]], [[TMP22]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds nuw float, ptr [[DST]], i64 [[INDVARS_IV]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    store float [[ADD]], ptr [[ARRAYIDX4]], align 4
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-CA510-NOINTERLEAVE-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-CA510-NOINTERLEAVE:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; CHECK-CA510-NOINTERLEAVE:       [[FOR_COND_CLEANUP]]:
+; CHECK-CA510-NOINTERLEAVE-NEXT:    ret void
+;
+; CHECK-CA510-INTERLEAVE-LABEL: define void @sve_add(
+; CHECK-CA510-INTERLEAVE-SAME: ptr [[DST:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-CA510-INTERLEAVE-NEXT:  [[ENTRY:.*:]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[B3:%.*]] = ptrtoint ptr [[B]] to i64
+; CHECK-CA510-INTERLEAVE-NEXT:    [[A2:%.*]] = ptrtoint ptr [[A]] to i64
+; CHECK-CA510-INTERLEAVE-NEXT:    [[DST1:%.*]] = ptrtoint ptr [[DST]] to i64
+; CHECK-CA510-INTERLEAVE-NEXT:    [[CMP9_NOT:%.*]] = icmp eq i64 [[N]], 0
+; CHECK-CA510-INTERLEAVE-NEXT:    br i1 [[CMP9_NOT]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY_PREHEADER:.*]]
+; CHECK-CA510-INTERLEAVE:       [[FOR_BODY_PREHEADER]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 8
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.umax.i64(i64 12, i64 [[TMP1]])
+; CHECK-CA510-INTERLEAVE-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], [[TMP2]]
+; CHECK-CA510-INTERLEAVE-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; CHECK-CA510-INTERLEAVE:       [[VECTOR_MEMCHECK]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 8
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP6:%.*]] = sub i64 [[DST1]], [[A2]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP6]], [[TMP5]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP4]], 8
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP8:%.*]] = sub i64 [[DST1]], [[B3]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP8]], [[TMP7]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
+; CHECK-CA510-INTERLEAVE-NEXT:    br i1 [[CONFLICT_RDX]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; CHECK-CA510-INTERLEAVE:       [[VECTOR_PH]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 8
+; CHECK-CA510-INTERLEAVE-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP10]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP12:%.*]] = mul i64 [[TMP11]], 8
+; CHECK-CA510-INTERLEAVE-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK-CA510-INTERLEAVE:       [[VECTOR_BODY]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP13:%.*]] = add i64 [[INDEX]], 0
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP14:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[TMP13]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP15:%.*]] = getelementptr inbounds nuw float, ptr [[TMP14]], i32 0
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP16:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP17:%.*]] = mul i64 [[TMP16]], 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP18:%.*]] = getelementptr inbounds nuw float, ptr [[TMP14]], i64 [[TMP17]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, ptr [[TMP15]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[WIDE_LOAD5:%.*]] = load <vscale x 4 x float>, ptr [[TMP18]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP19:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[TMP13]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP20:%.*]] = getelementptr inbounds nuw float, ptr [[TMP19]], i32 0
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP21:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP22:%.*]] = mul i64 [[TMP21]], 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP23:%.*]] = getelementptr inbounds nuw float, ptr [[TMP19]], i64 [[TMP22]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[WIDE_LOAD6:%.*]] = load <vscale x 4 x float>, ptr [[TMP20]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[WIDE_LOAD7:%.*]] = load <vscale x 4 x float>, ptr [[TMP23]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP24:%.*]] = fadd fast <vscale x 4 x float> [[WIDE_LOAD6]], [[WIDE_LOAD]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP25:%.*]] = fadd fast <vscale x 4 x float> [[WIDE_LOAD7]], [[WIDE_LOAD5]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP26:%.*]] = getelementptr inbounds nuw float, ptr [[DST]], i64 [[TMP13]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP27:%.*]] = getelementptr inbounds nuw float, ptr [[TMP26]], i32 0
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP28:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP29:%.*]] = mul i64 [[TMP28]], 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP30:%.*]] = getelementptr inbounds nuw float, ptr [[TMP26]], i64 [[TMP29]]
+; CHECK-CA510-INTERLEAVE-NEXT:    store <vscale x 4 x float> [[TMP24]], ptr [[TMP27]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    store <vscale x 4 x float> [[TMP25]], ptr [[TMP30]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP12]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP31:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-CA510-INTERLEAVE-NEXT:    br i1 [[TMP31]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-CA510-INTERLEAVE:       [[MIDDLE_BLOCK]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
+; CHECK-CA510-INTERLEAVE-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-CA510-INTERLEAVE:       [[SCALAR_PH]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-CA510-INTERLEAVE-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK-CA510-INTERLEAVE:       [[FOR_BODY]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ], [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDVARS_IV]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP32:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[INDVARS_IV]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[TMP33:%.*]] = load float, ptr [[ARRAYIDX2]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[ADD:%.*]] = fadd fast float [[TMP33]], [[TMP32]]
+; CHECK-CA510-INTERLEAVE-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds nuw float, ptr [[DST]], i64 [[INDVARS_IV]]
+; CHECK-CA510-INTERLEAVE-NEXT:    store float [[ADD]], ptr [[ARRAYIDX4]], align 4
+; CHECK-CA510-INTERLEAVE-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-CA510-INTERLEAVE-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; CHECK-CA510-INTERLEAVE-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP_LOOPEXIT]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-CA510-INTERLEAVE:       [[FOR_COND_CLEANUP_LOOPEXIT]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    br label %[[FOR_COND_CLEANUP]]
+; CHECK-CA510-INTERLEAVE:       [[FOR_COND_CLEANUP]]:
+; CHECK-CA510-INTERLEAVE-NEXT:    ret void
+;
+; CHECK-CA520-NOINTERLEAVE-LABEL: define void @sve_add(
+; CHECK-CA520-NOINTERLEAVE-SAME: ptr [[DST:%.*]], ptr [[A:%.*]], ptr [[B:%.*]], i64 [[N:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-CA520-NOINTERLEAVE-NEXT:  [[ENTRY:.*:]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[B3:%.*]] = ptrtoint ptr [[B]] to i64
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[A2:%.*]] = ptrtoint ptr [[A]] to i64
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[DST1:%.*]] = ptrtoint ptr [[DST]] to i64
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[CMP9_NOT:%.*]] = icmp eq i64 [[N]], 0
+; CHECK-CA520-NOINTERLEAVE-NEXT:    br i1 [[CMP9_NOT]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY_PREHEADER:.*]]
+; CHECK-CA520-NOINTERLEAVE:       [[FOR_BODY_PREHEADER]]:
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP2:%.*]] = call i64 @llvm.umax.i64(i64 12, i64 [[TMP1]])
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], [[TMP2]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_MEMCHECK:.*]]
+; CHECK-CA520-NOINTERLEAVE:       [[VECTOR_MEMCHECK]]:
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP5:%.*]] = mul i64 [[TMP4]], 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP6:%.*]] = sub i64 [[DST1]], [[A2]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP6]], [[TMP5]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP4]], 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP8:%.*]] = sub i64 [[DST1]], [[B3]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP8]], [[TMP7]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    br i1 [[CONFLICT_RDX]], label %[[SCALAR_PH]], label %[[VECTOR_PH:.*]]
+; CHECK-CA520-NOINTERLEAVE:       [[VECTOR_PH]]:
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP10:%.*]] = mul i64 [[TMP9]], 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP10]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP11:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP12:%.*]] = mul i64 [[TMP11]], 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK-CA520-NOINTERLEAVE:       [[VECTOR_BODY]]:
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP13:%.*]] = add i64 [[INDEX]], 0
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP14:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[TMP13]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP15:%.*]] = getelementptr inbounds nuw float, ptr [[TMP14]], i32 0
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, ptr [[TMP15]], align 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP16:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[TMP13]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP17:%.*]] = getelementptr inbounds nuw float, ptr [[TMP16]], i32 0
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[WIDE_LOAD5:%.*]] = load <vscale x 4 x float>, ptr [[TMP17]], align 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP18:%.*]] = fadd fast <vscale x 4 x float> [[WIDE_LOAD5]], [[WIDE_LOAD]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP19:%.*]] = getelementptr inbounds nuw float, ptr [[DST]], i64 [[TMP13]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP20:%.*]] = getelementptr inbounds nuw float, ptr [[TMP19]], i32 0
+; CHECK-CA520-NOINTERLEAVE-NEXT:    store <vscale x 4 x float> [[TMP18]], ptr [[TMP20]], align 4
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP12]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    br i1 [[TMP21]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK-CA520-NOINTERLEAVE:       [[MIDDLE_BLOCK]]:
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP_LOOPEXIT:.*]], label %[[SCALAR_PH]]
+; CHECK-CA520-NOINTERLEAVE:       [[SCALAR_PH]]:
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[FOR_BODY_PREHEADER]] ], [ 0, %[[VECTOR_MEMCHECK]] ]
+; CHECK-CA520-NOINTERLEAVE-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK-CA520-NOINTERLEAVE:       [[FOR_BODY]]:
+; CHECK-CA520-NOINTERLEAVE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ], [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ]
+; CHECK-CA520-NOINTERLEAVE...
[truncated]

davemgreen · 2025-03-21T06:58:36Z

I would expect In-order cores should like unrolling, as it enables more hiding of latency hazards. (There are always times based on the trip count of the loop or the overheads that it ends up making things worse, but I would expect at least some level of interleaving to usually be useful overall).

It looks from the example code that the addressing modes in the loops is not doing very well. They are usually calculated in LSR. Could they do better, and then get the benefit of interleaving without the cost of the inefficient addressing mode calculations?

davemgreen · 2025-03-21T06:59:15Z

Also - I think this controls Neon too, and Neon will prefer interleaving at least a bit to make use of LDP/STP.

davemgreen · 2025-03-25T15:27:42Z

The other alternative might be to prefer fixed width to scalable when the costs in the vectorizer are equal, if that is more beneficial on these cores. It is controlled via the FeatureUseFixedOverScalableIfEqualCost feature.

Inefficient SVE codegen occurs on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add ``` void foo(float a, float b, float dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` Vectorizes the inner loop into the following interleaved sequence of instructions. ``` add x12, x1, x10 ld1b { z0.b }, p0/z, [x1, x10] add x13, x2, x10 ld1b { z1.b }, p0/z, [x2, x10] ldr z2, [x12, llvm#1, mul vl] ldr z3, [x13, llvm#1, mul vl] dech x11 add x12, x0, x10 fadd z0.s, z1.s, z0.s fadd z1.s, z3.s, z2.s st1b { z0.b }, p0, [x0, x10] addvl x10, x10, llvm#2 str z1, [x12, llvm#1, mul vl] ``` By adjusting the target features to prefer fixed over scalable if the cost is equal we get the following vectorized loop. ``` ldp q0, q3, [x11, #-16] subs x13, x13, llvm#8 ldp q1, q2, [x10, #-16] add x10, x10, llvm#32 add x11, x11, llvm#32 fadd v0.4s, v1.4s, v0.4s fadd v1.4s, v2.4s, v3.4s stp q0, q1, [x12, #-16] add x12, x12, llvm#32 ``` Which is more efficient. Change-Id: Ie1e862f6a1db851182a95534b3b987feb670d7ca

davemgreen

LGTM, can you change the title to something like "[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A510 and A520". It is only when the scores are equal and the vectorizer has no reason to pick one vs the other that this will cause the vectorizer to pick fixed-width.

llvm-ci · 2025-04-04T13:19:49Z

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/14312

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/opt < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
next:30'1                                                                         with "INDEX" equal to "%index"
          33:  %2 = getelementptr inbounds nuw float, ptr %a, i64 %index 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          34:  %3 = getelementptr inbounds nuw float, ptr %2, i32 0 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          35:  %4 = getelementptr inbounds nuw float, ptr %2, i32 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          36:  %wide.load = load <4 x float>, ptr %3, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %wide.load5 = load <4 x float>, ptr %4, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
...

llvm-ci · 2025-04-04T13:23:22Z

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/27853

Here is the relevant piece of the build log for the reference

Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/build/buildbot/premerge-monolithic-linux/build/bin/opt < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /build/buildbot/premerge-monolithic-linux/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
next:30'1                                                                         with "INDEX" equal to "%index"
          33:  %2 = getelementptr inbounds nuw float, ptr %a, i64 %index 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          34:  %3 = getelementptr inbounds nuw float, ptr %2, i32 0 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          35:  %4 = getelementptr inbounds nuw float, ptr %2, i32 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          36:  %wide.load = load <4 x float>, ptr %3, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %wide.load5 = load <4 x float>, ptr %4, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
...

llvm-ci · 2025-04-04T13:23:40Z

LLVM Buildbot has detected a new failure on builder ml-opt-rel-x86-64 running on ml-opt-rel-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/16134

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-rel-x86-64-b1/build/bin/opt < /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /b/ml-opt-rel-x86-64-b1/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /b/ml-opt-rel-x86-64-b1/build/bin/FileCheck /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /b/ml-opt-rel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
next:30'1                                                                         with "INDEX" equal to "%index"
          33:  %2 = getelementptr inbounds nuw float, ptr %a, i64 %index 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          34:  %3 = getelementptr inbounds nuw float, ptr %2, i32 0 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          35:  %4 = getelementptr inbounds nuw float, ptr %2, i32 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          36:  %wide.load = load <4 x float>, ptr %3, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %wide.load5 = load <4 x float>, ptr %4, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
...

llvm-ci · 2025-04-04T13:23:45Z

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/16375

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-dev-x86-64-b1/build/bin/opt < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /b/ml-opt-dev-x86-64-b1/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
next:30'1                                                                         with "INDEX" equal to "%index"
          33:  %2 = getelementptr inbounds nuw float, ptr %a, i64 %index 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          34:  %3 = getelementptr inbounds nuw float, ptr %2, i32 0 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          35:  %4 = getelementptr inbounds nuw float, ptr %2, i32 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          36:  %wide.load = load <4 x float>, ptr %3, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %wide.load5 = load <4 x float>, ptr %4, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
...

… A510 and A520" (#134382) Reverts #132246

llvm-ci · 2025-04-04T13:38:20Z

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/23895

Here is the relevant piece of the build log for the reference

Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/1/llvm-x86_64-debian-dylib/build/bin/opt < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /b/1/llvm-x86_64-debian-dylib/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
next:30'1                                                                         with "INDEX" equal to "%index"
          33:  %2 = getelementptr inbounds nuw float, ptr %a, i64 %index 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          34:  %3 = getelementptr inbounds nuw float, ptr %2, i32 0 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          35:  %4 = getelementptr inbounds nuw float, ptr %2, i32 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          36:  %wide.load = load <4 x float>, ptr %3, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %wide.load5 = load <4 x float>, ptr %4, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
...

nasherm · 2025-04-04T13:41:35Z

Had to revert due to buildbot failures. Investigating

…ualCost for A510 and A520" (#134382) Reverts llvm/llvm-project#132246

llvm-ci · 2025-04-04T13:48:34Z

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/17694

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/Users/buildbot/buildbot-root/aarch64-darwin/build/bin/opt < /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
�[1m/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: �[0m�[0;1;31merror: �[0m�[1mCHECK-CA510-NEXT: expected string not found in input
�[0m; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
�[0;1;32m                    ^
�[0m�[1m<stdin>:32:67: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0m %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
�[0;1;32m                                                                  ^
�[0m�[1m<stdin>:32:67: �[0m�[0;1;30mnote: �[0m�[1mwith "INDEX" equal to "%index"
�[0m %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
�[0;1;32m                                                                  ^
�[0m�[1m<stdin>:63:8: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
�[0;1;32m       ^
�[0m
Input file: <stdin>
Check file: /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
�[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m; ModuleID = '<stdin>' �[0m
�[0;1;30m            2: �[0m�[1m�[0;1;46msource_filename = "<stdin>" �[0m
�[0;1;30m            3: �[0m�[1m�[0;1;46mtarget datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32" �[0m
�[0;1;30m            4: �[0m�[1m�[0;1;46mtarget triple = "aarch64-unknown-none-elf" �[0m
�[0;1;30m            5: �[0m�[1m�[0;1;46m �[0m
�[0;1;30m            6: �[0m�[1m�[0;1;46m�[0mdefine void @sve_add(ptr %dst, ptr %a, ptr %b, i64 %n) #0 {�[0;1;46m �[0m
�[0;1;32mlabel:6'0      ^~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32mlabel:6'1      ^~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32msame:7'0                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32msame:7'1                                ^~~~                                captured var "DST"
�[0m�[0;1;32msame:7'2                                          ^~                        captured var "A"
�[0m�[0;1;32msame:7'3                                                  ^~                captured var "B"
�[0m�[0;1;32msame:7'4                                                          ^~        captured var "N"
�[0m�[0;1;32msame:7'5                                                               ^    captured var "ATTR0"
�[0m�[0;1;30m            7: �[0m�[1m�[0;1;46m�[0mentry:�[0;1;46m �[0m
�[0;1;32mnext:8'0       ^~~~~~
�[0m�[0;1;32mnext:8'1       ^~~~~~  captured var "ENTRY"
�[0m�[0;1;30m            8: �[0m�[1m�[0;1;46m �[0m%b3 = ptrtoint ptr %b to i64�[0;1;46m �[0m
�[0;1;32mnext:9'0        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
�[0m�[0;1;32mnext:9'1                                      with "B" equal to "%b"
�[0m�[0;1;32mnext:9'2        ^~~                           captured var "B3"
�[0m�[0;1;30m            9: �[0m�[1m�[0;1;46m �[0m%a2 = ptrtoint ptr %a to i64�[0;1;46m �[0m
...

llvm-ci · 2025-04-04T13:54:30Z

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/16254

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/b/ml-opt-devrel-x86-64-b1/build/bin/opt < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
+ /b/ml-opt-devrel-x86-64-b1/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
next:30'1                                                                         with "INDEX" equal to "%index"
          33:  %2 = getelementptr inbounds nuw float, ptr %a, i64 %index 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          34:  %3 = getelementptr inbounds nuw float, ptr %2, i32 0 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          35:  %4 = getelementptr inbounds nuw float, ptr %2, i32 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          36:  %wide.load = load <4 x float>, ptr %3, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %wide.load5 = load <4 x float>, ptr %4, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
...

llvm-ci · 2025-04-04T14:09:06Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-ubuntu running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/5218

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/opt < /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
next:30'1                                                                         with "INDEX" equal to "%index"
          33:  %2 = getelementptr inbounds nuw float, ptr %a, i64 %index 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          34:  %3 = getelementptr inbounds nuw float, ptr %2, i32 0 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          35:  %4 = getelementptr inbounds nuw float, ptr %2, i32 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          36:  %wide.load = load <4 x float>, ptr %3, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %wide.load5 = load <4 x float>, ptr %4, align 4 
next:30'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           .
...

llvm-ci · 2025-04-04T15:09:15Z

LLVM Buildbot has detected a new failure on builder sanitizer-x86_64-linux-bootstrap-asan running on sanitizer-buildbot2 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/52/builds/7327

Here is the relevant piece of the build log for the reference

Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 89952 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70..
FAIL: LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll (70844 of 89952)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
Step 11 (stage2/asan check) failure: stage2/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 89952 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70..
FAIL: LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll (70844 of 89952)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found
Step 13 (stage3/asan check) failure: stage3/asan check (failure)
...
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/ld.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using lld-link: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/lld-link
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using ld64.lld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/ld64.lld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/llvm/config.py:520: note: using wasm-ld: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/wasm-ld
llvm-lit: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/utils/lit/lit/main.py:72: note: The test suite configuration requested an individual test timeout of 0 seconds but a timeout of 900 seconds was requested on the command line. Forcing timeout to be 900 seconds.
-- Testing: 86896 tests, 88 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 
FAIL: LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll (70869 of 86896)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/opt < /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S | /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510 # RUN: at line 2
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/opt -mtriple=aarch64-none-elf -mcpu=cortex-a510 -mattr=+sve -passes=loop-vectorize -S
+ /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm_build2_asan/bin/FileCheck /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll --check-prefix=CHECK-CA510
/home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll:30:21: error: CHECK-CA510-NEXT: expected string not found in input
; CHECK-CA510-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 0
                    ^
<stdin>:32:67: note: scanning from here
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:32:67: note: with "INDEX" equal to "%index"
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
                                                                  ^
<stdin>:63:8: note: possible intended match here
 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ %bc.resume.val, %scalar.ph ]
       ^

Input file: <stdin>
Check file: /home/b/sanitizer-x86_64-linux-bootstrap-asan/build/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           .
           .
           .
          27:  %n.mod.vf = urem i64 %n, 8 
          28:  %n.vec = sub i64 %n, %n.mod.vf 
          29:  br label %vector.body 
          30:  
          31: vector.body: ; preds = %vector.body, %vector.ph 
          32:  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ] 
next:30'0                                                                       X error: no match found

…d A520 (llvm#132246) Inefficient SVE codegen occurs on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add ``` void foo(float a, float b, float dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` Vectorizes the inner loop into the following interleaved sequence of instructions. ``` add x12, x1, x10 ld1b { z0.b }, p0/z, [x1, x10] add x13, x2, x10 ld1b { z1.b }, p0/z, [x2, x10] ldr z2, [x12, llvm#1, mul vl] ldr z3, [x13, llvm#1, mul vl] dech x11 add x12, x0, x10 fadd z0.s, z1.s, z0.s fadd z1.s, z3.s, z2.s st1b { z0.b }, p0, [x0, x10] addvl x10, x10, llvm#2 str z1, [x12, llvm#1, mul vl] ``` By adjusting the target features to prefer fixed over scalable if the cost is equal we get the following vectorized loop. ``` ldp q0, q3, [x11, #-16] subs x13, x13, llvm#8 ldp q1, q2, [x10, #-16] add x10, x10, llvm#32 add x11, x11, llvm#32 fadd v0.4s, v1.4s, v0.4s fadd v1.4s, v2.4s, v3.4s stp q0, q1, [x12, #-16] add x12, x12, llvm#32 ``` Which is more efficient. Change-Id: Iccf8f9ba662f03d77d4890f684a8dc7b1a3e3b3b

… A510/A520 (#134606) Recommit. This work was done by #132246 but failed buildbots due to the test introduced needing updates Inefficient SVE codegen occurs on at least two in-order cores, those being Cortex-A510 and Cortex-A520. For example a simple vector add ``` void foo(float a, float b, float dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` Vectorizes the inner loop into the following interleaved sequence of instructions. ``` add x12, x1, x10 ld1b { z0.b }, p0/z, [x1, x10] add x13, x2, x10 ld1b { z1.b }, p0/z, [x2, x10] ldr z2, [x12, #1, mul vl] ldr z3, [x13, #1, mul vl] dech x11 add x12, x0, x10 fadd z0.s, z1.s, z0.s fadd z1.s, z3.s, z2.s st1b { z0.b }, p0, [x0, x10] addvl x10, x10, #2 str z1, [x12, #1, mul vl] ``` By adjusting the target features to prefer fixed over scalable if the cost is equal we get the following vectorized loop. ``` ldp q0, q3, [x11, #-16] subs x13, x13, #8 ldp q1, q2, [x10, #-16] add x10, x10, #32 add x11, x11, #32 fadd v0.4s, v1.4s, v0.4s fadd v1.4s, v2.4s, v3.4s stp q0, q1, [x12, #-16] add x12, x12, #32 ``` Which is more efficient.

With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (llvm#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal. So when we have: ``` void foo(float* a, float* b, float* dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` When compiling without the feature enabled, we get: ``` ... ld1b { z0.b }, p0/z, [x0, x10] ld1b { z2.b }, p0/z, [x1, x10] add x12, x0, x10 ldr z1, [x12, #1, mul vl] add x12, x1, x10 ldr z3, [x12, #1, mul vl] fadd z0.s, z2.s, z0.s add x12, x2, x10 fadd z1.s, z3.s, z1.s dech x11 st1b { z0.b }, p0, [x2, x10] incb x10, all, mul llvm#2 str z1, [x12, #1, mul vl] ... ``` When compiling with, we get: ``` ... ldp q0, q1, [x12, #-16] ldp q2, q3, [x11, #-16] subs x13, x13, llvm#8 fadd v0.4s, v2.4s, v0.4s fadd v1.4s, v3.4s, v1.4s add x11, x11, llvm#32 add x12, x12, llvm#32 stp q0, q1, [x10, #-16] add x10, x10, llvm#32 ... ``` This patch also moves FeatureUseFixedOverScalableIfEqualCost for A510 and A520 from the CPU features to the tune features.

…152156) With this new A320 in-order core, we follow adding the FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520 (#132246), which reaps the same code generation benefits of preferring fixed over scalable when the cost is equal. So when we have: ``` void foo(float* a, float* b, float* dst, unsigned n) { for (unsigned i = 0; i < n; ++i) dst[i] = a[i] + b[i]; } ``` When compiling without the feature enabled, we get: ``` ... ld1b { z0.b }, p0/z, [x0, x10] ld1b { z2.b }, p0/z, [x1, x10] add x12, x0, x10 ldr z1, [x12, #1, mul vl] add x12, x1, x10 ldr z3, [x12, #1, mul vl] fadd z0.s, z2.s, z0.s add x12, x2, x10 fadd z1.s, z3.s, z1.s dech x11 st1b { z0.b }, p0, [x2, x10] incb x10, all, mul #2 str z1, [x12, #1, mul vl] ... ``` When compiling with, we get: ``` ... ldp q0, q1, [x12, #-16] ldp q2, q3, [x11, #-16] subs x13, x13, #8 fadd v0.4s, v2.4s, v0.4s fadd v1.4s, v3.4s, v1.4s add x11, x11, #32 add x12, x12, #32 stp q0, q1, [x10, #-16] add x10, x10, #32 ... ```

llvmbot added backend:AArch64 llvm:transforms labels Mar 20, 2025

nasherm requested a review from sdesmalen-arm March 20, 2025 16:01

nasherm requested review from davemgreen and john-brawn-arm March 20, 2025 16:01

nasherm force-pushed the nashe/a520-sve-codegen branch from c16f09c to 9df06bb Compare April 2, 2025 13:06

nasherm changed the title ~~[AArch64][SVE] Reduce MaxInterleaveFactor for A510 and A520~~ [AArch64][SVE] Used fixed width vectorization for A510 and A520 Apr 2, 2025

davemgreen approved these changes Apr 3, 2025

View reviewed changes

nasherm changed the title ~~[AArch64][SVE] Used fixed width vectorization for A510 and A520~~ [AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A510 and A520 Apr 3, 2025

nasherm merged commit d2bcc11 into llvm:main Apr 4, 2025
11 checks passed

nasherm mentioned this pull request Apr 4, 2025

Revert "[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A510 and A520" #134382

Merged

nasherm added a commit that referenced this pull request Apr 4, 2025

Revert "[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for…

846000c

… A510 and A520" (#134382) Reverts #132246

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 4, 2025

Automerge: Revert "[AArch64][SVE] Use FeatureUseFixedOverScalableIfEq…

7db80dd

…ualCost for A510 and A520" (#134382) Reverts llvm/llvm-project#132246

nasherm mentioned this pull request Apr 7, 2025

Recommit [AArch64][SVE]Use FeatureUseFixedOverScalableIfEqualCost for A510/A520 #134606

Merged

stuij mentioned this pull request Aug 5, 2025

[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A320 #152156

Merged

[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A510 and A520 #132246

[AArch64][SVE] Use FeatureUseFixedOverScalableIfEqualCost for A510 and A520 #132246

Uh oh!

Conversation

nasherm commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davemgreen commented Mar 21, 2025

Uh oh!

davemgreen commented Mar 21, 2025

Uh oh!

davemgreen commented Mar 25, 2025

Uh oh!

davemgreen left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

nasherm commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

llvm-ci commented Apr 4, 2025

Uh oh!

Uh oh!

nasherm commented Mar 20, 2025 •

edited

Loading

llvmbot commented Mar 20, 2025 •

edited

Loading