[X86] Use vectorized i256 bit counts when we know the source originated from the vector unit #171589
Conversation
Currently we only permit i256 CTTZ/CTLZ AVX512 lowering when the source is loadable, as the GPR->FPU transition costs would otherwise outweigh the vectorization benefit. This patch checks for other cases where the source can avoid the GPR: a mayFoldToVector helper checks for a bitcast originating from a vector type, as well as constant values, in addition to the original mayFoldLoad check. There will be other uses for the mayFoldToVector helper, but I've only used it for CTTZ/CTLZ initially.
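For reference, the pattern this enables is an i256 bit count whose operand was bitcast from a vector value. Below is a minimal IR sketch modelled on the existing vector_ctlz_i256 test in bitcnt-big-integer.ll; the function name is illustrative and not part of the patch:

; i256 ctlz whose source never needs to leave the vector unit:
; the operand is a bitcast of an <8 x i32> vector argument.
define i32 @ctlz_from_vector(<8 x i32> %v0) nounwind {
  %a0 = bitcast <8 x i32> %v0 to i256
  %cnt = call i256 @llvm.ctlz.i256(i256 %a0, i1 false)
  %res = trunc i256 %cnt to i32
  ret i32 %res
}
declare i256 @llvm.ctlz.i256(i256, i1)

With the patch applied, functions of this shape lower to the masked vplzcntq/vpcompressq sequences shown in the updated CHECK lines below rather than scalarizing through GPR lzcnt/cmov chains.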
@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes:

Currently we only permit i256 CTTZ/CTLZ AVX512 lowering when the source is loadable, as the GPR->FPU transition costs would otherwise outweigh the vectorization benefit. This patch checks for other cases where the source can avoid the GPR: a mayFoldToVector helper checks for a bitcast originating from a vector type, as well as constant values, in addition to the original mayFoldLoad check. There will be other uses for the mayFoldToVector helper, but I've only used it for CTTZ/CTLZ initially.

Full diff: https://github.com/llvm/llvm-project/pull/171589.diff

2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index fbd875a93fd4a..b4ad7465d612e 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -2846,6 +2846,15 @@ bool X86::mayFoldIntoZeroExtend(SDValue Op) {
return false;
}
+// Return true if it's cheap to bitcast this to a vector type.
+static bool mayFoldToVector(SDValue Op, const X86Subtarget &Subtarget) {
+ if (peekThroughBitcasts(Op).getValueType().isVector())
+ return true;
+ if (isa<ConstantSDNode>(Op) || isa<ConstantFPSDNode>(Op))
+ return true;
+ return X86::mayFoldLoad(Op, Subtarget);
+}
+
static bool isLogicOp(unsigned Opcode) {
// TODO: Add support for X86ISD::FAND/FOR/FXOR/FANDN with test coverage.
return ISD::isBitwiseLogicOp(Opcode) || X86ISD::ANDNP == Opcode;
@@ -33958,7 +33967,7 @@ void X86TargetLowering::ReplaceNodeResults(SDNode *N,
EVT VT = N->getValueType(0);
assert(Subtarget.hasCDI() && "AVX512CD required");
assert((VT == MVT::i256 || VT == MVT::i512) && "Unexpected VT!");
- if (VT == MVT::i256 && !X86::mayFoldLoad(N0, Subtarget))
+ if (VT == MVT::i256 && !mayFoldToVector(N0, Subtarget))
return;
unsigned SizeInBits = VT.getSizeInBits();
diff --git a/llvm/test/CodeGen/X86/bitcnt-big-integer.ll b/llvm/test/CodeGen/X86/bitcnt-big-integer.ll
index 749b3ddc96d0d..06ccbf4daa1e8 100644
--- a/llvm/test/CodeGen/X86/bitcnt-big-integer.ll
+++ b/llvm/test/CodeGen/X86/bitcnt-big-integer.ll
@@ -1567,72 +1567,38 @@ define i32 @vector_ctlz_i256(<8 x i32> %v0) nounwind {
;
; AVX512F-LABEL: vector_ctlz_i256:
; AVX512F: # %bb.0:
-; AVX512F-NEXT: vmovq %xmm0, %rax
-; AVX512F-NEXT: vpextrq $1, %xmm0, %rcx
-; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm0
-; AVX512F-NEXT: vmovq %xmm0, %rdx
-; AVX512F-NEXT: vpextrq $1, %xmm0, %rsi
-; AVX512F-NEXT: lzcntq %rsi, %rdi
-; AVX512F-NEXT: lzcntq %rdx, %r8
-; AVX512F-NEXT: addl $64, %r8d
-; AVX512F-NEXT: testq %rsi, %rsi
-; AVX512F-NEXT: cmovnel %edi, %r8d
-; AVX512F-NEXT: lzcntq %rcx, %rdi
-; AVX512F-NEXT: lzcntq %rax, %rax
-; AVX512F-NEXT: addl $64, %eax
-; AVX512F-NEXT: testq %rcx, %rcx
-; AVX512F-NEXT: cmovnel %edi, %eax
-; AVX512F-NEXT: subl $-128, %eax
-; AVX512F-NEXT: orq %rsi, %rdx
-; AVX512F-NEXT: cmovnel %r8d, %eax
-; AVX512F-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512F-NEXT: vpbroadcastq {{.*#+}} ymm1 = [256,256,256,256]
+; AVX512F-NEXT: vpermq {{.*#+}} ymm0 = ymm0[3,2,1,0]
+; AVX512F-NEXT: vplzcntq %zmm0, %zmm2
+; AVX512F-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2
+; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k0
+; AVX512F-NEXT: kshiftlw $12, %k0, %k0
+; AVX512F-NEXT: kshiftrw $12, %k0, %k1
+; AVX512F-NEXT: vpcompressq %zmm2, %zmm1 {%k1}
+; AVX512F-NEXT: vmovd %xmm1, %eax
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: vector_ctlz_i256:
; AVX512VL: # %bb.0:
-; AVX512VL-NEXT: vpextrq $1, %xmm0, %rcx
-; AVX512VL-NEXT: vmovq %xmm0, %rax
-; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm0
-; AVX512VL-NEXT: vmovq %xmm0, %rdx
-; AVX512VL-NEXT: vpextrq $1, %xmm0, %rsi
-; AVX512VL-NEXT: lzcntq %rsi, %rdi
-; AVX512VL-NEXT: lzcntq %rdx, %r8
-; AVX512VL-NEXT: addl $64, %r8d
-; AVX512VL-NEXT: testq %rsi, %rsi
-; AVX512VL-NEXT: cmovnel %edi, %r8d
-; AVX512VL-NEXT: lzcntq %rcx, %rdi
-; AVX512VL-NEXT: lzcntq %rax, %rax
-; AVX512VL-NEXT: addl $64, %eax
-; AVX512VL-NEXT: testq %rcx, %rcx
-; AVX512VL-NEXT: cmovnel %edi, %eax
-; AVX512VL-NEXT: subl $-128, %eax
-; AVX512VL-NEXT: orq %rsi, %rdx
-; AVX512VL-NEXT: cmovnel %r8d, %eax
-; AVX512VL-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512VL-NEXT: vpermq {{.*#+}} ymm0 = ymm0[3,2,1,0]
+; AVX512VL-NEXT: vplzcntq %ymm0, %ymm1
+; AVX512VL-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
+; AVX512VL-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512VL-NEXT: vpbroadcastq {{.*#+}} ymm0 = [256,256,256,256]
+; AVX512VL-NEXT: vpcompressq %ymm1, %ymm0 {%k1}
+; AVX512VL-NEXT: vmovd %xmm0, %eax
; AVX512VL-NEXT: vzeroupper
; AVX512VL-NEXT: retq
;
; AVX512POPCNT-LABEL: vector_ctlz_i256:
; AVX512POPCNT: # %bb.0:
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm0, %rcx
-; AVX512POPCNT-NEXT: vmovq %xmm0, %rax
-; AVX512POPCNT-NEXT: vextracti128 $1, %ymm0, %xmm0
-; AVX512POPCNT-NEXT: vmovq %xmm0, %rdx
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm0, %rsi
-; AVX512POPCNT-NEXT: lzcntq %rsi, %rdi
-; AVX512POPCNT-NEXT: lzcntq %rdx, %r8
-; AVX512POPCNT-NEXT: addl $64, %r8d
-; AVX512POPCNT-NEXT: testq %rsi, %rsi
-; AVX512POPCNT-NEXT: cmovnel %edi, %r8d
-; AVX512POPCNT-NEXT: lzcntq %rcx, %rdi
-; AVX512POPCNT-NEXT: lzcntq %rax, %rax
-; AVX512POPCNT-NEXT: addl $64, %eax
-; AVX512POPCNT-NEXT: testq %rcx, %rcx
-; AVX512POPCNT-NEXT: cmovnel %edi, %eax
-; AVX512POPCNT-NEXT: subl $-128, %eax
-; AVX512POPCNT-NEXT: orq %rsi, %rdx
-; AVX512POPCNT-NEXT: cmovnel %r8d, %eax
-; AVX512POPCNT-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512POPCNT-NEXT: vpermq {{.*#+}} ymm0 = ymm0[3,2,1,0]
+; AVX512POPCNT-NEXT: vplzcntq %ymm0, %ymm1
+; AVX512POPCNT-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
+; AVX512POPCNT-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512POPCNT-NEXT: vpbroadcastq {{.*#+}} ymm0 = [256,256,256,256]
+; AVX512POPCNT-NEXT: vpcompressq %ymm1, %ymm0 {%k1}
+; AVX512POPCNT-NEXT: vmovd %xmm0, %eax
; AVX512POPCNT-NEXT: vzeroupper
; AVX512POPCNT-NEXT: retq
%a0 = bitcast <8 x i32> %v0 to i256
@@ -3246,72 +3212,35 @@ define i32 @vector_ctlz_undef_i256(<8 x i32> %v0) nounwind {
;
; AVX512F-LABEL: vector_ctlz_undef_i256:
; AVX512F: # %bb.0:
-; AVX512F-NEXT: vmovq %xmm0, %rax
-; AVX512F-NEXT: vpextrq $1, %xmm0, %rcx
-; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm0
-; AVX512F-NEXT: vmovq %xmm0, %rdx
-; AVX512F-NEXT: vpextrq $1, %xmm0, %rsi
-; AVX512F-NEXT: lzcntq %rsi, %rdi
-; AVX512F-NEXT: lzcntq %rdx, %r8
-; AVX512F-NEXT: addl $64, %r8d
-; AVX512F-NEXT: testq %rsi, %rsi
-; AVX512F-NEXT: cmovnel %edi, %r8d
-; AVX512F-NEXT: lzcntq %rcx, %rdi
-; AVX512F-NEXT: lzcntq %rax, %rax
-; AVX512F-NEXT: addl $64, %eax
-; AVX512F-NEXT: testq %rcx, %rcx
-; AVX512F-NEXT: cmovnel %edi, %eax
-; AVX512F-NEXT: subl $-128, %eax
-; AVX512F-NEXT: orq %rsi, %rdx
-; AVX512F-NEXT: cmovnel %r8d, %eax
-; AVX512F-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512F-NEXT: vpermq {{.*#+}} ymm0 = ymm0[3,2,1,0]
+; AVX512F-NEXT: vplzcntq %zmm0, %zmm1
+; AVX512F-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
+; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k0
+; AVX512F-NEXT: kshiftlw $12, %k0, %k0
+; AVX512F-NEXT: kshiftrw $12, %k0, %k1
+; AVX512F-NEXT: vpcompressq %zmm1, %zmm0 {%k1} {z}
+; AVX512F-NEXT: vmovd %xmm0, %eax
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: vector_ctlz_undef_i256:
; AVX512VL: # %bb.0:
-; AVX512VL-NEXT: vpextrq $1, %xmm0, %rcx
-; AVX512VL-NEXT: vmovq %xmm0, %rax
-; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm0
-; AVX512VL-NEXT: vmovq %xmm0, %rdx
-; AVX512VL-NEXT: vpextrq $1, %xmm0, %rsi
-; AVX512VL-NEXT: lzcntq %rsi, %rdi
-; AVX512VL-NEXT: lzcntq %rdx, %r8
-; AVX512VL-NEXT: addl $64, %r8d
-; AVX512VL-NEXT: testq %rsi, %rsi
-; AVX512VL-NEXT: cmovnel %edi, %r8d
-; AVX512VL-NEXT: lzcntq %rcx, %rdi
-; AVX512VL-NEXT: lzcntq %rax, %rax
-; AVX512VL-NEXT: addl $64, %eax
-; AVX512VL-NEXT: testq %rcx, %rcx
-; AVX512VL-NEXT: cmovnel %edi, %eax
-; AVX512VL-NEXT: subl $-128, %eax
-; AVX512VL-NEXT: orq %rsi, %rdx
-; AVX512VL-NEXT: cmovnel %r8d, %eax
-; AVX512VL-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512VL-NEXT: vpermq {{.*#+}} ymm0 = ymm0[3,2,1,0]
+; AVX512VL-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512VL-NEXT: vplzcntq %ymm0, %ymm0
+; AVX512VL-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX512VL-NEXT: vpcompressq %ymm0, %ymm0 {%k1} {z}
+; AVX512VL-NEXT: vmovd %xmm0, %eax
; AVX512VL-NEXT: vzeroupper
; AVX512VL-NEXT: retq
;
; AVX512POPCNT-LABEL: vector_ctlz_undef_i256:
; AVX512POPCNT: # %bb.0:
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm0, %rcx
-; AVX512POPCNT-NEXT: vmovq %xmm0, %rax
-; AVX512POPCNT-NEXT: vextracti128 $1, %ymm0, %xmm0
-; AVX512POPCNT-NEXT: vmovq %xmm0, %rdx
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm0, %rsi
-; AVX512POPCNT-NEXT: lzcntq %rsi, %rdi
-; AVX512POPCNT-NEXT: lzcntq %rdx, %r8
-; AVX512POPCNT-NEXT: addl $64, %r8d
-; AVX512POPCNT-NEXT: testq %rsi, %rsi
-; AVX512POPCNT-NEXT: cmovnel %edi, %r8d
-; AVX512POPCNT-NEXT: lzcntq %rcx, %rdi
-; AVX512POPCNT-NEXT: lzcntq %rax, %rax
-; AVX512POPCNT-NEXT: addl $64, %eax
-; AVX512POPCNT-NEXT: testq %rcx, %rcx
-; AVX512POPCNT-NEXT: cmovnel %edi, %eax
-; AVX512POPCNT-NEXT: subl $-128, %eax
-; AVX512POPCNT-NEXT: orq %rsi, %rdx
-; AVX512POPCNT-NEXT: cmovnel %r8d, %eax
-; AVX512POPCNT-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512POPCNT-NEXT: vpermq {{.*#+}} ymm0 = ymm0[3,2,1,0]
+; AVX512POPCNT-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512POPCNT-NEXT: vplzcntq %ymm0, %ymm0
+; AVX512POPCNT-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
+; AVX512POPCNT-NEXT: vpcompressq %ymm0, %ymm0 {%k1} {z}
+; AVX512POPCNT-NEXT: vmovd %xmm0, %eax
; AVX512POPCNT-NEXT: vzeroupper
; AVX512POPCNT-NEXT: retq
%a0 = bitcast <8 x i32> %v0 to i256
@@ -4887,72 +4816,47 @@ define i32 @vector_cttz_i256(<8 x i32> %v0) nounwind {
;
; AVX512F-LABEL: vector_cttz_i256:
; AVX512F: # %bb.0:
-; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512F-NEXT: vpextrq $1, %xmm1, %rax
-; AVX512F-NEXT: vmovq %xmm1, %rcx
-; AVX512F-NEXT: vpextrq $1, %xmm0, %rdx
-; AVX512F-NEXT: vmovq %xmm0, %rsi
-; AVX512F-NEXT: tzcntq %rsi, %rdi
-; AVX512F-NEXT: tzcntq %rdx, %r8
-; AVX512F-NEXT: addl $64, %r8d
-; AVX512F-NEXT: testq %rsi, %rsi
-; AVX512F-NEXT: cmovnel %edi, %r8d
-; AVX512F-NEXT: tzcntq %rcx, %rdi
-; AVX512F-NEXT: tzcntq %rax, %rax
-; AVX512F-NEXT: addl $64, %eax
-; AVX512F-NEXT: testq %rcx, %rcx
-; AVX512F-NEXT: cmovnel %edi, %eax
-; AVX512F-NEXT: subl $-128, %eax
-; AVX512F-NEXT: orq %rdx, %rsi
-; AVX512F-NEXT: cmovnel %r8d, %eax
-; AVX512F-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
+; AVX512F-NEXT: vpbroadcastq {{.*#+}} ymm1 = [256,256,256,256]
+; AVX512F-NEXT: vpcmpeqd %ymm2, %ymm2, %ymm2
+; AVX512F-NEXT: vpaddq %ymm2, %ymm0, %ymm2
+; AVX512F-NEXT: vpandn %ymm2, %ymm0, %ymm2
+; AVX512F-NEXT: vplzcntq %zmm2, %zmm2
+; AVX512F-NEXT: vmovdqa {{.*#+}} ymm3 = [64,128,192,256]
+; AVX512F-NEXT: vpsubq %ymm2, %ymm3, %ymm2
+; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k0
+; AVX512F-NEXT: kshiftlw $12, %k0, %k0
+; AVX512F-NEXT: kshiftrw $12, %k0, %k1
+; AVX512F-NEXT: vpcompressq %zmm2, %zmm1 {%k1}
+; AVX512F-NEXT: vmovd %xmm1, %eax
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: vector_cttz_i256:
; AVX512VL: # %bb.0:
-; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512VL-NEXT: vpextrq $1, %xmm1, %rax
-; AVX512VL-NEXT: vmovq %xmm1, %rcx
-; AVX512VL-NEXT: vpextrq $1, %xmm0, %rdx
-; AVX512VL-NEXT: vmovq %xmm0, %rsi
-; AVX512VL-NEXT: tzcntq %rsi, %rdi
-; AVX512VL-NEXT: tzcntq %rdx, %r8
-; AVX512VL-NEXT: addl $64, %r8d
-; AVX512VL-NEXT: testq %rsi, %rsi
-; AVX512VL-NEXT: cmovnel %edi, %r8d
-; AVX512VL-NEXT: tzcntq %rcx, %rdi
-; AVX512VL-NEXT: tzcntq %rax, %rax
-; AVX512VL-NEXT: addl $64, %eax
-; AVX512VL-NEXT: testq %rcx, %rcx
-; AVX512VL-NEXT: cmovnel %edi, %eax
-; AVX512VL-NEXT: subl $-128, %eax
-; AVX512VL-NEXT: orq %rdx, %rsi
-; AVX512VL-NEXT: cmovnel %r8d, %eax
-; AVX512VL-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512VL-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
+; AVX512VL-NEXT: vpaddq %ymm1, %ymm0, %ymm1
+; AVX512VL-NEXT: vpandn %ymm1, %ymm0, %ymm1
+; AVX512VL-NEXT: vplzcntq %ymm1, %ymm1
+; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm2 = [64,128,192,256]
+; AVX512VL-NEXT: vpsubq %ymm1, %ymm2, %ymm1
+; AVX512VL-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512VL-NEXT: vpbroadcastq {{.*#+}} ymm0 = [256,256,256,256]
+; AVX512VL-NEXT: vpcompressq %ymm1, %ymm0 {%k1}
+; AVX512VL-NEXT: vmovd %xmm0, %eax
; AVX512VL-NEXT: vzeroupper
; AVX512VL-NEXT: retq
;
; AVX512POPCNT-LABEL: vector_cttz_i256:
; AVX512POPCNT: # %bb.0:
-; AVX512POPCNT-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm1, %rax
-; AVX512POPCNT-NEXT: vmovq %xmm1, %rcx
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm0, %rdx
-; AVX512POPCNT-NEXT: vmovq %xmm0, %rsi
-; AVX512POPCNT-NEXT: tzcntq %rsi, %rdi
-; AVX512POPCNT-NEXT: tzcntq %rdx, %r8
-; AVX512POPCNT-NEXT: addl $64, %r8d
-; AVX512POPCNT-NEXT: testq %rsi, %rsi
-; AVX512POPCNT-NEXT: cmovnel %edi, %r8d
-; AVX512POPCNT-NEXT: tzcntq %rcx, %rdi
-; AVX512POPCNT-NEXT: tzcntq %rax, %rax
-; AVX512POPCNT-NEXT: addl $64, %eax
-; AVX512POPCNT-NEXT: testq %rcx, %rcx
-; AVX512POPCNT-NEXT: cmovnel %edi, %eax
-; AVX512POPCNT-NEXT: subl $-128, %eax
-; AVX512POPCNT-NEXT: orq %rdx, %rsi
-; AVX512POPCNT-NEXT: cmovnel %r8d, %eax
-; AVX512POPCNT-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512POPCNT-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
+; AVX512POPCNT-NEXT: vpaddq %ymm1, %ymm0, %ymm1
+; AVX512POPCNT-NEXT: vpandn %ymm1, %ymm0, %ymm1
+; AVX512POPCNT-NEXT: vpopcntq %ymm1, %ymm1
+; AVX512POPCNT-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
+; AVX512POPCNT-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512POPCNT-NEXT: vpbroadcastq {{.*#+}} ymm0 = [256,256,256,256]
+; AVX512POPCNT-NEXT: vpcompressq %ymm1, %ymm0 {%k1}
+; AVX512POPCNT-NEXT: vmovd %xmm0, %eax
; AVX512POPCNT-NEXT: vzeroupper
; AVX512POPCNT-NEXT: retq
%a0 = bitcast <8 x i32> %v0 to i256
@@ -6484,72 +6388,44 @@ define i32 @vector_cttz_undef_i256(<8 x i32> %v0) nounwind {
;
; AVX512F-LABEL: vector_cttz_undef_i256:
; AVX512F: # %bb.0:
-; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512F-NEXT: vpextrq $1, %xmm1, %rax
-; AVX512F-NEXT: vmovq %xmm1, %rcx
-; AVX512F-NEXT: vpextrq $1, %xmm0, %rdx
-; AVX512F-NEXT: vmovq %xmm0, %rsi
-; AVX512F-NEXT: tzcntq %rsi, %rdi
-; AVX512F-NEXT: tzcntq %rdx, %r8
-; AVX512F-NEXT: addl $64, %r8d
-; AVX512F-NEXT: testq %rsi, %rsi
-; AVX512F-NEXT: cmovnel %edi, %r8d
-; AVX512F-NEXT: tzcntq %rcx, %rdi
-; AVX512F-NEXT: tzcntq %rax, %rax
-; AVX512F-NEXT: addl $64, %eax
-; AVX512F-NEXT: testq %rcx, %rcx
-; AVX512F-NEXT: cmovnel %edi, %eax
-; AVX512F-NEXT: subl $-128, %eax
-; AVX512F-NEXT: orq %rdx, %rsi
-; AVX512F-NEXT: cmovnel %r8d, %eax
-; AVX512F-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
+; AVX512F-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
+; AVX512F-NEXT: vpaddq %ymm1, %ymm0, %ymm1
+; AVX512F-NEXT: vpandn %ymm1, %ymm0, %ymm1
+; AVX512F-NEXT: vplzcntq %zmm1, %zmm1
+; AVX512F-NEXT: vmovdqa {{.*#+}} ymm2 = [64,128,192,256]
+; AVX512F-NEXT: vpsubq %ymm1, %ymm2, %ymm1
+; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k0
+; AVX512F-NEXT: kshiftlw $12, %k0, %k0
+; AVX512F-NEXT: kshiftrw $12, %k0, %k1
+; AVX512F-NEXT: vpcompressq %zmm1, %zmm0 {%k1} {z}
+; AVX512F-NEXT: vmovd %xmm0, %eax
; AVX512F-NEXT: retq
;
; AVX512VL-LABEL: vector_cttz_undef_i256:
; AVX512VL: # %bb.0:
-; AVX512VL-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512VL-NEXT: vpextrq $1, %xmm1, %rax
-; AVX512VL-NEXT: vmovq %xmm1, %rcx
-; AVX512VL-NEXT: vpextrq $1, %xmm0, %rdx
-; AVX512VL-NEXT: vmovq %xmm0, %rsi
-; AVX512VL-NEXT: tzcntq %rsi, %rdi
-; AVX512VL-NEXT: tzcntq %rdx, %r8
-; AVX512VL-NEXT: addl $64, %r8d
-; AVX512VL-NEXT: testq %rsi, %rsi
-; AVX512VL-NEXT: cmovnel %edi, %r8d
-; AVX512VL-NEXT: tzcntq %rcx, %rdi
-; AVX512VL-NEXT: tzcntq %rax, %rax
-; AVX512VL-NEXT: addl $64, %eax
-; AVX512VL-NEXT: testq %rcx, %rcx
-; AVX512VL-NEXT: cmovnel %edi, %eax
-; AVX512VL-NEXT: subl $-128, %eax
-; AVX512VL-NEXT: orq %rdx, %rsi
-; AVX512VL-NEXT: cmovnel %r8d, %eax
-; AVX512VL-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512VL-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
+; AVX512VL-NEXT: vpaddq %ymm1, %ymm0, %ymm1
+; AVX512VL-NEXT: vpandn %ymm1, %ymm0, %ymm1
+; AVX512VL-NEXT: vmovdqa {{.*#+}} ymm2 = [64,128,192,256]
+; AVX512VL-NEXT: vplzcntq %ymm1, %ymm1
+; AVX512VL-NEXT: vpsubq %ymm1, %ymm2, %ymm1
+; AVX512VL-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512VL-NEXT: vpcompressq %ymm1, %ymm0 {%k1} {z}
+; AVX512VL-NEXT: vmovd %xmm0, %eax
; AVX512VL-NEXT: vzeroupper
; AVX512VL-NEXT: retq
;
; AVX512POPCNT-LABEL: vector_cttz_undef_i256:
; AVX512POPCNT: # %bb.0:
-; AVX512POPCNT-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm1, %rax
-; AVX512POPCNT-NEXT: vmovq %xmm1, %rcx
-; AVX512POPCNT-NEXT: vpextrq $1, %xmm0, %rdx
-; AVX512POPCNT-NEXT: vmovq %xmm0, %rsi
-; AVX512POPCNT-NEXT: tzcntq %rsi, %rdi
-; AVX512POPCNT-NEXT: tzcntq %rdx, %r8
-; AVX512POPCNT-NEXT: addl $64, %r8d
-; AVX512POPCNT-NEXT: testq %rsi, %rsi
-; AVX512POPCNT-NEXT: cmovnel %edi, %r8d
-; AVX512POPCNT-NEXT: tzcntq %rcx, %rdi
-; AVX512POPCNT-NEXT: tzcntq %rax, %rax
-; AVX512POPCNT-NEXT: addl $64, %eax
-; AVX512POPCNT-NEXT: testq %rcx, %rcx
-; AVX512POPCNT-NEXT: cmovnel %edi, %eax
-; AVX512POPCNT-NEXT: subl $-128, %eax
-; AVX512POPCNT-NEXT: orq %rdx, %rsi
-; AVX512POPCNT-NEXT: cmovnel %r8d, %eax
-; AVX512POPCNT-NEXT: # kill: def $eax killed $eax killed $rax
+; AVX512POPCNT-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
+; AVX512POPCNT-NEXT: vpaddq %ymm1, %ymm0, %ymm1
+; AVX512POPCNT-NEXT: vpandn %ymm1, %ymm0, %ymm1
+; AVX512POPCNT-NEXT: vpopcntq %ymm1, %ymm1
+; AVX512POPCNT-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm1
+; AVX512POPCNT-NEXT: vptestmq %ymm0, %ymm0, %k1
+; AVX512POPCNT-NEXT: vpcompressq %ymm1, %ymm0 {%k1} {z}
+; AVX512POPCNT-NEXT: vmovd %xmm0, %eax
; AVX512POPCNT-NEXT: vzeroupper
; AVX512POPCNT-NEXT: retq
%a0 = bitcast <8 x i32> %v0 to i256
Inline comment on llvm/lib/Target/X86/X86ISelLowering.cpp:

// Return true if it's cheap to bitcast this to a vector type.
static bool mayFoldToVector(SDValue Op, const X86Subtarget &Subtarget) {
mayFoldIntoVector to match above?
phoebewang left a comment:
LGTM.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/169/builds/17882