
Conversation

@RKSimon
Collaborator

@RKSimon RKSimon commented Sep 8, 2025

Allows us to extend a result back to the largest type after we've handled mask logic using vXi1 result types from different source vector widths (e.g. v8i32 and v8i8)
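
For reference, here is a minimal intrinsics sketch (assumed for illustration; not taken verbatim from the linked issue) of the kind of source this targets: mask logic mixing a v8i32 compare with a sign-extended v8i8 compare. Per the test diff below, the AVX2 output previously round-tripped the OR through v8i16 (vpmovsxbw + vpackssdw + vpmovsxwd); it can now extend the byte mask straight to v8i32 (vpmovsxbd) and OR at ymm width.

#include <immintrin.h>

// Hypothetical reproducer: AND a v8i32 value with the OR of two masks
// built from different source vector widths (v8i32 and v8i8).
__m256i combine_masks(__m256i x, __m256i a, __m128i b) {
  __m256i m32 = _mm256_cmpeq_epi32(a, _mm256_setzero_si256()); // v8i32 mask
  __m128i m8  = _mm_cmpeq_epi8(b, _mm_setzero_si128());        // v16i8 mask
  __m256i m8d = _mm256_cvtepi8_epi32(m8); // sign-extend low 8 bytes to v8i32
  return _mm256_and_si256(x, _mm256_or_si256(m32, m8d));
}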

Fixes #157382

…safely promoted on AVX2+ targets

Allows us to extend the result back to the largest type after we've handled mask logic using vXi1 result types from different source vector widths (e.g. v8i32 and v8i8)

Fixes llvm#157411
@llvmbot
Member

llvmbot commented Sep 8, 2025

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

Allows us to extend a result back to the largest type after we've handled mask logic using vXi1 result types from different source vector widths (e.g. v8i32 and v8i8)

Fixes #157411


Full diff: https://github.com/llvm/llvm-project/pull/157425.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+15-7)
  • (modified) llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll (+2-5)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index cdc97faf394ca..af624e381be5a 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -50936,10 +50936,12 @@ static SDValue combineAndShuffleNot(SDNode *N, SelectionDAG &DAG,
 // Given a target type \p VT, we generate
 //   or (and x, y), (xor z, zext(build_vector (constants)))
 // given x, y and z are of type \p VT. We can do so, if operands are either
-// truncates from VT types, the second operand is a vector of constants or can
-// be recursively promoted.
+// truncates from VT types, the second operand is a vector of constants, can
+// be recursively promoted or is an existing extension we can extend further.
 static SDValue PromoteMaskArithmetic(SDValue N, const SDLoc &DL, EVT VT,
-                                     SelectionDAG &DAG, unsigned Depth) {
+                                     SelectionDAG &DAG,
+                                     const X86Subtarget &Subtarget,
+                                     unsigned Depth) {
   // Limit recursion to avoid excessive compile times.
   if (Depth >= SelectionDAG::MaxRecursionDepth)
     return SDValue();
@@ -50954,7 +50956,8 @@ static SDValue PromoteMaskArithmetic(SDValue N, const SDLoc &DL, EVT VT,
   if (!TLI.isOperationLegalOrPromote(N.getOpcode(), VT))
     return SDValue();
 
-  if (SDValue NN0 = PromoteMaskArithmetic(N0, DL, VT, DAG, Depth + 1))
+  if (SDValue NN0 =
+          PromoteMaskArithmetic(N0, DL, VT, DAG, Subtarget, Depth + 1))
     N0 = NN0;
   else {
     // The left side has to be a trunc.
@@ -50968,14 +50971,19 @@ static SDValue PromoteMaskArithmetic(SDValue N, const SDLoc &DL, EVT VT,
     N0 = N0.getOperand(0);
   }
 
-  if (SDValue NN1 = PromoteMaskArithmetic(N1, DL, VT, DAG, Depth + 1))
+  if (SDValue NN1 =
+          PromoteMaskArithmetic(N1, DL, VT, DAG, Subtarget, Depth + 1))
     N1 = NN1;
   else {
-    // The right side has to be a 'trunc' or a (foldable) constant.
+    // The right side has to be a 'trunc', a (foldable) constant or an
+    // existing extension we can extend further.
     bool RHSTrunc = N1.getOpcode() == ISD::TRUNCATE &&
                     N1.getOperand(0).getValueType() == VT;
     if (RHSTrunc)
       N1 = N1.getOperand(0);
+    else if (ISD::isExtVecInRegOpcode(N1.getOpcode()) && VT.is256BitVector() &&
+             Subtarget.hasInt256() && N1.hasOneUse())
+      N1 = DAG.getNode(N1.getOpcode(), DL, VT, N1.getOperand(0));
     else if (SDValue Cst =
                  DAG.FoldConstantArithmetic(ISD::ZERO_EXTEND, DL, VT, {N1}))
       N1 = Cst;
@@ -51005,7 +51013,7 @@ static SDValue PromoteMaskArithmetic(SDValue N, const SDLoc &DL,
   EVT NarrowVT = Narrow.getValueType();
 
   // Generate the wide operation.
-  SDValue Op = PromoteMaskArithmetic(Narrow, DL, VT, DAG, 0);
+  SDValue Op = PromoteMaskArithmetic(Narrow, DL, VT, DAG, Subtarget, 0);
   if (!Op)
     return SDValue();
   switch (N.getOpcode()) {
diff --git a/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll b/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
index 9e1686e19ce1b..474be4465d9b7 100644
--- a/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
+++ b/llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
@@ -733,11 +733,8 @@ define <8 x i32> @PR157382(ptr %p0, ptr %p1, ptr %p2) {
 ; AVX2-NEXT:    vpcmpeqb %xmm3, %xmm2, %xmm2
 ; AVX2-NEXT:    vpcmpeqd %xmm3, %xmm3, %xmm3
 ; AVX2-NEXT:    vpxor %xmm3, %xmm2, %xmm2
-; AVX2-NEXT:    vpmovsxbw %xmm2, %xmm2
-; AVX2-NEXT:    vextracti128 $1, %ymm1, %xmm3
-; AVX2-NEXT:    vpackssdw %xmm3, %xmm1, %xmm1
-; AVX2-NEXT:    vpor %xmm2, %xmm1, %xmm1
-; AVX2-NEXT:    vpmovsxwd %xmm1, %ymm1
+; AVX2-NEXT:    vpmovsxbd %xmm2, %ymm2
+; AVX2-NEXT:    vpor %ymm2, %ymm1, %ymm1
 ; AVX2-NEXT:    vpand %ymm0, %ymm1, %ymm0
 ; AVX2-NEXT:    retq
 ;
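
The new condition can be restated as a standalone predicate; a hedged sketch follows, where the helper name canWidenExtInReg is hypothetical and each check mirrors the else-if added in the diff above:

#include "X86Subtarget.h"                   // X86Subtarget::hasInt256
#include "llvm/CodeGen/ISDOpcodes.h"        // ISD::isExtVecInRegOpcode
#include "llvm/CodeGen/SelectionDAGNodes.h" // SDValue, EVT

// Sketch only: true when an in-register extension
// (SIGN/ZERO_EXTEND_VECTOR_INREG) can be rebuilt directly at the wide
// 256-bit result type instead of blocking the mask promotion.
static bool canWidenExtInReg(llvm::SDValue N1, llvm::EVT VT,
                             const llvm::X86Subtarget &Subtarget) {
  return llvm::ISD::isExtVecInRegOpcode(N1.getOpcode()) && // existing in-reg extend
         VT.is256BitVector() &&   // widening to a ymm result type
         Subtarget.hasInt256() && // AVX2: ymm VPMOVSX/VPMOVZX forms exist
         N1.hasOneUse();          // avoid duplicating the extension
}

When this holds, the extension is simply re-emitted at the wide type via DAG.getNode(N1.getOpcode(), DL, VT, N1.getOperand(0)), so the surrounding mask logic stays at full ymm width.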

RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Sep 8, 2025
…ern. NFC.

Make it more obvious that LHS/RHS truncation patterns are the same.

Noticed while working on llvm#157425
RKSimon added a commit that referenced this pull request Sep 8, 2025
…ern. NFC. (#157426)

Make it more obvious that LHS/RHS truncation patterns are the same.

Noticed while working on #157425
@phoebewang
Contributor

Fixes #157411

Do you mean #157382?

@RKSimon
Collaborator Author

RKSimon commented Sep 8, 2025

Fixes #157411

Do you mean #157382?

Yes, sorry - cut+pasta :(

Contributor

@phoebewang phoebewang left a comment


LGTM.

@RKSimon RKSimon enabled auto-merge (squash) September 8, 2025 12:41
@RKSimon RKSimon merged commit ed33690 into llvm:main Sep 8, 2025
9 checks passed
@RKSimon RKSimon deleted the x86-promote-mask-inreg-ext branch September 8, 2025 14:06

Development

Successfully merging this pull request may close these issues.

x86 SIMD intrinsics are sometimes compiled with a weird choice of width
