
[InstCombine] Prevent constant operand shrinkage on returns of zero-extension ANDs #146201

Open · wants to merge 1 commit into main

Conversation

@ericxu233 commented:

Fixes: #143778

This patch updates the InstCombineSimplifyDemanded logic to avoid shrinking the RHS constant of an and instruction when the instruction has only one use and that use is a return. In such cases, if the constant is the bitmask of a simple zero-extension (e.g., 0xFF, 0xFFFF, etc.), shrinking it can prevent the backend from recognizing the pattern and lowering it to efficient zero-extension code (e.g., using movzx or 32-bit register semantics on x86). This is particularly important when the code is paired with the llvm.assume intrinsic or involves some control flow, from which the backend may not be able to deduce enough information to optimize on its own.
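
To make the pattern concrete, here is a minimal IR sketch of the kind of case being described (illustrative only; the function name and exact bound are made up, not taken from the linked issue):

```llvm
; Illustrative reduction of the problematic pattern, not copied from #143778.
declare void @llvm.assume(i1)

define i64 @zext_mask_example(i64 %x) {
  ; The assume bounds the truncated value, so bits 7..31 of %x are known zero.
  %conv = trunc i64 %x to i32
  %cmp = icmp ult i32 %conv, 96
  call void @llvm.assume(i1 %cmp)
  ; Demanded-bits shrinking turns 4294967295 (0xFFFFFFFF, a plain low-32-bit
  ; zero-extension that x86 can lower to a single 32-bit register move) into
  ; 127, which leaves an extra AND in the generated code because the backend
  ; never sees the assume.
  %masked = and i64 %x, 4294967295
  ret i64 %masked
}
```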

The two tests in known-bits.ll are additional examples of why this change is useful.

Before the change (optimized IR compiled to x86 assembly with llc): we can clearly see redundant AND bitmasks in the x86 assembly.

After this change (optimized IR compiled to x86 assembly with llc): no redundant ANDs anymore.

This is my first time contributing to open-source LLVM, so please let me know if I am doing anything wrong or non-standard here. This is my best effort to address the issue in the middle end. We could probably implement the same optimization in the backend if we had enough information there, but it would essentially amount to undoing an optimization.

@ericxu233 requested a review from nikic as a code owner on Jun 28, 2025 at 06:53

@llvmbot added the llvm:instcombine and llvm:transforms labels on Jun 28, 2025
@llvmbot (Member) commented on Jun 28, 2025:

@llvm/pr-subscribers-llvm-transforms

Author: Hanyang (Eric) Xu (ericxu233)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/146201.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp (+19-1)
  • (modified) llvm/test/Transforms/InstCombine/bswap-fold.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/known-bits.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/trunc-shl-zext.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/vscale.ll (+1-1)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
index 0e3436d12702d..362d03f912b3b 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp
@@ -234,8 +234,26 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Instruction *I,
     if (DemandedMask.isSubsetOf(RHSKnown.Zero | LHSKnown.One))
       return I->getOperand(1);
 
+    // If the 'and' has only one use and that use is a return instruction,
+    // and the constant mask is a simple low-bit zero-extension (like 0xFF,
+    // 0xFFFF, etc.), then skip constant RHS shrinkage. This is because the
+    // backend is capable of optimizing this pattern into a single instruction
+    // zero-extension during codegen anyways. For example, by utilizing the eax
+    // register in x86. Performing the constant operand shrinkage transformation
+    // here may block the backend's optimization especially with assumes which
+    // the backend is unable to take advantage of.
+    bool IsReturnOnlyUse = I->hasOneUse() && isa<ReturnInst>(I->user_back());
+
+    const APInt *C;
+    bool IsZextBitmask = match(I->getOperand(1), m_APInt(C)) &&
+                         ((C->getBitWidth() >= 8 && C->isMask(8)) ||
+                          (C->getBitWidth() >= 16 && C->isMask(16)) ||
+                          (C->getBitWidth() >= 32 && C->isMask(32)) ||
+                          (C->getBitWidth() >= 64 && C->isMask(64)));
+
     // If the RHS is a constant, see if we can simplify it.
-    if (ShrinkDemandedConstant(I, 1, DemandedMask & ~LHSKnown.Zero))
+    if (!(IsReturnOnlyUse && IsZextBitmask) &&
+        ShrinkDemandedConstant(I, 1, DemandedMask & ~LHSKnown.Zero))
       return I;
 
     break;
diff --git a/llvm/test/Transforms/InstCombine/bswap-fold.ll b/llvm/test/Transforms/InstCombine/bswap-fold.ll
index f7268ec9df090..9a4144b834fe9 100644
--- a/llvm/test/Transforms/InstCombine/bswap-fold.ll
+++ b/llvm/test/Transforms/InstCombine/bswap-fold.ll
@@ -838,7 +838,7 @@ define i32 @bs_active_high7(i32 %0) {
 define <2 x i64> @bs_active_high4(<2 x i64> %0) {
 ; CHECK-LABEL: @bs_active_high4(
 ; CHECK-NEXT:    [[TMP2:%.*]] = shl <2 x i64> [[TMP0:%.*]], splat (i64 4)
-; CHECK-NEXT:    [[TMP3:%.*]] = and <2 x i64> [[TMP2]], splat (i64 240)
+; CHECK-NEXT:    [[TMP3:%.*]] = and <2 x i64> [[TMP2]], splat (i64 255)
 ; CHECK-NEXT:    ret <2 x i64> [[TMP3]]
 ;
   %2 = shl <2 x i64> %0, <i64 60, i64 60>
diff --git a/llvm/test/Transforms/InstCombine/known-bits.ll b/llvm/test/Transforms/InstCombine/known-bits.ll
index 9a9fec694ff0e..1e653f3201f39 100644
--- a/llvm/test/Transforms/InstCombine/known-bits.ll
+++ b/llvm/test/Transforms/InstCombine/known-bits.ll
@@ -384,7 +384,7 @@ define i64 @test_icmp_trunc2(i64 %x) {
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i32 [[CONV]], 12
 ; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.then:
-; CHECK-NEXT:    [[SEXT:%.*]] = and i64 [[X]], 2147483647
+; CHECK-NEXT:    [[SEXT:%.*]] = and i64 [[X]], 4294967295
 ; CHECK-NEXT:    ret i64 [[SEXT]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    ret i64 0
@@ -408,7 +408,7 @@ define i64 @test_icmp_trunc3(i64 %n) {
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[CONV]], 96
 ; CHECK-NEXT:    br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK:       if.then:
-; CHECK-NEXT:    [[RET:%.*]] = and i64 [[N]], 127
+; CHECK-NEXT:    [[RET:%.*]] = and i64 [[N]], 4294967295
 ; CHECK-NEXT:    ret i64 [[RET]]
 ; CHECK:       if.else:
 ; CHECK-NEXT:    ret i64 0
diff --git a/llvm/test/Transforms/InstCombine/trunc-shl-zext.ll b/llvm/test/Transforms/InstCombine/trunc-shl-zext.ll
index 576125b86de8f..03d3ea7f1f5fa 100644
--- a/llvm/test/Transforms/InstCombine/trunc-shl-zext.ll
+++ b/llvm/test/Transforms/InstCombine/trunc-shl-zext.ll
@@ -7,7 +7,7 @@ define i32 @trunc_shl_zext_32(i32 %a) {
 ; CHECK-LABEL: define i32 @trunc_shl_zext_32
 ; CHECK-SAME: (i32 [[A:%.*]]) {
 ; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[A]], 4
-; CHECK-NEXT:    [[EXT:%.*]] = and i32 [[SHL]], 65520
+; CHECK-NEXT:    [[EXT:%.*]] = and i32 [[SHL]], 65535
 ; CHECK-NEXT:    ret i32 [[EXT]]
 ;
   %trunc = trunc i32 %a to i16
@@ -20,7 +20,7 @@ define i64 @trunc_shl_zext_64(i64 %a) {
 ; CHECK-LABEL: define i64 @trunc_shl_zext_64
 ; CHECK-SAME: (i64 [[A:%.*]]) {
 ; CHECK-NEXT:    [[SHL:%.*]] = shl i64 [[A]], 7
-; CHECK-NEXT:    [[EXT:%.*]] = and i64 [[SHL]], 128
+; CHECK-NEXT:    [[EXT:%.*]] = and i64 [[SHL]], 255
 ; CHECK-NEXT:    ret i64 [[EXT]]
 ;
   %trunc = trunc i64 %a to i8
diff --git a/llvm/test/Transforms/InstCombine/vscale.ll b/llvm/test/Transforms/InstCombine/vscale.ll
index dbb5ca4bae9be..550af4c4d6ccd 100644
--- a/llvm/test/Transforms/InstCombine/vscale.ll
+++ b/llvm/test/Transforms/InstCombine/vscale.ll
@@ -17,7 +17,7 @@ define i64 @pomote_zext_shl_vscale_i32_to_i64() {
 ; CHECK-LABEL: @pomote_zext_shl_vscale_i32_to_i64(
 ; CHECK-NEXT:    [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[SHL:%.*]] = shl i64 [[VSCALE]], 3
-; CHECK-NEXT:    [[EXT:%.*]] = and i64 [[SHL]], 4294967288
+; CHECK-NEXT:    [[EXT:%.*]] = and i64 [[SHL]], 4294967295
 ; CHECK-NEXT:    ret i64 [[EXT]]
 ;
   %vscale = call i32 @llvm.vscale.i32()

@nikic (Contributor) left a comment:

I don't think that the "return only use" here is really relevant -- it's a general property of x86 that it can encode a "low 32-bit" mask more compactly than other masks, regardless of where in the code it occurs.

Now, if you remove that special case and expose this to wider test coverage, you'll probably quickly find that this is going to break other optimizations that rely on the current demanded bits reduction.

The fact that the demanded constant shrinking can cause the introduction of unfavorable immediates is a known problem, and the case being addressed here is one of the less problematic ones. The most problematic are probably the cases where you have a small negative immediate like -8 and then zero out the top bits, creating a huge positive immediate instead.

This is challenging to undo in the backend if the information comes from assume (currently not persisted during codegen) or control flow (not really available at the SDAG level).
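
A minimal sketch of that shape (illustrative only; not a test from this PR, and the function name is made up):

```llvm
; The assume tells InstCombine that %x fits in 32 bits, so demanded constant
; shrinking rewrites the mask -8 (encodable as a sign-extended 8-bit immediate
; on x86) into 4294967288 (0x00000000FFFFFFF8), which does not fit any
; sign-extended immediate encoding of a 64-bit AND. The backend never sees the
; assume, so it cannot easily restore the cheaper -8 form.
declare void @llvm.assume(i1)

define i64 @shrunk_negative_mask(i64 %x) {
  %in.range = icmp ult i64 %x, 4294967296
  call void @llvm.assume(i1 %in.range)
  %masked = and i64 %x, -8          ; shrunk to: and i64 %x, 4294967288
  ret i64 %masked
}
```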

I don't have any particularly good idea on how to fix this case.

@nikic requested a review from dtcxzyw on Jun 28, 2025 at 11:04
@ericxu233 (Author) commented:

> I don't think that the "return only use" here is really relevant -- it's a general property of x86 that it can encode a "low 32-bit" mask more compactly than other masks, regardless of where in the code it occurs.
>
> Now, if you remove that special case and expose this to wider test coverage, you'll probably quickly find that this is going to break other optimizations that rely on the current demanded bits reduction.

I got rid of the return-only-use check, and only five additional test cases fail with ninja -C build check-llvm:

LLVM :: Transforms/InstCombine/callsite_nonnull_args_through_casts.ll
LLVM :: Transforms/InstCombine/icmp-trunc.ll
LLVM :: Transforms/InstCombine/shift-shift.ll
LLVM :: Transforms/InstCombine/shift.ll
LLVM :: Transforms/InstCombine/sub-gep.ll

I haven't looked into them in detail yet, but do you think it would be worth investigating and fixing them so that we get more general coverage?

@ericxu233 (Author) commented:

> The fact that the demanded constant shrinking can cause the introduction of unfavorable immediates is a known problem, and the case being addressed here is one of the less problematic ones. The most problematic are probably the cases where you have a small negative immediate like -8 and then zero out the top bits, creating a huge positive immediate instead.
>
> This is challenging to undo in the backend if the information comes from assume (currently not persisted during codegen) or control flow (not really available at the SDAG level).
>
> I don't have any particularly good idea on how to fix this case.

Ah, I see. May I suggest that we add a helper function, bool GuardShrinkConstant(), that collects the cases where we want to prevent constant operand shrinkage? This change could be one of the first to go into it.

Successfully merging this pull request may close these issues.

Unnecessary and x,C introduced due to assume on trunc+zext