[ConstantRange] Estimate tighter lower (upper) bounds for masked binary and (or) #120352
base: main
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-transforms

Author: Stephen Senran Zhang (zsrkmyn)

Changes: Tries to fix #118108.

Full diff: https://github.com/llvm/llvm-project/pull/120352.diff

2 Files Affected:
diff --git a/llvm/lib/IR/ConstantRange.cpp b/llvm/lib/IR/ConstantRange.cpp
index d81a292916fdea..a36c4666d4762b 100644
--- a/llvm/lib/IR/ConstantRange.cpp
+++ b/llvm/lib/IR/ConstantRange.cpp
@@ -1520,15 +1520,102 @@ ConstantRange ConstantRange::binaryNot() const {
return ConstantRange(APInt::getAllOnes(getBitWidth())).sub(*this);
}
+/// Estimate the 'bit-masked AND' operation's lower bound.
+///
+/// E.g., given two ranges as follows (single quotes are separators and
+/// have no meaning here),
+///
+/// LHS = [10'001'010, ; LLo
+/// 10'100'000] ; LHi
+/// RHS = [10'111'010, ; RLo
+/// 10'111'100] ; RHi
+///
+/// we know that the higher 2 bits of the result are always '10'; note that
+/// at least one bit in LHS[3:6] is 1 (since the range is contiguous), and
+/// all bits in RHS[3:6] are 1, so the lower bound of the result is
+/// 10'001'000.
+///
+/// The algorithm is as follows,
+/// 1. first, calculate a mask to mask out the higher common bits by
+///      Mask = (LLo ^ LHi) | (RLo ^ RHi) | (LLo ^ RLo);
+///    then set all non-leading-zero bits of Mask to 1;
+/// 2. find the bit field with at least 1 in LHS (i.e., bit 3:6 in the example)
+/// after applying the mask, with
+/// StartBit = BitWidth - (LLo & Mask).clz() - 1;
+/// EndBit = BitWidth - (LHi & Mask).clz();
+/// 3. check that all bits of RHS in [StartBit:EndBit] are 1, and that all
+/// bits of RLo and RHi in [StartBit:BitWidth] are the same; if so, the lower
+/// can be updated to
+/// LowerBound = LLo & Keep;
+/// where Keep is a mask to mask out trailing bits (the lower 3 bits in the
+/// example);
+/// 4. repeat steps 2 and 3 with LHS and RHS swapped, and update the lower
+/// bound with the smaller one.
+static APInt estimateBitMaskedAndLowerBound(const ConstantRange &LHS,
+ const ConstantRange &RHS) {
+ auto BitWidth = LHS.getBitWidth();
+ // If either range is the full set or is (unsigned) wrapped, it must contain
+ // '0', which makes the lower bound 0.
+ if ((LHS.isFullSet() || RHS.isFullSet()) ||
+ (LHS.isWrappedSet() || RHS.isWrappedSet()))
+ return APInt::getZero(BitWidth);
+
+ auto LLo = LHS.getLower();
+ auto LHi = LHS.getUpper() - 1;
+ auto RLo = RHS.getLower();
+ auto RHi = RHS.getUpper() - 1;
+
+ // Calculate the mask that masks out the higher common bits.
+ auto Mask = (LLo ^ LHi) | (RLo ^ RHi) | (LLo ^ RLo);
+ unsigned LeadingZeros = Mask.countLeadingZeros();
+ Mask.setLowBits(BitWidth - LeadingZeros);
+
+ auto estimateBound =
+ [BitWidth, &Mask](const APInt &ALo, const APInt &AHi, const APInt &BLo,
+ const APInt &BHi) -> std::optional<APInt> {
+ unsigned LeadingZeros = (ALo & Mask).countLeadingZeros();
+ if (LeadingZeros == BitWidth)
+ return std::nullopt;
+
+ unsigned StartBit = BitWidth - LeadingZeros - 1;
+
+ if (BLo.extractBits(BitWidth - StartBit, StartBit) !=
+ BHi.extractBits(BitWidth - StartBit, StartBit))
+ return std::nullopt;
+
+ unsigned EndBit = BitWidth - (AHi & Mask).countLeadingZeros();
+ if (!(BLo.extractBits(EndBit - StartBit, StartBit) &
+ BHi.extractBits(EndBit - StartBit, StartBit))
+ .isAllOnes())
+ return std::nullopt;
+
+ APInt Keep(BitWidth, 0);
+ Keep.setBits(StartBit, BitWidth);
+ return Keep & ALo;
+ };
+
+ auto LowerBoundByLHS = estimateBound(LLo, LHi, RLo, RHi);
+ auto LowerBoundByRHS = estimateBound(RLo, RHi, LLo, LHi);
+
+ if (LowerBoundByLHS && LowerBoundByRHS)
+ return LowerBoundByLHS->ult(*LowerBoundByRHS) ? *LowerBoundByLHS
+ : *LowerBoundByRHS;
+ if (LowerBoundByLHS)
+ return *LowerBoundByLHS;
+ if (LowerBoundByRHS)
+ return *LowerBoundByRHS;
+ return APInt::getZero(BitWidth);
+}
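For readers outside the LLVM codebase, the estimate above can be sketched as a standalone routine on plain `uint8_t` instead of `APInt`. This is a hypothetical re-creation of the patch's algorithm, not the patch itself: it assumes inclusive bounds with `lo <= hi` (non-empty, non-wrapped ranges), omits the full-set/wrapped-set early-out, and relies on the GCC/Clang `__builtin_clz` intrinsic.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Count leading zeros of an 8-bit value (8 for zero input).
static unsigned clz8(uint8_t V) { return V ? __builtin_clz(V) - 24 : 8; }

// Extract Len bits starting at bit Start (mirrors APInt::extractBits).
static uint8_t extractBits(uint8_t V, unsigned Len, unsigned Start) {
  return (V >> Start) & (Len >= 8 ? 0xFFu : (1u << Len) - 1);
}

uint8_t maskedAndLowerBound(uint8_t LLo, uint8_t LHi, uint8_t RLo,
                            uint8_t RHi) {
  const unsigned BW = 8;
  // Mask out the bits above the common leading prefix of all four bounds.
  uint8_t Mask = (LLo ^ LHi) | (RLo ^ RHi) | (LLo ^ RLo);
  unsigned LZ = clz8(Mask);
  Mask = LZ >= BW ? 0 : (uint8_t)((1u << (BW - LZ)) - 1);

  auto Estimate = [&](uint8_t ALo, uint8_t AHi, uint8_t BLo,
                      uint8_t BHi) -> std::optional<uint8_t> {
    unsigned LZA = clz8(ALo & Mask);
    if (LZA == BW)
      return std::nullopt;
    unsigned StartBit = BW - LZA - 1;
    // B's bounds must agree on every bit from StartBit upward.
    if (extractBits(BLo, BW - StartBit, StartBit) !=
        extractBits(BHi, BW - StartBit, StartBit))
      return std::nullopt;
    unsigned EndBit = BW - clz8(AHi & Mask);
    // All of B's bits in [StartBit, EndBit) must be ones.
    uint8_t Field = extractBits(BLo, EndBit - StartBit, StartBit) &
                    extractBits(BHi, EndBit - StartBit, StartBit);
    if (Field != (uint8_t)((1u << (EndBit - StartBit)) - 1))
      return std::nullopt;
    uint8_t Keep = (uint8_t)(0xFFu << StartBit); // drop bits below StartBit
    return (uint8_t)(Keep & ALo);
  };

  auto ByLHS = Estimate(LLo, LHi, RLo, RHi);
  auto ByRHS = Estimate(RLo, RHi, LLo, LHi);
  if (ByLHS && ByRHS)
    return std::min(*ByLHS, *ByRHS);
  return ByLHS ? *ByLHS : ByRHS ? *ByRHS : 0;
}
```

On the doc comment's worked example, `maskedAndLowerBound(0b10001010, 0b10100000, 0b10111010, 0b10111100)` yields `0b10001000` (136), matching the bound derived by hand.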
+
ConstantRange ConstantRange::binaryAnd(const ConstantRange &Other) const {
if (isEmptySet() || Other.isEmptySet())
return getEmpty();
ConstantRange KnownBitsRange =
fromKnownBits(toKnownBits() & Other.toKnownBits(), false);
- ConstantRange UMinUMaxRange =
- getNonEmpty(APInt::getZero(getBitWidth()),
- APIntOps::umin(Other.getUnsignedMax(), getUnsignedMax()) + 1);
+ auto LowerBound = estimateBitMaskedAndLowerBound(*this, Other);
+ ConstantRange UMinUMaxRange = getNonEmpty(
+ LowerBound, APIntOps::umin(Other.getUnsignedMax(), getUnsignedMax()) + 1);
return KnownBitsRange.intersectWith(UMinUMaxRange);
}
@@ -1538,10 +1625,17 @@ ConstantRange ConstantRange::binaryOr(const ConstantRange &Other) const {
ConstantRange KnownBitsRange =
fromKnownBits(toKnownBits() | Other.toKnownBits(), false);
+
+ // ~a & ~b >= x
+ // <=> ~(~a & ~b) <= ~x
+ // <=> a | b <= ~x
+ // <=> a | b < ~x + 1
+ // thus, UpperBound(a | b) == ~LowerBound(~a & ~b) + 1
+ auto UpperBound =
+ ~estimateBitMaskedAndLowerBound(binaryNot(), Other.binaryNot()) + 1;
// Upper wrapped range.
- ConstantRange UMaxUMinRange =
- getNonEmpty(APIntOps::umax(getUnsignedMin(), Other.getUnsignedMin()),
- APInt::getZero(getBitWidth()));
+ ConstantRange UMaxUMinRange = getNonEmpty(
+ APIntOps::umax(getUnsignedMin(), Other.getUnsignedMin()), UpperBound);
return KnownBitsRange.intersectWith(UMaxUMinRange);
}
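The De Morgan step in the `binaryOr` comment can be checked exhaustively on small ranges. The throwaway brute force below (not part of the patch) confirms that the exact unsigned maximum of `a | b` over two ranges is the bitwise complement of the exact unsigned minimum of `~a & ~b`, which is precisely the identity the new upper bound relies on.

```cpp
#include <algorithm>
#include <cstdint>

// Exhaustively verify, for all non-empty non-wrapped 4-bit range pairs, that
//   max(a | b) == ~min(~a & ~b)
// over the same pair of ranges. Complement reverses the unsigned order, so a
// maximum on one side corresponds to a minimum on the other.
bool orBoundMatchesDeMorgan() {
  for (unsigned Lo1 = 0; Lo1 < 16; ++Lo1)
    for (unsigned Hi1 = Lo1; Hi1 < 16; ++Hi1)
      for (unsigned Lo2 = 0; Lo2 < 16; ++Lo2)
        for (unsigned Hi2 = Lo2; Hi2 < 16; ++Hi2) {
          uint8_t MaxOr = 0, MinAnd = 0xFF;
          for (unsigned A = Lo1; A <= Hi1; ++A)
            for (unsigned B = Lo2; B <= Hi2; ++B) {
              MaxOr = std::max<uint8_t>(MaxOr, A | B);
              MinAnd = std::min<uint8_t>(MinAnd, ~A & ~B);
            }
          if (MaxOr != (uint8_t)~MinAnd)
            return false;
        }
  return true;
}
```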
diff --git a/llvm/test/Transforms/SCCP/range-and-or-bit-masked.ll b/llvm/test/Transforms/SCCP/range-and-or-bit-masked.ll
new file mode 100644
index 00000000000000..e81c5d739c6d29
--- /dev/null
+++ b/llvm/test/Transforms/SCCP/range-and-or-bit-masked.ll
@@ -0,0 +1,88 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -passes=ipsccp %s | FileCheck %s
+
+declare void @use(i1)
+
+define i1 @test1(i64 %x) {
+; CHECK-LABEL: @test1(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[COND:%.*]] = icmp ugt i64 [[X:%.*]], 65535
+; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])
+; CHECK-NEXT: [[MASK:%.*]] = and i64 [[X]], -65521
+; CHECK-NEXT: ret i1 false
+;
+entry:
+ %cond = icmp ugt i64 %x, 65535
+ call void @llvm.assume(i1 %cond)
+ %mask = and i64 %x, -65521
+ %cmp = icmp eq i64 %mask, 0
+ ret i1 %cmp
+}
+
+define void @test.and(i64 %x, i64 %y) {
+; CHECK-LABEL: @test.and(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[C0:%.*]] = icmp uge i64 [[X:%.*]], 138
+; CHECK-NEXT: [[C1:%.*]] = icmp ule i64 [[X]], 161
+; CHECK-NEXT: call void @llvm.assume(i1 [[C0]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C1]])
+; CHECK-NEXT: [[C2:%.*]] = icmp uge i64 [[Y:%.*]], 186
+; CHECK-NEXT: [[C3:%.*]] = icmp ule i64 [[Y]], 188
+; CHECK-NEXT: call void @llvm.assume(i1 [[C2]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C3]])
+; CHECK-NEXT: [[AND:%.*]] = and i64 [[X]], [[Y]]
+; CHECK-NEXT: call void @use(i1 false)
+; CHECK-NEXT: [[R1:%.*]] = icmp ult i64 [[AND]], 137
+; CHECK-NEXT: call void @use(i1 [[R1]])
+; CHECK-NEXT: ret void
+;
+entry:
+ %c0 = icmp uge i64 %x, 138 ; 0b10001010
+ %c1 = icmp ule i64 %x, 161 ; 0b10100000
+ call void @llvm.assume(i1 %c0)
+ call void @llvm.assume(i1 %c1)
+ %c2 = icmp uge i64 %y, 186 ; 0b10111010
+ %c3 = icmp ule i64 %y, 188 ; 0b10111110
+ call void @llvm.assume(i1 %c2)
+ call void @llvm.assume(i1 %c3)
+ %and = and i64 %x, %y
+ %r0 = icmp ult i64 %and, 136 ; 0b10001000
+ call void @use(i1 %r0) ; false
+ %r1 = icmp ult i64 %and, 137
+ call void @use(i1 %r1) ; unknown
+ ret void
+}
+
+define void @test.or(i64 %x, i64 %y) {
+; CHECK-LABEL: @test.or(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[C0:%.*]] = icmp ule i64 [[X:%.*]], 117
+; CHECK-NEXT: [[C1:%.*]] = icmp uge i64 [[X]], 95
+; CHECK-NEXT: call void @llvm.assume(i1 [[C0]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C1]])
+; CHECK-NEXT: [[C2:%.*]] = icmp ule i64 [[Y:%.*]], 69
+; CHECK-NEXT: [[C3:%.*]] = icmp uge i64 [[Y]], 67
+; CHECK-NEXT: call void @llvm.assume(i1 [[C2]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C3]])
+; CHECK-NEXT: [[OR:%.*]] = or i64 [[X]], [[Y]]
+; CHECK-NEXT: call void @use(i1 false)
+; CHECK-NEXT: [[R1:%.*]] = icmp ugt i64 [[OR]], 118
+; CHECK-NEXT: call void @use(i1 [[R1]])
+; CHECK-NEXT: ret void
+;
+entry:
+ %c0 = icmp ule i64 %x, 117 ; 0b01110101
+ %c1 = icmp uge i64 %x, 95 ; 0b01011111
+ call void @llvm.assume(i1 %c0)
+ call void @llvm.assume(i1 %c1)
+ %c2 = icmp ule i64 %y, 69 ; 0b01000101
+ %c3 = icmp uge i64 %y, 67 ; 0b01000011
+ call void @llvm.assume(i1 %c2)
+ call void @llvm.assume(i1 %c3)
+ %or = or i64 %x, %y
+ %r0 = icmp ugt i64 %or, 119 ; 0b01110111
+ call void @use(i1 %r0) ; false
+ %r1 = icmp ugt i64 %or, 118
+ call void @use(i1 %r1) ; unknown
+ ret void
+}
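A quick standalone brute force (not part of the test suite) shows why the first compare in each function folds while the second does not: over the asserted ranges, the exact minimum of `x & y` is 136 and the exact maximum of `x | y` is 119, matching the bounds the patch computes.

```cpp
#include <algorithm>
#include <cstdint>

// Exact minimum of x & y for x in [138, 161], y in [186, 188] (@test.and).
uint8_t exactAndMin() {
  uint8_t Min = 0xFF;
  for (unsigned X = 138; X <= 161; ++X)
    for (unsigned Y = 186; Y <= 188; ++Y)
      Min = std::min<uint8_t>(Min, X & Y);
  return Min;
}

// Exact maximum of x | y for x in [95, 117], y in [67, 69] (@test.or).
uint8_t exactOrMax() {
  uint8_t Max = 0;
  for (unsigned X = 95; X <= 117; ++X)
    for (unsigned Y = 67; Y <= 69; ++Y)
      Max = std::max<uint8_t>(Max, X | Y);
  return Max;
}
```

So `and < 136` and `or > 119` fold to false, while `and < 137` and `or > 118` must stay unknown.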
I'd have expected the simplest case to be solved suboptimally with a few range intersections in ValueTracking, but this seems like a nice extension, if that's the appropriate direction and compile-time acceptable.
@antoniofrighetto Aha, initially I wanted to fix it in ValueTracking as well, but I found that ValueTracking cannot model the case where "there's at least one bit set to 1 somewhere in a field of a variable, but any individual bit in the field may be 0", which is the case in #118108; only ConstantRange can express this. That's why I fixed it here. As for performance, almost all operations in the algorithm are bit manipulation, so it should be fast for native int types. For APInt, sadly, I have no idea; APInt always checks 'isSingleWord()' and branches :-/
Failed Tests
Clang.CodeGen/AArch64/fpm-helpers.c
Is it related to this patch?
Thx! I thought it wasn't, as it's 'Clang.CodeGen'. Fixed now.
Please give me more time to understand the implementation...
This patch increases the compile time by ~0.02%. It should be acceptable.
llvm/lib/IR/ConstantRange.cpp (outdated)

    if (LeadingZeros == BitWidth)
      return std::nullopt;

    unsigned StartBit = BitWidth - LeadingZeros - 1;
Can we extend this algorithm to make the result optimal for all non-wrapped ranges?
An example: [7, 14) & [-1, 0) should produce [7, 14), but currently it gives [4, 14).
You can apply the following patch to get more sub-optimal cases :)
diff --git a/llvm/unittests/IR/ConstantRangeTest.cpp b/llvm/unittests/IR/ConstantRangeTest.cpp
index e1d9b3e387b2..1a616269e3b7 100644
--- a/llvm/unittests/IR/ConstantRangeTest.cpp
+++ b/llvm/unittests/IR/ConstantRangeTest.cpp
@@ -2725,7 +2725,7 @@ TEST_F(ConstantRangeTest, binaryAnd) {
return CR1.binaryAnd(CR2);
},
[](const APInt &N1, const APInt &N2) { return N1 & N2; }, PreferSmallest,
- CheckSingleElementsOnly);
+ CheckNonWrappedOnly);
}
TEST_F(ConstantRangeTest, binaryOr) {
Great point! Actually, we can do so by extending StartBit toward the lower bits when the extended bits in B are all ones. E.g., given

CR_A: [01101110,
       01110001]
CR_B: {01111110}

the lower bound is 01101000 by the current algorithm (StartBit is 3, counting from 0), but bits 2 and 1 are also 1 in range B, so we can extend StartBit to 1, and the lower bound becomes 01101110.
This can be done with a small change. I'll refine and update the PR.
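The proposed extension can be sanity-checked by brute-forcing the example above (a standalone check, not LLVM code): the exact minimum really is the tighter bound the extended StartBit produces.

```cpp
#include <algorithm>
#include <cstdint>

// CR_A = [0b01101110, 0b01110001] (inclusive), CR_B = {0b01111110}.
// The unextended algorithm stops at StartBit = 3 and reports 0b01101000;
// extending StartBit down to bit 1 (bits 2 and 1 of B are also ones) gives
// 0b01101110, and brute force confirms that bound is exact.
uint8_t exactMaskedAndMin() {
  uint8_t Min = 0xFF;
  for (unsigned A = 0b01101110; A <= 0b01110001; ++A)
    Min = std::min<uint8_t>(Min, A & 0b01111110);
  return Min;
}
```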
The whole algorithm is actually extremely simple. I made things complicated... You'll see how stupid I am after I update the PR later. Many thanks for the review!
…ry and (or) Co-author: Yingwei Zheng (@dtcxzyw)
@dtcxzyw, I've updated the patch. It's quite simple now. You can forget about the previous patch and start the review from scratch. The unittests weren't updated with your patch, since the upper bound isn't optimal (for the AND op). I'm wondering whether I need to add a new test mode that checks only the optimal lower (upper) bound for AND (OR).
If you cannot make it optimal for all non-wrapped cases, please just add some special cases (e.g.,
Ah, I mean, the lower bound computed by this patch is optimal, but the upper bound isn't, so the tests failed. :-D But adding special cases should work.
Tries to fix #118108.