[ConstantRange] Estimate tighter lower (upper) bounds for masked binary and (or) #120352
base: main
Conversation
@llvm/pr-subscribers-clang @llvm/pr-subscribers-llvm-transforms

Author: Stephen Senran Zhang (zsrkmyn)

Changes: Tries to fix #118108.

Full diff: https://github.com/llvm/llvm-project/pull/120352.diff

2 Files Affected:
diff --git a/llvm/lib/IR/ConstantRange.cpp b/llvm/lib/IR/ConstantRange.cpp
index d81a292916fdea..a36c4666d4762b 100644
--- a/llvm/lib/IR/ConstantRange.cpp
+++ b/llvm/lib/IR/ConstantRange.cpp
@@ -1520,15 +1520,102 @@ ConstantRange ConstantRange::binaryNot() const {
return ConstantRange(APInt::getAllOnes(getBitWidth())).sub(*this);
}
+/// Estimate the 'bit-masked AND' operation's lower bound.
+///
+/// E.g., given two ranges as follows (single quotes are separators and
+/// have no meaning here),
+///
+/// LHS = [10'001'010, ; LLo
+/// 10'100'000] ; LHi
+/// RHS = [10'111'010, ; RLo
+/// 10'111'100] ; RHi
+///
+/// we know that the higher 2 bits of the result are always '10'; note that
+/// at least one bit in LHS[3:6] is 1 (since the range is contiguous), and
+/// all bits in RHS[3:6] are 1, so the lower bound of the result is
+/// 10'001'000.
+///
+/// The algorithm is as follows,
+/// 1. first, calculate a mask to mask out the higher common bits by
+///      Mask = (LLo ^ LHi) | (RLo ^ RHi) | (LLo ^ RLo);
+///    then set all non-leading-zero bits of Mask to 1;
+/// 2. find the bit field with at least 1 in LHS (i.e., bit 3:6 in the example)
+/// after applying the mask, with
+/// StartBit = BitWidth - (LLo & Mask).clz() - 1;
+/// EndBit = BitWidth - (LHi & Mask).clz();
+/// 3. check that all bits of RHS in [StartBit:EndBit] are 1, and that all
+/// bits of RLo and RHi in [StartBit:BitWidth] are the same; if so, the lower
+/// can be updated to
+/// LowerBound = LLo & Keep;
+/// where Keep is a mask to mask out trailing bits (the lower 3 bits in the
+/// example);
+/// 4. repeat steps 2 and 3 with LHS and RHS swapped, and update the lower
+/// bound with the smaller one.
+static APInt estimateBitMaskedAndLowerBound(const ConstantRange &LHS,
+ const ConstantRange &RHS) {
+ auto BitWidth = LHS.getBitWidth();
+ // If either range is the full set or is (unsigned) wrapped, it must contain
+ // '0', which makes the lower bound 0.
+ if ((LHS.isFullSet() || RHS.isFullSet()) ||
+ (LHS.isWrappedSet() || RHS.isWrappedSet()))
+ return APInt::getZero(BitWidth);
+
+ auto LLo = LHS.getLower();
+ auto LHi = LHS.getUpper() - 1;
+ auto RLo = RHS.getLower();
+ auto RHi = RHS.getUpper() - 1;
+
+ // Calculate the mask that masks out the higher common bits.
+ auto Mask = (LLo ^ LHi) | (RLo ^ RHi) | (LLo ^ RLo);
+ unsigned LeadingZeros = Mask.countLeadingZeros();
+ Mask.setLowBits(BitWidth - LeadingZeros);
+
+ auto estimateBound =
+ [BitWidth, &Mask](const APInt &ALo, const APInt &AHi, const APInt &BLo,
+ const APInt &BHi) -> std::optional<APInt> {
+ unsigned LeadingZeros = (ALo & Mask).countLeadingZeros();
+ if (LeadingZeros == BitWidth)
+ return std::nullopt;
+
+ unsigned StartBit = BitWidth - LeadingZeros - 1;
+
+ if (BLo.extractBits(BitWidth - StartBit, StartBit) !=
+ BHi.extractBits(BitWidth - StartBit, StartBit))
+ return std::nullopt;
+
+ unsigned EndBit = BitWidth - (AHi & Mask).countLeadingZeros();
+ if (!(BLo.extractBits(EndBit - StartBit, StartBit) &
+ BHi.extractBits(EndBit - StartBit, StartBit))
+ .isAllOnes())
+ return std::nullopt;
+
+ APInt Keep(BitWidth, 0);
+ Keep.setBits(StartBit, BitWidth);
+ return Keep & ALo;
+ };
+
+ auto LowerBoundByLHS = estimateBound(LLo, LHi, RLo, RHi);
+ auto LowerBoundByRHS = estimateBound(RLo, RHi, LLo, LHi);
+
+ if (LowerBoundByLHS && LowerBoundByRHS)
+ return LowerBoundByLHS->ult(*LowerBoundByRHS) ? *LowerBoundByLHS
+ : *LowerBoundByRHS;
+ if (LowerBoundByLHS)
+ return *LowerBoundByLHS;
+ if (LowerBoundByRHS)
+ return *LowerBoundByRHS;
+ return APInt::getZero(BitWidth);
+}
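For readers outside the LLVM codebase, the estimate above can be sketched as a standalone routine on plain `uint8_t` instead of `APInt`. This is a hypothetical re-creation of the patch's algorithm, not the patch itself: it assumes inclusive bounds with `lo <= hi` (non-empty, non-wrapped ranges), omits the full-set/wrapped-set early-out, and relies on the GCC/Clang `__builtin_clz` intrinsic.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Count leading zeros of an 8-bit value (8 for zero input).
static unsigned clz8(uint8_t V) { return V ? __builtin_clz(V) - 24 : 8; }

// Extract Len bits starting at bit Start (mirrors APInt::extractBits).
static uint8_t extractBits(uint8_t V, unsigned Len, unsigned Start) {
  return (V >> Start) & (Len >= 8 ? 0xFFu : (1u << Len) - 1);
}

uint8_t maskedAndLowerBound(uint8_t LLo, uint8_t LHi, uint8_t RLo,
                            uint8_t RHi) {
  const unsigned BW = 8;
  // Mask out the bits above the common leading prefix of all four bounds.
  uint8_t Mask = (LLo ^ LHi) | (RLo ^ RHi) | (LLo ^ RLo);
  unsigned LZ = clz8(Mask);
  Mask = LZ >= BW ? 0 : (uint8_t)((1u << (BW - LZ)) - 1);

  auto Estimate = [&](uint8_t ALo, uint8_t AHi, uint8_t BLo,
                      uint8_t BHi) -> std::optional<uint8_t> {
    unsigned LZA = clz8(ALo & Mask);
    if (LZA == BW)
      return std::nullopt;
    unsigned StartBit = BW - LZA - 1;
    // B's bounds must agree on every bit from StartBit upward.
    if (extractBits(BLo, BW - StartBit, StartBit) !=
        extractBits(BHi, BW - StartBit, StartBit))
      return std::nullopt;
    unsigned EndBit = BW - clz8(AHi & Mask);
    // All of B's bits in [StartBit, EndBit) must be ones.
    uint8_t Field = extractBits(BLo, EndBit - StartBit, StartBit) &
                    extractBits(BHi, EndBit - StartBit, StartBit);
    if (Field != (uint8_t)((1u << (EndBit - StartBit)) - 1))
      return std::nullopt;
    uint8_t Keep = (uint8_t)(0xFFu << StartBit); // drop bits below StartBit
    return (uint8_t)(Keep & ALo);
  };

  auto ByLHS = Estimate(LLo, LHi, RLo, RHi);
  auto ByRHS = Estimate(RLo, RHi, LLo, LHi);
  if (ByLHS && ByRHS)
    return std::min(*ByLHS, *ByRHS);
  return ByLHS ? *ByLHS : ByRHS ? *ByRHS : 0;
}
```

On the doc comment's worked example, `maskedAndLowerBound(0b10001010, 0b10100000, 0b10111010, 0b10111100)` yields `0b10001000` (136), matching the bound derived by hand.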
+
ConstantRange ConstantRange::binaryAnd(const ConstantRange &Other) const {
if (isEmptySet() || Other.isEmptySet())
return getEmpty();
ConstantRange KnownBitsRange =
fromKnownBits(toKnownBits() & Other.toKnownBits(), false);
- ConstantRange UMinUMaxRange =
- getNonEmpty(APInt::getZero(getBitWidth()),
- APIntOps::umin(Other.getUnsignedMax(), getUnsignedMax()) + 1);
+ auto LowerBound = estimateBitMaskedAndLowerBound(*this, Other);
+ ConstantRange UMinUMaxRange = getNonEmpty(
+ LowerBound, APIntOps::umin(Other.getUnsignedMax(), getUnsignedMax()) + 1);
return KnownBitsRange.intersectWith(UMinUMaxRange);
}
@@ -1538,10 +1625,17 @@ ConstantRange ConstantRange::binaryOr(const ConstantRange &Other) const {
ConstantRange KnownBitsRange =
fromKnownBits(toKnownBits() | Other.toKnownBits(), false);
+
+ // ~a & ~b >= x
+ // <=> ~(~a & ~b) <= ~x
+ // <=> a | b <= ~x
+ // <=> a | b < ~x + 1
+ // thus, UpperBound(a | b) == ~LowerBound(~a & ~b) + 1
+ auto UpperBound =
+ ~estimateBitMaskedAndLowerBound(binaryNot(), Other.binaryNot()) + 1;
// Upper wrapped range.
- ConstantRange UMaxUMinRange =
- getNonEmpty(APIntOps::umax(getUnsignedMin(), Other.getUnsignedMin()),
- APInt::getZero(getBitWidth()));
+ ConstantRange UMaxUMinRange = getNonEmpty(
+ APIntOps::umax(getUnsignedMin(), Other.getUnsignedMin()), UpperBound);
return KnownBitsRange.intersectWith(UMaxUMinRange);
}
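The De Morgan step in the `binaryOr` comment can be checked exhaustively on small ranges. The throwaway brute force below (not part of the patch) confirms that the exact unsigned maximum of `a | b` over two ranges is the bitwise complement of the exact unsigned minimum of `~a & ~b`, which is precisely the identity the new upper bound relies on.

```cpp
#include <algorithm>
#include <cstdint>

// Exhaustively verify, for all non-empty non-wrapped 4-bit range pairs, that
//   max(a | b) == ~min(~a & ~b)
// over the same pair of ranges. Complement reverses the unsigned order, so a
// maximum on one side corresponds to a minimum on the other.
bool orBoundMatchesDeMorgan() {
  for (unsigned Lo1 = 0; Lo1 < 16; ++Lo1)
    for (unsigned Hi1 = Lo1; Hi1 < 16; ++Hi1)
      for (unsigned Lo2 = 0; Lo2 < 16; ++Lo2)
        for (unsigned Hi2 = Lo2; Hi2 < 16; ++Hi2) {
          uint8_t MaxOr = 0, MinAnd = 0xFF;
          for (unsigned A = Lo1; A <= Hi1; ++A)
            for (unsigned B = Lo2; B <= Hi2; ++B) {
              MaxOr = std::max<uint8_t>(MaxOr, A | B);
              MinAnd = std::min<uint8_t>(MinAnd, ~A & ~B);
            }
          if (MaxOr != (uint8_t)~MinAnd)
            return false;
        }
  return true;
}
```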
diff --git a/llvm/test/Transforms/SCCP/range-and-or-bit-masked.ll b/llvm/test/Transforms/SCCP/range-and-or-bit-masked.ll
new file mode 100644
index 00000000000000..e81c5d739c6d29
--- /dev/null
+++ b/llvm/test/Transforms/SCCP/range-and-or-bit-masked.ll
@@ -0,0 +1,88 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -passes=ipsccp %s | FileCheck %s
+
+declare void @use(i1)
+
+define i1 @test1(i64 %x) {
+; CHECK-LABEL: @test1(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[COND:%.*]] = icmp ugt i64 [[X:%.*]], 65535
+; CHECK-NEXT: call void @llvm.assume(i1 [[COND]])
+; CHECK-NEXT: [[MASK:%.*]] = and i64 [[X]], -65521
+; CHECK-NEXT: ret i1 false
+;
+entry:
+ %cond = icmp ugt i64 %x, 65535
+ call void @llvm.assume(i1 %cond)
+ %mask = and i64 %x, -65521
+ %cmp = icmp eq i64 %mask, 0
+ ret i1 %cmp
+}
+
+define void @test.and(i64 %x, i64 %y) {
+; CHECK-LABEL: @test.and(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[C0:%.*]] = icmp uge i64 [[X:%.*]], 138
+; CHECK-NEXT: [[C1:%.*]] = icmp ule i64 [[X]], 161
+; CHECK-NEXT: call void @llvm.assume(i1 [[C0]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C1]])
+; CHECK-NEXT: [[C2:%.*]] = icmp uge i64 [[Y:%.*]], 186
+; CHECK-NEXT: [[C3:%.*]] = icmp ule i64 [[Y]], 188
+; CHECK-NEXT: call void @llvm.assume(i1 [[C2]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C3]])
+; CHECK-NEXT: [[AND:%.*]] = and i64 [[X]], [[Y]]
+; CHECK-NEXT: call void @use(i1 false)
+; CHECK-NEXT: [[R1:%.*]] = icmp ult i64 [[AND]], 137
+; CHECK-NEXT: call void @use(i1 [[R1]])
+; CHECK-NEXT: ret void
+;
+entry:
+ %c0 = icmp uge i64 %x, 138 ; 0b10001010
+ %c1 = icmp ule i64 %x, 161 ; 0b10100000
+ call void @llvm.assume(i1 %c0)
+ call void @llvm.assume(i1 %c1)
+ %c2 = icmp uge i64 %y, 186 ; 0b10111010
+ %c3 = icmp ule i64 %y, 188 ; 0b10111110
+ call void @llvm.assume(i1 %c2)
+ call void @llvm.assume(i1 %c3)
+ %and = and i64 %x, %y
+ %r0 = icmp ult i64 %and, 136 ; 0b10001000
+ call void @use(i1 %r0) ; false
+ %r1 = icmp ult i64 %and, 137
+ call void @use(i1 %r1) ; unknown
+ ret void
+}
+
+define void @test.or(i64 %x, i64 %y) {
+; CHECK-LABEL: @test.or(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[C0:%.*]] = icmp ule i64 [[X:%.*]], 117
+; CHECK-NEXT: [[C1:%.*]] = icmp uge i64 [[X]], 95
+; CHECK-NEXT: call void @llvm.assume(i1 [[C0]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C1]])
+; CHECK-NEXT: [[C2:%.*]] = icmp ule i64 [[Y:%.*]], 69
+; CHECK-NEXT: [[C3:%.*]] = icmp uge i64 [[Y]], 67
+; CHECK-NEXT: call void @llvm.assume(i1 [[C2]])
+; CHECK-NEXT: call void @llvm.assume(i1 [[C3]])
+; CHECK-NEXT: [[OR:%.*]] = or i64 [[X]], [[Y]]
+; CHECK-NEXT: call void @use(i1 false)
+; CHECK-NEXT: [[R1:%.*]] = icmp ugt i64 [[OR]], 118
+; CHECK-NEXT: call void @use(i1 [[R1]])
+; CHECK-NEXT: ret void
+;
+entry:
+ %c0 = icmp ule i64 %x, 117 ; 0b01110101
+ %c1 = icmp uge i64 %x, 95 ; 0b01011111
+ call void @llvm.assume(i1 %c0)
+ call void @llvm.assume(i1 %c1)
+ %c2 = icmp ule i64 %y, 69 ; 0b01000101
+ %c3 = icmp uge i64 %y, 67 ; 0b01000011
+ call void @llvm.assume(i1 %c2)
+ call void @llvm.assume(i1 %c3)
+ %or = or i64 %x, %y
+ %r0 = icmp ugt i64 %or, 119 ; 0b01110111
+ call void @use(i1 %r0) ; false
+ %r1 = icmp ugt i64 %or, 118
+ call void @use(i1 %r1) ; unknown
+ ret void
+}
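A quick standalone brute force (not part of the test suite) shows why the first compare in each function folds while the second does not: over the asserted ranges, the exact minimum of `x & y` is 136 and the exact maximum of `x | y` is 119, matching the bounds the patch computes.

```cpp
#include <algorithm>
#include <cstdint>

// Exact minimum of x & y for x in [138, 161], y in [186, 188] (@test.and).
uint8_t exactAndMin() {
  uint8_t Min = 0xFF;
  for (unsigned X = 138; X <= 161; ++X)
    for (unsigned Y = 186; Y <= 188; ++Y)
      Min = std::min<uint8_t>(Min, X & Y);
  return Min;
}

// Exact maximum of x | y for x in [95, 117], y in [67, 69] (@test.or).
uint8_t exactOrMax() {
  uint8_t Max = 0;
  for (unsigned X = 95; X <= 117; ++X)
    for (unsigned Y = 67; Y <= 69; ++Y)
      Max = std::max<uint8_t>(Max, X | Y);
  return Max;
}
```

So `and < 136` and `or > 119` fold to false, while `and < 137` and `or > 118` must stay unknown.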
I'd have expected the simplest case to be solved suboptimally with a few range intersections in ValueTracking, but this seems like a nice extension, if that's the appropriate direction and compile-time acceptable.
@antoniofrighetto Aha, initially I wanted to fix it in ValueTracking as well, but I found that ValueTracking cannot model the case where "there's at least one bit set to 1 somewhere in a field of a variable, but any individual bit in the field may be 0", which is the case in #118108; only ConstantRange can express this. That's why I fixed it here. As for performance, almost all operations in the algorithm are bit manipulation, so it should be fast for native int types. For APInt, sadly, I have no idea; APInt always checks 'isSingleWord()' and branches :-/
Failed Tests
Clang.CodeGen/AArch64/fpm-helpers.c
Is it related to this patch?
Thx! I thought it wasn't, as it's 'Clang.CodeGen'. Fixed now.
Please give me more time to understand the implementation...
This patch increases the compile time by ~0.02%. It should be acceptable.
llvm/lib/IR/ConstantRange.cpp (outdated)

    if (LeadingZeros == BitWidth)
      return std::nullopt;

    unsigned StartBit = BitWidth - LeadingZeros - 1;
Can we extend this algorithm to make the result optimal for all non-wrapped ranges?
An example: [7, 14) & [-1, 0) should produce [7, 14), but currently it gives [4, 14).
You can apply the following patch to get more sub-optimal cases :)
diff --git a/llvm/unittests/IR/ConstantRangeTest.cpp b/llvm/unittests/IR/ConstantRangeTest.cpp
index e1d9b3e387b2..1a616269e3b7 100644
--- a/llvm/unittests/IR/ConstantRangeTest.cpp
+++ b/llvm/unittests/IR/ConstantRangeTest.cpp
@@ -2725,7 +2725,7 @@ TEST_F(ConstantRangeTest, binaryAnd) {
return CR1.binaryAnd(CR2);
},
[](const APInt &N1, const APInt &N2) { return N1 & N2; }, PreferSmallest,
- CheckSingleElementsOnly);
+ CheckNonWrappedOnly);
}
TEST_F(ConstantRangeTest, binaryOr) {
Great point! Actually, we can do so by extending StartBit toward the lower bits when the extended bits in B are all ones. E.g., given

CR_A: [01101110,
       01110001]
CR_B: {01111110}

the lower bound is 01101000 by the current algorithm (StartBit is 3, counting from 0), but bits 2 and 1 are also 1 in range B, so we can extend StartBit to 1, and the lower bound becomes 01101110.
This can be done with a small change. I'll refine and update the PR.
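The proposed extension can be sanity-checked by brute-forcing the example above (a standalone check, not LLVM code): the exact minimum really is the tighter bound the extended StartBit produces.

```cpp
#include <algorithm>
#include <cstdint>

// CR_A = [0b01101110, 0b01110001] (inclusive), CR_B = {0b01111110}.
// The unextended algorithm stops at StartBit = 3 and reports 0b01101000;
// extending StartBit down to bit 1 (bits 2 and 1 of B are also ones) gives
// 0b01101110, and brute force confirms that bound is exact.
uint8_t exactMaskedAndMin() {
  uint8_t Min = 0xFF;
  for (unsigned A = 0b01101110; A <= 0b01110001; ++A)
    Min = std::min<uint8_t>(Min, A & 0b01111110);
  return Min;
}
```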
The whole algorithm is actually extremely simple. I made things complicated... You'll see how stupid I am after I update the PR later. Many thanks for the review!
…ry and (or) Co-author: Yingwei Zheng (@dtcxzyw)
@dtcxzyw, I've updated the patch. It's quite simple now. You can forget about the previous patch and start the review from scratch. The unittests weren't updated with your patch, since the upper bound isn't optimal (for the AND op). I'm wondering whether I need to add a new test mode that checks only the optimal lower (upper) bound for AND (OR).
If you cannot make it optimal for all non-wrapped cases, please just add some special cases (e.g.,
Ah, I mean, the lower bound computed by this patch is optimal, but the upper bound isn't, so the tests failed. :-D But adding special cases should work.
Tries to fix #118108.