[AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF #67697
llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
commit 9c2faf15231ac5ebc168161d1731feed55eb177c
Merge: 0a0ac8da5df6 baecc9e
Author: Rin <irina.dobrescu@arm.com>
Date: Thu Oct 5 11:19:13 2023 +0100

    Merge branch 'main' into maxTC_tailBase

commit 0a0ac8da5df684b865d0fb16f7a806832f37e05b
Author: Rin Dobrescu <rin.dobrescu@arm.com>
Date: Thu Sep 28 15:48:49 2023 +0000

    [AArch64][LoopVectorize] Use upper bound trip count instead of the constant TC when choosing max VF

commit 26e009c
Author: Rin Dobrescu <rin.dobrescu@arm.com>
Date: Thu Sep 28 10:30:39 2023 +0000

    Remove 'assertions automatically generated' line from test

commit e056129
Author: Rin Dobrescu <rin.dobrescu@arm.com>
Date: Wed Sep 27 14:47:42 2023 +0000

    Address comments and fix tests

commit 1bf78c8
Author: Rin Dobrescu <rin.dobrescu@arm.com>
Date: Mon Sep 25 11:34:15 2023 +0000

    [AArch64][LoopVectorize] Use either fixed-width or scalable VF when tail-folding
@llvm/pr-subscribers-llvm-transforms

Changes

This patch is based on #67543 and should not be merged before that PR.

We currently use the exact (constant) trip count when deciding the maximum VF. We can instead use the upper bound trip count, which is the same as the constant trip count whenever the latter is known.

Full diff: https://github.com/llvm/llvm-project/pull/67697.diff

1 Files Affected:
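To illustrate the difference between the two trip counts, here is a minimal sketch of a loop whose exact trip count is unknown at compile time but whose upper bound is small. The function name `scaleFirstFew` and the mask value are hypothetical and are not taken from the PR or its test. For a loop like this, `getSmallConstantTripCount` would be expected to return 0 (unknown), while `getSmallConstantMaxTripCount` could plausibly report a bound of 7. That expectation is an assumption about what ScalarEvolution can prove, not something verified here.

```cpp
#include <cstdint>

// Hypothetical example (not from the PR). The trip count `n` is only known at
// run time, but masking with 7 bounds it to the range [0, 7], so an upper
// bound on the trip count exists even though no constant trip count does.
void scaleFirstFew(std::int32_t *dst, const std::int32_t *src, unsigned x) {
  const unsigned n = x & 7u; // exact value unknown, at most 7
  for (unsigned i = 0; i < n; ++i)
    dst[i] = src[i] * 3;
}
```

With only the exact trip count, the cost model has nothing to clamp against in such a loop. With the upper bound, it has the chance to avoid a VF wider than the loop can ever use.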
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 53ad37bf3599b5c..26bf92d7d7c02be 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1663,17 +1663,17 @@ class LoopVectorizationCostModel {
/// disabled or unsupported, then the scalable part will be equal to
/// ElementCount::getScalable(0).
FixedScalableVFPair computeFeasibleMaxVF(unsigned ConstTripCount,
+ unsigned MaxTripCount,
ElementCount UserVF,
bool FoldTailByMasking);
/// \return the maximized element count based on the targets vector
/// registers and the loop trip-count, but limited to a maximum safe VF.
/// This is a helper function of computeFeasibleMaxVF.
- ElementCount getMaximizedVFForTarget(unsigned ConstTripCount,
- unsigned SmallestType,
- unsigned WidestType,
- ElementCount MaxSafeVF,
- bool FoldTailByMasking);
+ ElementCount
+ getMaximizedVFForTarget(unsigned ConstTripCount, unsigned MaxTripCount,
+ unsigned SmallestType, unsigned WidestType,
+ ElementCount MaxSafeVF, bool FoldTailByMasking);
/// \return the maximum legal scalable VF, based on the safe max number
/// of elements.
@@ -4811,7 +4811,8 @@ LoopVectorizationCostModel::getMaxLegalScalableVF(unsigned MaxSafeElements) {
}
FixedScalableVFPair LoopVectorizationCostModel::computeFeasibleMaxVF(
- unsigned ConstTripCount, ElementCount UserVF, bool FoldTailByMasking) {
+ unsigned ConstTripCount, unsigned MaxTripCount, ElementCount UserVF,
+ bool FoldTailByMasking) {
MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
unsigned SmallestType, WidestType;
std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
@@ -4898,14 +4899,14 @@ FixedScalableVFPair LoopVectorizationCostModel::computeFeasibleMaxVF(
FixedScalableVFPair Result(ElementCount::getFixed(1),
ElementCount::getScalable(0));
- if (auto MaxVF =
- getMaximizedVFForTarget(ConstTripCount, SmallestType, WidestType,
- MaxSafeFixedVF, FoldTailByMasking))
+ if (auto MaxVF = getMaximizedVFForTarget(ConstTripCount, MaxTripCount,
+ SmallestType, WidestType,
+ MaxSafeFixedVF, FoldTailByMasking))
Result.FixedVF = MaxVF;
- if (auto MaxVF =
- getMaximizedVFForTarget(ConstTripCount, SmallestType, WidestType,
- MaxSafeScalableVF, FoldTailByMasking))
+ if (auto MaxVF = getMaximizedVFForTarget(
+ ConstTripCount, MaxTripCount, SmallestType, WidestType,
+ MaxSafeScalableVF, FoldTailByMasking))
if (MaxVF.isScalable()) {
Result.ScalableVF = MaxVF;
LLVM_DEBUG(dbgs() << "LV: Found feasible scalable VF = " << MaxVF
@@ -4928,6 +4929,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
}
unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
+ unsigned MaxTC = PSE.getSE()->getSmallConstantMaxTripCount(TheLoop);
LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
if (TC == 1) {
reportVectorizationFailure("Single iteration (non) loop",
@@ -4938,7 +4940,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
switch (ScalarEpilogueStatus) {
case CM_ScalarEpilogueAllowed:
- return computeFeasibleMaxVF(TC, UserVF, false);
+ return computeFeasibleMaxVF(TC, MaxTC, UserVF, false);
case CM_ScalarEpilogueNotAllowedUsePredicate:
[[fallthrough]];
case CM_ScalarEpilogueNotNeededUsePredicate:
@@ -4976,7 +4978,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
"scalar epilogue instead.\n");
ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
- return computeFeasibleMaxVF(TC, UserVF, false);
+ return computeFeasibleMaxVF(TC, MaxTC, UserVF, false);
}
return FixedScalableVFPair::getNone();
}
@@ -4993,7 +4995,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
}
- FixedScalableVFPair MaxFactors = computeFeasibleMaxVF(TC, UserVF, true);
+ FixedScalableVFPair MaxFactors =
+ computeFeasibleMaxVF(TC, MaxTC, UserVF, true);
// Avoid tail folding if the trip count is known to be a multiple of any VF
// we choose.
@@ -5069,8 +5072,8 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
}
ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
- unsigned ConstTripCount, unsigned SmallestType, unsigned WidestType,
- ElementCount MaxSafeVF, bool FoldTailByMasking) {
+ unsigned ConstTripCount, unsigned MaxTripCount, unsigned SmallestType,
+ unsigned WidestType, ElementCount MaxSafeVF, bool FoldTailByMasking) {
bool ComputeScalableMaxVF = MaxSafeVF.isScalable();
const TypeSize WidestRegister = TTI.getRegisterBitWidth(
ComputeScalableMaxVF ? TargetTransformInfo::RGK_ScalableVector
@@ -5108,24 +5111,24 @@ ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
}
// When a scalar epilogue is required, at least one iteration of the scalar
- // loop has to execute. Adjust ConstTripCount accordingly to avoid picking a
+ // loop has to execute. Adjust MaxTripCount accordingly to avoid picking a
// max VF that results in a dead vector loop.
- if (ConstTripCount > 0 && requiresScalarEpilogue(true))
- ConstTripCount -= 1;
-
- if (ConstTripCount && ConstTripCount <= WidestRegisterMinEC &&
- (!FoldTailByMasking || isPowerOf2_32(ConstTripCount))) {
- // If loop trip count (TC) is known at compile time there is no point in
- // choosing VF greater than TC (as done in the loop below). Select maximum
- // power of two which doesn't exceed TC.
- // If MaxVectorElementCount is scalable, we only fall back on a fixed VF
- // when the TC is less than or equal to the known number of lanes.
- auto ClampedConstTripCount = llvm::bit_floor(ConstTripCount);
+ if (MaxTripCount > 0 && requiresScalarEpilogue(true))
+ MaxTripCount -= 1;
+
+ if (MaxTripCount && MaxTripCount <= WidestRegisterMinEC &&
+ (!FoldTailByMasking || isPowerOf2_32(MaxTripCount))) {
+ // If upper bound loop trip count (TC) is known at compile time there is no
+ // point in choosing VF greater than TC (as done in the loop below). Select
+ // maximum power of two which doesn't exceed TC. If MaxVectorElementCount is
+ // scalable, we only fall back on a fixed VF when the TC is less than or
+ // equal to the known number of lanes.
+ auto ClampedUpperTripCount = llvm::bit_floor(MaxTripCount);
LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to maximum power of two not "
"exceeding the constant trip count: "
- << ClampedConstTripCount << "\n");
+ << ClampedUpperTripCount << "\n");
return ElementCount::get(
- ClampedConstTripCount,
+ ClampedUpperTripCount,
FoldTailByMasking ? MaxVectorElementCount.isScalable() : false);
}
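The clamping decision in the last hunk can be sketched as a standalone function. This is a simplified model under stated assumptions, not the LLVM implementation: `clampVFToMaxTripCount`, `bitFloor`, and `isPowerOf2` are hypothetical stand-ins (the latter two for `llvm::bit_floor` and `isPowerOf2_32`), the scalar-epilogue requirement is passed in as a plain flag, and the result is a bare lane count rather than an `ElementCount`.

```cpp
#include <cassert>
#include <optional>

// Largest power of two that does not exceed v (v must be non-zero).
// Hypothetical stand-in for llvm::bit_floor.
constexpr unsigned bitFloor(unsigned v) {
  unsigned p = 1;
  while (p <= v / 2)
    p *= 2;
  return p;
}

// Hypothetical stand-in for llvm::isPowerOf2_32.
constexpr bool isPowerOf2(unsigned v) { return v != 0 && (v & (v - 1)) == 0; }

// Returns the clamped lane count when the upper bound trip count is small
// enough to limit the VF, or std::nullopt when no clamping applies.
std::optional<unsigned> clampVFToMaxTripCount(unsigned maxTripCount,
                                              unsigned widestRegisterMinEC,
                                              bool foldTailByMasking,
                                              bool requiresScalarEpilogue) {
  assert(widestRegisterMinEC > 0 && "expected at least one lane");

  // A required scalar epilogue consumes at least one iteration, so the
  // vector loop can see at most maxTripCount - 1 iterations.
  if (maxTripCount > 0 && requiresScalarEpilogue)
    maxTripCount -= 1;

  // Clamp only when the bound is known (non-zero), fits in one register, and,
  // with tail folding, is itself a power of two.
  if (maxTripCount && maxTripCount <= widestRegisterMinEC &&
      (!foldTailByMasking || isPowerOf2(maxTripCount)))
    return bitFloor(maxTripCount);

  return std::nullopt;
}
```

For example, with 16 lanes available and no tail folding, an upper bound of 7 clamps to 4 lanes. With tail folding enabled, the same bound of 7 is not a power of two, so no clamping happens in this sketch.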
…xVF functions and add test
Thanks for this @rin-arm! It looks like a nice improvement that makes greater use of information that the compiler already has. I just had a couple of minor comments regarding the test.
LGTM! Could you rename the new test before merging the patch? Thanks!