[VPlan] First step towards VPlan cost modeling. #67934
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-transforms

Changes: builds on VPlan type inference (included in this PR as a separate commit). Patch is 26.93 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/67934.diff

7 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 998dfd956575d3c..9674094024b9ec7 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -6,6 +6,7 @@ add_llvm_component_library(LLVMVectorize
Vectorize.cpp
VectorCombine.cpp
VPlan.cpp
+ VPlanAnalysis.cpp
VPlanHCFGBuilder.cpp
VPlanRecipes.cpp
VPlanSLP.cpp
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
index 9691e1cd4f2ed00..08142fa014c178d 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -316,6 +316,8 @@ class LoopVectorizationPlanner {
/// A builder used to construct the current plan.
VPBuilder Builder;
+ InstructionCost computeCost(VPlan &Plan, ElementCount VF);
+
public:
LoopVectorizationPlanner(Loop *L, LoopInfo *LI, const TargetLibraryInfo *TLI,
const TargetTransformInfo &TTI,
@@ -339,6 +341,8 @@ class LoopVectorizationPlanner {
/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;
+ std::pair<VPlan &, ElementCount> getBestPlan();
+
/// Generate the IR code for the body of the vectorized loop according to the
/// best selected \p VF, \p UF and VPlan \p BestPlan.
/// TODO: \p IsEpilogueVectorization is needed to avoid issues due to epilogue
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cc17d91d4f43727..b34d11e516ebbc3 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1679,21 +1679,11 @@ class LoopVectorizationCostModel {
/// of elements.
ElementCount getMaxLegalScalableVF(unsigned MaxSafeElements);
- /// Returns the execution time cost of an instruction for a given vector
- /// width. Vector width of one means scalar.
- VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);
-
/// The cost-computation logic from getInstructionCost which provides
/// the vector type as an output parameter.
InstructionCost getInstructionCost(Instruction *I, ElementCount VF,
Type *&VectorTy);
- /// Return the cost of instructions in an inloop reduction pattern, if I is
- /// part of that pattern.
- std::optional<InstructionCost>
- getReductionPatternCost(Instruction *I, ElementCount VF, Type *VectorTy,
- TTI::TargetCostKind CostKind);
-
/// Calculate vectorization cost of memory instruction \p I.
InstructionCost getMemoryInstructionCost(Instruction *I, ElementCount VF);
@@ -1839,6 +1829,15 @@ class LoopVectorizationCostModel {
}
public:
+ /// Returns the execution time cost of an instruction for a given vector
+ /// width. Vector width of one means scalar.
+ VectorizationCostTy getInstructionCost(Instruction *I, ElementCount VF);
+ /// Return the cost of instructions in an inloop reduction pattern, if I is
+ /// part of that pattern.
+ std::optional<InstructionCost>
+ getReductionPatternCost(Instruction *I, ElementCount VF, Type *VectorTy,
+ TTI::TargetCostKind CostKind);
+
/// The loop that we evaluate.
Loop *TheLoop;
@@ -5369,7 +5368,7 @@ VectorizationFactor LoopVectorizationPlanner::selectVectorizationFactor(
? Candidate.Width.getKnownMinValue() * AssumedMinimumVscale
: Candidate.Width.getFixedValue();
LLVM_DEBUG(dbgs() << "LV: Vector loop of width " << i
- << " costs: " << (Candidate.Cost / Width));
+ << " costs: " << Candidate.Cost / Width);
if (i.isScalable())
LLVM_DEBUG(dbgs() << " (assuming a minimum vscale of "
<< AssumedMinimumVscale << ")");
@@ -7529,6 +7528,108 @@ LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
return VF;
}
+InstructionCost LoopVectorizationPlanner::computeCost(VPlan &Plan,
+ ElementCount VF) {
+ InstructionCost Cost = 0;
+
+ VPBasicBlock *Header =
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getEntry());
+
+ // Cost modeling for inductions is inaccurate in the legacy cost model. Try
+ // to match it here initially during VPlan cost model bring-up:
+ // * VPWidenIntOrFpInductionRecipes implement computeCost,
+ // * VPWidenPointerInductionRecipe costs seem to be 0 in the legacy cost model
+ // * other inductions only have a cost of 1 (i.e. the cost of the scalar
+ // induction increment).
+ unsigned NumWideIVs = count_if(Header->phis(), [](VPRecipeBase &R) {
+ return isa<VPWidenPointerInductionRecipe>(&R) ||
+ (isa<VPWidenIntOrFpInductionRecipe>(&R) &&
+ !cast<VPWidenIntOrFpInductionRecipe>(&R)->getTruncInst());
+ });
+ Cost += Legal->getInductionVars().size() - NumWideIVs;
+
+ for (VPBlockBase *Block : to_vector(vp_depth_first_shallow(Header))) {
+ if (auto *Region = dyn_cast<VPRegionBlock>(Block)) {
+ assert(Region->isReplicator());
+ VPBasicBlock *Then =
+ cast<VPBasicBlock>(Region->getEntry()->getSuccessors()[0]);
+ for (VPRecipeBase &R : *Then) {
+ if (isa<VPInstruction, VPScalarIVStepsRecipe>(&R))
+ continue;
+ auto *RepR = cast<VPReplicateRecipe>(&R);
+ Cost += CM.getInstructionCost(RepR->getUnderlyingInstr(), VF).first;
+ }
+ continue;
+ }
+
+ VPCostContext Ctx(CM.TTI, OrigLoop->getHeader()->getContext());
+ for (VPRecipeBase &R : *cast<VPBasicBlock>(Block)) {
+ InstructionCost RecipeCost = R.computeCost(VF, Ctx);
+ if (!RecipeCost.isValid()) {
+ if (auto *IG = dyn_cast<VPInterleaveRecipe>(&R)) {
+ RecipeCost = CM.getInstructionCost(IG->getInsertPos(), VF).first;
+ } else if (auto *WidenMem =
+ dyn_cast<VPWidenMemoryInstructionRecipe>(&R)) {
+ RecipeCost =
+ CM.getInstructionCost(&WidenMem->getIngredient(), VF).first;
+ } else if (auto *I = dyn_cast_or_null<Instruction>(
+ R.getVPSingleValue()->getUnderlyingValue()))
+ RecipeCost = CM.getInstructionCost(I, VF).first;
+ else
+ continue;
+ }
+ if (ForceTargetInstructionCost.getNumOccurrences() > 0)
+ RecipeCost = InstructionCost(ForceTargetInstructionCost);
+
+ LLVM_DEBUG({
+ dbgs() << "Cost of " << RecipeCost << " for " << VF << ": ";
+ R.dump();
+ });
+ Cost += RecipeCost;
+ }
+ }
+ Cost += 1;
+ LLVM_DEBUG(dbgs() << "Cost for " << VF << ": " << Cost << "\n");
+ return Cost;
+}
+
+std::pair<VPlan &, ElementCount> LoopVectorizationPlanner::getBestPlan() {
+ // If there is a single VPlan with a single VF, return it directly.
+ if (VPlans.size() == 1 && size(VPlans[0]->vectorFactors()) == 1) {
+ ElementCount VF = *VPlans[0]->vectorFactors().begin();
+ return {*VPlans[0], VF};
+ }
+
+ VPlan *BestPlan = &*VPlans[0];
+ assert(hasPlanWithVF(ElementCount::getFixed(1)));
+ ElementCount BestVF = ElementCount::getFixed(1);
+ InstructionCost ScalarCost = computeCost(
+ getBestPlanFor(ElementCount::getFixed(1)), ElementCount::getFixed(1));
+ InstructionCost BestCost = ScalarCost;
+ bool ForceVectorization = Hints.getForce() == LoopVectorizeHints::FK_Enabled;
+ if (ForceVectorization) {
+ // Ignore scalar width, because the user explicitly wants vectorization.
+ // Initialize cost to max so that VF = 2 is, at least, chosen during cost
+ // evaluation.
+ BestCost = InstructionCost::getMax();
+ }
+
+ for (auto &P : VPlans) {
+ for (ElementCount VF : P->vectorFactors()) {
+ if (VF.isScalar())
+ continue;
+ InstructionCost Cost = computeCost(*P, VF);
+ if (isMoreProfitable(VectorizationFactor(VF, Cost, ScalarCost),
+ VectorizationFactor(BestVF, BestCost, ScalarCost))) {
+ BestCost = Cost;
+ BestVF = VF;
+ BestPlan = &*P;
+ }
+ }
+ }
+ return {*BestPlan, BestVF};
+}
+
VPlan &LoopVectorizationPlanner::getBestPlanFor(ElementCount VF) const {
assert(count_if(VPlans,
[VF](const VPlanPtr &Plan) { return Plan->hasVF(VF); }) ==
@@ -8595,7 +8696,7 @@ VPRecipeBuilder::tryToCreateWidenRecipe(Instruction *Instr,
new VPWidenCastRecipe(CI->getOpcode(), Operands[0], CI->getType(), CI));
}
- return toVPRecipeResult(tryToWiden(Instr, Operands, VPBB, Plan));
+ return toVPRecipeResult(tryToWiden(Instr, Operands, VPBB, Plan));
}
void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
@@ -10161,8 +10262,12 @@ bool LoopVectorizePass::processLoop(Loop *L) {
VF.MinProfitableTripCount, IC, &LVL, &CM, BFI,
PSI, Checks);
- VPlan &BestPlan = LVP.getBestPlanFor(VF.Width);
- LVP.executePlan(VF.Width, IC, BestPlan, LB, DT, false);
+ const auto &[BestPlan, Width] = LVP.getBestPlan();
+ LLVM_DEBUG(dbgs() << "VF picked by VPlan cost model: " << Width
+ << "\n");
+ assert(VF.Width == Width &&
+ "VPlan cost model and legacy cost model disagreed");
+ LVP.executePlan(Width, IC, BestPlan, LB, DT, false);
++LoopsVectorized;
// Add metadata to disable runtime unrolling a scalar loop when there
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index e65a7ab2cd028ee..02d93915e3c8d6e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -23,6 +23,7 @@
#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
+#include "VPlanAnalysis.h"
#include "VPlanValue.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
@@ -38,6 +39,7 @@
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/FMF.h"
#include "llvm/IR/Operator.h"
+#include "llvm/Support/InstructionCost.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
@@ -697,6 +699,14 @@ class VPLiveOut : public VPUser {
#endif
};
+struct VPCostContext {
+ const TargetTransformInfo &TTI;
+ VPTypeAnalysis Types;
+
+ VPCostContext(const TargetTransformInfo &TTI, LLVMContext &Ctx)
+ : TTI(TTI), Types(Ctx) {}
+};
+
/// VPRecipeBase is a base class modeling a sequence of one or more output IR
/// instructions. VPRecipeBase owns the VPValues it defines through VPDef
/// and is responsible for deleting its defined values. Single-value
@@ -762,6 +772,10 @@ class VPRecipeBase : public ilist_node_with_parent<VPRecipeBase, VPBasicBlock>,
/// \returns an iterator pointing to the element after the erased one
iplist<VPRecipeBase>::iterator eraseFromParent();
+ virtual InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) {
+ return InstructionCost::getInvalid();
+ }
+
/// Returns the underlying instruction, if the recipe is a VPValue or nullptr
/// otherwise.
Instruction *getUnderlyingInstr() {
@@ -1167,6 +1181,10 @@ class VPWidenRecipe : public VPRecipeWithIRFlags, public VPValue {
/// Produce widened copies of all Ingredients.
void execute(VPTransformState &State) override;
+ unsigned getOpcode() const { return Opcode; }
+
+ InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;
+
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
/// Print the recipe.
void print(raw_ostream &O, const Twine &Indent,
@@ -1458,9 +1476,11 @@ class VPWidenIntOrFpInductionRecipe : public VPHeaderPHIRecipe {
bool isCanonical() const;
/// Returns the scalar type of the induction.
- const Type *getScalarType() const {
+ Type *getScalarType() const {
return Trunc ? Trunc->getType() : IV->getType();
}
+
+ InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;
};
class VPWidenPointerInductionRecipe : public VPHeaderPHIRecipe {
@@ -1747,6 +1767,8 @@ class VPInterleaveRecipe : public VPRecipeBase {
"Op must be an operand of the recipe");
return Op == getAddr() && !llvm::is_contained(getStoredValues(), Op);
}
+
+ Instruction *getInsertPos() const { return IG->getInsertPos(); }
};
/// A recipe to represent inloop reduction operations, performing a reduction on
@@ -2080,7 +2102,7 @@ class VPCanonicalIVPHIRecipe : public VPHeaderPHIRecipe {
#endif
/// Returns the scalar type of the induction.
- const Type *getScalarType() const {
+ Type *getScalarType() const {
return getOperand(0)->getLiveInIRValue()->getType();
}
@@ -2149,7 +2171,7 @@ class VPWidenCanonicalIVRecipe : public VPRecipeBase, public VPValue {
#endif
/// Returns the scalar type of the induction.
- const Type *getScalarType() const {
+ Type *getScalarType() const {
return cast<VPCanonicalIVPHIRecipe>(getOperand(0)->getDefiningRecipe())
->getScalarType();
}
@@ -2596,6 +2618,10 @@ class VPlan {
bool hasVF(ElementCount VF) { return VFs.count(VF); }
+ iterator_range<SmallSetVector<ElementCount, 2>::iterator> vectorFactors() {
+ return {VFs.begin(), VFs.end()};
+ }
+
bool hasScalarVFOnly() const { return VFs.size() == 1 && VFs[0].isScalar(); }
bool hasUF(unsigned UF) const { return UFs.empty() || UFs.contains(UF); }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
new file mode 100644
index 000000000000000..088da81f950425c
--- /dev/null
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -0,0 +1,225 @@
+//===- VPlanAnalysis.cpp - Various Analyses working on VPlan ----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "VPlanAnalysis.h"
+#include "VPlan.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "vplan"
+
+Type *VPTypeAnalysis::inferType(const VPBlendRecipe *R) {
+ return inferType(R->getIncomingValue(0));
+}
+
+Type *VPTypeAnalysis::inferType(const VPInstruction *R) {
+ switch (R->getOpcode()) {
+ case Instruction::Select:
+ return inferType(R->getOperand(1));
+ case VPInstruction::FirstOrderRecurrenceSplice:
+ return inferType(R->getOperand(0));
+ default:
+ llvm_unreachable("Unhandled instruction!");
+ }
+}
+
+Type *VPTypeAnalysis::inferType(const VPInterleaveRecipe *R) { return nullptr; }
+
+Type *VPTypeAnalysis::inferType(const VPReductionPHIRecipe *R) {
+ return R->getOperand(0)->getLiveInIRValue()->getType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenRecipe *R) {
+ unsigned Opcode = R->getOpcode();
+ switch (Opcode) {
+ case Instruction::ICmp:
+ case Instruction::FCmp:
+ return IntegerType::get(Ctx, 1);
+ case Instruction::UDiv:
+ case Instruction::SDiv:
+ case Instruction::SRem:
+ case Instruction::URem:
+ case Instruction::Add:
+ case Instruction::FAdd:
+ case Instruction::Sub:
+ case Instruction::FSub:
+ case Instruction::FNeg:
+ case Instruction::Mul:
+ case Instruction::FMul:
+ case Instruction::FDiv:
+ case Instruction::FRem:
+ case Instruction::Shl:
+ case Instruction::LShr:
+ case Instruction::AShr:
+ case Instruction::And:
+ case Instruction::Or:
+ case Instruction::Xor: {
+ Type *ResTy = inferType(R->getOperand(0));
+ if (Opcode != Instruction::FNeg) {
+ assert(ResTy == inferType(R->getOperand(1)));
+ CachedTypes[R->getOperand(1)] = ResTy;
+ }
+ return ResTy;
+ }
+ case Instruction::Freeze:
+ return inferType(R->getOperand(0));
+ default:
+ // This instruction is not vectorized by simple widening.
+ // LLVM_DEBUG(dbgs() << "LV: Found an unhandled instruction: " << I);
+ llvm_unreachable("Unhandled instruction!");
+ }
+
+ return nullptr;
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenCallRecipe *R) {
+ auto &CI = *cast<CallInst>(R->getUnderlyingInstr());
+ return CI.getType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenIntOrFpInductionRecipe *R) {
+ return R->getScalarType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenMemoryInstructionRecipe *R) {
+ if (R->isStore())
+ return cast<StoreInst>(&R->getIngredient())->getValueOperand()->getType();
+
+ return cast<LoadInst>(&R->getIngredient())->getType();
+}
+
+Type *VPTypeAnalysis::inferType(const VPWidenSelectRecipe *R) {
+ return inferType(R->getOperand(1));
+}
+
+Type *VPTypeAnalysis::inferType(const VPReplicateRecipe *R) {
+ switch (R->getUnderlyingInstr()->getOpcode()) {
+ case Instruction::Call: {
+ unsigned CallIdx = R->getNumOperands() - (R->isPredicated() ? 2 : 1);
+ return cast<Function>(R->getOperand(CallIdx)->getLiveInIRValue())
+ ->getReturnType();
+ }
+ case Instruction::UDiv:
+ case Instruction::SDiv:
+ case Instruction::SRem:
+ case Instruction::URem:
+ case Instruction::Add:
+ case Instruction::FAdd:
+ case Instruction::Sub:
+ case Instruction::FSub:
+ case Instruction::FNeg:
+ case Instruction::Mul:
+ case Instruction::FMul:
+ case Instruction::FDiv:
+ case Instruction::FRem:
+ case Instruction::Shl:
+ case Instruction::LShr:
+ case Instruction::AShr:
+ case Instruction::And:
+ case Instruction::Or:
+ case Instruction::Xor:
+ case Instruction::ICmp:
+ case Instruction::FCmp: {
+ Type *ResTy = inferType(R->getOperand(0));
+ assert(ResTy == inferType(R->getOperand(1)));
+ CachedTypes[R->getOperand(1)] = ResTy;
+ return ResTy;
+ }
+ case Instruction::Trunc:
+ case Instruction::SExt:
+ case Instruction::ZExt:
+ case Instruction::FPExt:
+ case Instruction::FPTrunc:
+ return R->getUnderlyingInstr()->getType();
+ case Instruction::ExtractValue: {
+ return R->getUnderlyingValue()->getType();
+ }
+ case Instruction::Freeze:
+ return inferType(R->getOperand(0));
+ case Instruction::Load:
+ return cast<LoadInst>(R->getUnderlyingInstr())->getType();
+ case Instruction::Store:
+ return cast<StoreInst>(R->getUnderlyingInstr())
+ ->getValueOperand()
+ ->getType();
+ default:
+ llvm_unreachable("Unhandled instruction");
+ }
+
+ return nullptr;
+}
+
+Type *VPTypeAnalysis::inferType(const VPValue *V) {
+ auto Iter = CachedTypes.find(V);
+ if (Iter != CachedTypes.end())
+ return Iter->second;
+
+ Type *ResultTy = nullptr;
+ if (V->isLiveIn())
+ ResultTy = V->getLiveInIRValue()->getType();
+ else {
+ const VPRecipeBase *Def = V->getDefiningRecipe();
+ switch (Def->getVPDefID()) {
+ case VPDef::VPBlendSC:
+ ResultTy = inferType(cast<VPBlendRecipe>(Def));
+ break;
+ case VPDef::VPCanonicalIVPHISC:
+ ResultTy = cast<VPCanonicalIVPHIRecipe>(Def)->getScalarType();
+ break;
+ case VPDef::VPFirstOrderRecurrencePHISC:
+ ResultTy = Def->getOperand(0)->getLiveInIRValue()->getType();
+ break;
+ case VPDef::VPInstructionSC:
+ ResultTy = inferType(cast<VPInstruction>(Def));
+ break;
+ case VPDef::VPInterleaveSC:
+ ResultTy = V->getUnderlyingValue()
+ ->getType(); // inferType(cast<VPInterleaveRecipe>(Def));
+ break;
+ case VPDef::VPPredInstPHISC:
+ ResultTy = inferType(Def->getOperand(0));
+ break;
+ case VPDef::VPReductionPHISC:
+ ResultTy = inferType(cast<VPReductionPHIRecipe>(Def));
+ break;
+ case VPDef::VPReplicateSC:
+ ResultTy = inferType(cast<VPReplicateRecipe>(Def));
+ break;
+ case VPDef::VPScalarIVStepsSC:
+ return inferType(Def->getOperand(0));
+ break;
+ case VPDef::VPWidenSC:
+ ResultTy = inferType(cast<VPWidenRecipe>(Def));
+ break;
+ case VPDef::VPWidenPHISC:
+ return inferType(Def->getOperand(0));
+ case VPDef::VPWidenPointerInductionSC:
+ return inferType(Def->getOperand(0));
+ case VPDef::VPWidenCallSC:
+ ResultTy = inferType(cast<VPWidenCallRecipe>(Def));
+ break;
+ case VPDef::VPWidenCastSC:
+ ResultTy = cast<VPWidenCastRecipe>(Def)->getResultType();
+ break;
+ case VPDef::VPWidenGEPSC:
+ ResultTy = PointerType::get(Ctx, 0);
+ break;
+ case VPDef::VPWidenIntOrFpInductionSC:
+ ResultTy = inferType(cast<VPWidenIntOrFpInductionRecipe>(Def));
+ break;
+ case VPDef::VPWidenMemory...
[truncated]
I am trying to mitigate the cost difference caused by removeDeadRecipes(), since the legacy cost model still counts the dead instructions that it removes.
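To make the mismatch concrete, here is a minimal sketch, assuming a toy recipe list in which each entry records a cost, the indices of its users, and whether it has side effects. `ToyRecipe` and both functions are hypothetical and only approximate what removeDeadRecipes() does: the legacy-style total counts every entry, while the VPlan-style total first prunes entries with no live users.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a recipe: its cost, the indices of the recipes
// that use its result, and whether it has side effects (such as a store),
// which keeps it alive even without users.
struct ToyRecipe {
  int Cost;
  std::vector<std::size_t> Users;
  bool HasSideEffects;
};

// Legacy-style total: every instruction in the loop contributes its cost,
// whether or not its result is used.
int legacyStyleCost(const std::vector<ToyRecipe> &Recipes) {
  int Total = 0;
  for (const ToyRecipe &R : Recipes)
    Total += R.Cost;
  return Total;
}

// VPlan-style total: remove dead recipes first, iterating to a fixed point
// because removing one recipe can make its operands dead, then sum the rest.
int costAfterDeadRecipeRemoval(const std::vector<ToyRecipe> &Recipes) {
  std::vector<bool> Live(Recipes.size(), true);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Recipes.size(); ++I) {
      if (!Live[I] || Recipes[I].HasSideEffects)
        continue;
      bool HasLiveUser = false;
      for (std::size_t U : Recipes[I].Users)
        if (Live[U])
          HasLiveUser = true;
      if (!HasLiveUser) {
        Live[I] = false;
        Changed = true;
      }
    }
  }
  int Total = 0;
  for (std::size_t I = 0; I < Recipes.size(); ++I)
    if (Live[I])
      Total += Recipes[I].Cost;
  return Total;
}
```

For a loop with a used load (cost 2), an unused add (cost 1) and a store (cost 3), the legacy-style total is 6 and the pruned total is 5; a gap like that can be enough to make the two models pick different VFs.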
Add test coverage for cost-model code-paths not covered by current unit tests in preparation for #67934.
The latest update of the PR includes computing the costs of all VPlans for their associated VFs and then picking the best one. In particular, this now also includes computing the costs of replicate regions.

In the initial version, the VPlan-based cost model first asks the recipe for its cost (via computeCost). If that returns an invalid cost, it looks up the cost via the legacy cost model.

Initially I tested the latest version on a range of configurations and code-bases (llvm-test-suite + SPEC2017 + Clang bootstrap on AArch64 with and without SVE, with and without […]). I added a number of test cases separately for loops where they disagreed before. There may still be cases where the assertion gets triggered due to missing coverage. It may also trigger in hand-written test cases that contain dead code, which VPlan transforms will remove before computing the cost (at the moment causing […]).

Another thing to note is that during cast simplifications, we preserve the underlying instruction, so we can still use the legacy cost model for the casts; otherwise we would also need to implement costing for casts directly. This is an area where there may be some differences between the legacy and VPlan-based cost models, due to the latter having more accurate information.

Going forward, I think we should gradually move cost computation to the VPlan-based model and allow divergence as needed when the VPlan-based model estimates cost more accurately.
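The lookup order described above (ask the recipe first, fall back to the legacy model on an invalid cost) can be sketched in isolation. This is a simplified model, not the patch's code: `std::optional<int>` stands in for `InstructionCost`, with `std::nullopt` playing the role of `InstructionCost::getInvalid()`, and `SketchRecipe`/`planCost` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model: std::nullopt plays the role of an invalid
// InstructionCost, i.e. "this recipe does not implement computeCost yet".
using Cost = std::optional<int>;

struct SketchRecipe {
  std::string UnderlyingInstr;       // name of the original IR instruction
  std::function<Cost()> ComputeCost; // VPlan-based cost, possibly invalid
};

// Sum the per-recipe costs: ask each recipe first and fall back to the
// legacy per-instruction cost only when the recipe returns an invalid cost.
int planCost(const std::vector<SketchRecipe> &Recipes,
             const std::function<int(const std::string &)> &LegacyCost) {
  int Total = 0;
  for (const SketchRecipe &R : Recipes) {
    Cost C = R.ComputeCost();
    Total += C ? *C : LegacyCost(R.UnderlyingInstr);
  }
  return Total;
}
```

Keeping the legacy lookup outside the recipe interface, as here, is one way to let recipes migrate to their own computeCost one at a time.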
This adds a new computeCost interface to VPRecipeBase and implements it for VPWidenRecipe and VPWidenIntOrFpInductionRecipe. It also adds a getBestPlan function to LVP, which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. For recipes that do not yet implement computeCost, the legacy cost for the underlying instruction is used. The VPlan selected by the VPlan cost model is executed, and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree.
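A simplified model of the selection loop in getBestPlan might look as follows. It assumes fixed-width VFs only and approximates isMoreProfitable with a plain cost-per-lane comparison done by cross-multiplication; the real comparison takes more into account (for example scalable VFs). `Candidate`, `isCheaperPerLane` and `pickBest` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical candidate: a (plan, VF) pair and its cost per loop
// iteration. Only fixed-width VFs are modelled.
struct Candidate {
  int PlanIdx;    // index of the plan the VF belongs to
  unsigned Width; // vectorization factor; 1 means scalar
  long long Cost; // cost of one iteration of the loop at this VF
};

// True if A is cheaper per lane than B. Cross-multiplying avoids the
// rounding that integer division (Cost / Width) would introduce.
bool isCheaperPerLane(const Candidate &A, const Candidate &B) {
  return A.Cost * static_cast<long long>(B.Width) <
         B.Cost * static_cast<long long>(A.Width);
}

// Start from the scalar candidate (or from nothing if vectorization is
// forced) and keep whichever candidate is cheaper per lane.
Candidate pickBest(const Candidate &Scalar,
                   const std::vector<Candidate> &VectorCandidates,
                   bool ForceVectorization) {
  std::optional<Candidate> Best;
  if (!ForceVectorization)
    Best = Scalar;
  for (const Candidate &C : VectorCandidates)
    if (!Best || isCheaperPerLane(C, *Best))
      Best = C;
  return Best ? *Best : Scalar;
}
```

Starting from an empty `std::optional` when vectorization is forced has the same effect as the patch initializing BestCost to `InstructionCost::getMax()`, while avoiding overflow in the cross-multiplication with plain integers.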
A few comments, I'll let Gil/Ayal review proper.
@@ -2071,6 +2085,8 @@ class VPInterleaveRecipe : public VPRecipeBase {
"Op must be an operand of the recipe");
return Op == getAddr() && !llvm::is_contained(getStoredValues(), Op);
}

Instruction *getInsertPos() const { return IG->getInsertPos(); }
Is this really used?
Cannot see where.
It's used in computeCostForRecipe at the moment.
new VPWidenCastRecipe(Instruction::CastOps(ExtOpcode), A, TruncTy);
VPC->insertBefore(&R);
VPValue *VPC;
if (auto *UV = R.getOperand(0)->getUnderlyingValue())
nit: couldn't you just set UV to nullptr? Or return nullptr from getUnderlyingValue? Then this would just be a single call. It took me a second pass to parse the semantics here.
VPWidenCastRecipe would need to have a single constructor accepting a (possibly nullptr) CastInst* as its last parameter, to avoid the choice below.
Yes, could adjust to this effect.
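The single-constructor shape discussed above could look roughly like this, with stand-in types: the underlying cast is a trailing, possibly null pointer, so a caller can forward whatever the lookup returned without branching. `FakeCastInst` and `WidenCast` are hypothetical and only loosely mirror VPWidenCastRecipe.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR cast instruction.
struct FakeCastInst {
  unsigned Opcode;
};

// Hypothetical stand-in for a widened-cast recipe.
class WidenCast {
  unsigned Opcode;
  const FakeCastInst *Underlying; // null when there is no IR instruction

public:
  // One constructor: the underlying instruction is an optional trailing
  // argument, so callers can pass a possibly-null pointer straight through.
  explicit WidenCast(unsigned Opc, const FakeCastInst *UI = nullptr)
      : Opcode(Opc), Underlying(UI) {}

  unsigned getOpcode() const { return Opcode; }
  bool hasUnderlyingInstr() const { return Underlying != nullptr; }
};
```

With this shape, the if/else in the snippet above would reduce to a single construction that forwards the (possibly null) underlying value.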
Very nice step forward!!
Making the last decision, namely, selecting which VPlan has the best cost, based on (partially) VPlan-based cost computation, is a good starting point, gradually allowing earlier cost-based decisions to take place along the VPlan-to-VPlan transformation pipeline.
Similar to how code-gen is simplified, modularized and kept consistent by breaking down ILV into VPlan/Region/Block/Recipe::execute() - in a gradual process which still utilizes ILV methods via VPTransformState, could compute-cost be driven by VPlan/Region/Block/Recipe::computeCost - initially utilizing CM methods internally where needed, by passing a CM* in VPCostContext? That should help keep code-gen and its cost aligned and consistent at each scope.
Adding various comments inline after a first pass.
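A minimal sketch of the hierarchical shape suggested above, in which each level's computeCost sums its children, much like execute() is split across VPlan, regions, blocks and recipes. All class names are hypothetical; the context carries only a VF (the patch's VPCostContext holds TTI and type analysis instead), and costing a replicate region as VF copies of its body is a simplifying assumption, not the legacy model's treatment of predicated code.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct CostCtx {
  unsigned VF; // vectorization factor; a real context would carry more
};

// Common interface for every level of the hierarchy.
struct CostNode {
  virtual ~CostNode() = default;
  virtual int computeCost(const CostCtx &Ctx) const = 0;
};

// Leaf: a recipe with a fixed cost per vector iteration.
struct LeafRecipe : CostNode {
  int Cost;
  explicit LeafRecipe(int C) : Cost(C) {}
  int computeCost(const CostCtx &) const override { return Cost; }
};

// Block: the sum of its children's costs.
struct Block : CostNode {
  std::vector<std::unique_ptr<CostNode>> Children;
  int computeCost(const CostCtx &Ctx) const override {
    int Total = 0;
    for (const auto &C : Children)
      Total += C->computeCost(Ctx);
    return Total;
  }
};

// Replicate region: in this toy model its body runs once per lane.
struct ReplicateRegion : Block {
  int computeCost(const CostCtx &Ctx) const override {
    return static_cast<int>(Ctx.VF) * Block::computeCost(Ctx);
  }
};
```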
@@ -361,6 +364,9 @@ class LoopVectorizationPlanner {
/// Return the best VPlan for \p VF.
VPlan &getBestPlanFor(ElementCount VF) const;

/// Return the most profitable plan.
nit: every plan contains its VF range; reduce the range of the best plan to a single value, instead of passing it alongside? Method should be const?
Marked as const (same as computeCost) and updated to restrict VFs, thanks!
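The alternative raised in this thread, narrowing the chosen plan to a single VF instead of returning a (plan, VF) pair, can be sketched with a stand-in class. `SketchPlan`, `restrictToVF` and `getSingleVF` are hypothetical names, and fixed-width VFs are modelled as plain unsigned integers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical plan that remembers which fixed-width VFs it is valid for.
class SketchPlan {
  std::set<unsigned> VFs;

public:
  explicit SketchPlan(std::set<unsigned> Initial) : VFs(std::move(Initial)) {}

  bool hasVF(unsigned VF) const { return VFs.count(VF) != 0; }
  std::size_t getNumVFs() const { return VFs.size(); }

  // Narrow the plan to the selected VF so that callers can hand around the
  // plan alone instead of a (plan, VF) pair.
  void restrictToVF(unsigned VF) {
    assert(hasVF(VF) && "can only restrict to a VF the plan already covers");
    VFs = {VF};
  }

  // Only meaningful once the plan has been narrowed to a single VF.
  unsigned getSingleVF() const {
    assert(VFs.size() == 1 && "plan still covers several VFs");
    return *VFs.begin();
  }
};
```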
}

VPlan *BestPlan = &*VPlans[0];
assert(hasPlanWithVF(ElementCount::getFixed(1)));
assert(hasPlanWithVF(ElementCount::getFixed(1)));
ElementCount ScalarVF = ElementCount::getFixed(1);
assert(hasPlanWithVF(ScalarVF) && "More than a single plan/VF w/o any plan having scalar VF");
Done, thanks!
@@ -699,6 +700,14 @@ class VPLiveOut : public VPUser {
#endif
};

struct VPCostContext {
Document what this is for.
Done, thanks!
@@ -841,6 +854,7 @@ class VPSingleDefRecipe : public VPRecipeBase, public VPValue {
static inline bool classof(const VPRecipeBase *R) {
switch (R->getVPDefID()) {
case VPRecipeBase::VPDerivedIVSC:
case VPRecipeBase::VPEVLBasedIVPHISC:
Independent fix?
Split off to c3d2af0, thanks!
@@ -1349,6 +1363,8 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {

unsigned getOpcode() const { return Opcode; }

InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;
nit: better placed slightly above, next to execute() - given that the two are closely related.
Moved, thanks!
for (const auto &[IV, _] : Legal->getInductionVars()) {
Instruction *IVInc = cast<Instruction>(
IV->getIncomingValueForBlock(OrigLoop->getLoopLatch()));
InstructionCost RecipeCost = CM.getInstructionCost(IVInc, VF).first;
The use of "Recipe" may be confusing as no recipes are involved here, IVInc is an underlying Instruction.
Updated, thanks!
IVInc->dump();
});
Cost += RecipeCost;
SeenUI.insert(IVInc);
"SeenUI" may be confusing, it stands for both having pre-accounted for its cost here, and later whenever encountering a recipe with an underlying Instruction?
If IVInc is left for the regular scan over recipes, will its cost be computed differently than RecipeCost above?
Should reduction chains also be traversed and marked to compute their cost?
"SeenUI" may be confusing, it stands for both having pre-accounted for its cost here, and later whenever encountering a recipe with an underlying Instruction?
Renamed to SkipCostComputation to help clarify, and also removed the code that added all underlying instructions; it should not be needed.
If IVInc is left for the regular scan over recipes, will its cost be computed differently than RecipeCost above?
The reason this is done as a pre-processing step is that the VPlan may not have any recipes associated with the original induction increment instruction.
Should reduction chains also be traversed and marked to compute their cost?
Done, thanks!
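The pre-processing described in this reply can be sketched as a two-phase scan: cost instructions that may lack a matching recipe (such as induction increments) directly from the original loop, record them in a skip set, and then ignore any recipe whose underlying instruction is in that set. Instructions are modelled as integer IDs; `ScanRecipe` and `totalCost` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical recipe: the ID of its underlying IR instruction (-1 if none)
// and the cost the regular recipe scan would assign to it.
struct ScanRecipe {
  int UnderlyingInstr;
  int Cost;
};

int totalCost(const std::vector<int> &PreCostedInstrs,
              int (*LegacyInstrCost)(int),
              const std::vector<ScanRecipe> &Recipes) {
  int Total = 0;
  std::set<int> SkipCostComputation;
  // Phase 1: cost instructions that may have no matching recipe (e.g.
  // induction increments) straight from the original loop.
  for (int I : PreCostedInstrs) {
    Total += LegacyInstrCost(I);
    SkipCostComputation.insert(I);
  }
  // Phase 2: the regular scan skips anything already accounted for, so no
  // instruction is counted twice.
  for (const ScanRecipe &R : Recipes) {
    if (R.UnderlyingInstr >= 0 && SkipCostComputation.count(R.UnderlyingInstr))
      continue;
    Total += R.Cost;
  }
  return Total;
}
```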
cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getEntry());
for (VPBlockBase *Block : to_vector(vp_depth_first_shallow(Header))) {
if (auto *Region = dyn_cast<VPRegionBlock>(Block)) {
Cost += computeCostForReplicatorRegion(Region, VF, SeenUI, CM, CM.TTI,
This should ideally be a VPRegionBlock::computeCost(...) method?
I deliberately did not make this VPRegionBlock::computeCost(...), to avoid leaking the legacy cost model into the VPlan-based bits and polluting them, which may make it tempting to rely on it.
VPValue *Cond = BOM->getOperand(0);

// Check if Cond is a uniform compare.
auto IsUniformCompare = [Cond]() {
Deserves to be more generally available.
Moved to vputils, thanks!
IsUniformCompare ||
match(Cond, m_ActiveLaneMask(m_VPValue(), m_VPValue())) ||
match(Cond, m_Binary<Instruction::ICmp>(m_VPValue(), m_VPValue())) ||
isa<VPActiveLaneMaskPHIRecipe>(Cond);
Deserves to use getHeaderMask();
At the moment, there's collectAllHeaderMasks, but it only collects the compare with wide canonical IV; we would need a variant that collects the multiple specialized variants, left as is for now.
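The `||` chain above accepts a condition if it has any one of several shapes that a header mask may take. The sketch below models that any-of structure with one predicate per alternative. The `Op` and `Node` types are hypothetical, and treating these shapes as "header-mask-like" is an assumption drawn from the reply above, not a statement about what `collectAllHeaderMasks` returns.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node kinds for a branch-on-mask condition.
enum class Op { ActiveLaneMask, ActiveLaneMaskPhi, ICmp, And, Other };

struct Node {
  Op Opcode;
  std::vector<const Node *> Operands;
};

// One predicate per alternative of the || chain.
bool isActiveLaneMask(const Node &N) {
  return N.Opcode == Op::ActiveLaneMask && N.Operands.size() == 2;
}
bool isBinaryICmp(const Node &N) {
  return N.Opcode == Op::ICmp && N.Operands.size() == 2;
}
bool isActiveLaneMaskPhi(const Node &N) {
  return N.Opcode == Op::ActiveLaneMaskPhi;
}

// True if Cond matches one of the assumed header-mask shapes. Anything
// else, e.g. a mask combined with another condition, is rejected.
bool looksLikeHeaderMask(const Node &Cond) {
  return isActiveLaneMask(Cond) || isBinaryICmp(Cond) ||
         isActiveLaneMaskPhi(Cond);
}
```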
VPEVLBasedIVPHIRecipe inherits from VPSingleDefRecipe. Add VPEVLBasedIVPHISC to VPSingleDefRecipe::classof to make isa/dyn_cast & co work as expected. Split off #67934.
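The fix works because `isa<>`/`dyn_cast<>` in LLVM-style RTTI call a static `classof()` that compares a stored subclass ID. A base-class `classof()` that forgets one derived ID therefore makes `isa<>` return false for that subclass. The sketch below reproduces this pattern standalone. The class names are simplified, hypothetical stand-ins for the VPlan ones, and `isa`/`dyn_cast` here are minimal reimplementations, not `llvm/Support/Casting.h`.

```cpp
#include <cassert>

// Minimal LLVM-style RTTI: the base class stores a subclass ID.
struct RecipeBase {
  enum ID { WidenSC, ScalarIVStepsSC, EVLBasedIVPHISC, InterleaveSC };
  const ID SubclassID;
  explicit RecipeBase(ID I) : SubclassID(I) {}
  virtual ~RecipeBase() = default;
};

// An intermediate class must list every subclass ID that derives from it.
// Omitting EVLBasedIVPHISC here would make isa<SingleDefRecipe> fail for
// EVLBasedIVPHIRecipe even though it inherits from SingleDefRecipe.
struct SingleDefRecipe : RecipeBase {
  using RecipeBase::RecipeBase;
  static bool classof(const RecipeBase *R) {
    switch (R->SubclassID) {
    case WidenSC:
    case ScalarIVStepsSC:
    case EVLBasedIVPHISC:
      return true;
    default:
      return false;
    }
  }
};

struct EVLBasedIVPHIRecipe : SingleDefRecipe {
  EVLBasedIVPHIRecipe() : SingleDefRecipe(EVLBasedIVPHISC) {}
  static bool classof(const RecipeBase *R) {
    return R->SubclassID == EVLBasedIVPHISC;
  }
};

// Simplified isa<> / dyn_cast<> built on classof().
template <typename To> bool isa(const RecipeBase *R) { return To::classof(R); }
template <typename To> To *dyn_cast(RecipeBase *R) {
  return isa<To>(R) ? static_cast<To *>(R) : nullptr;
}
```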
I think I forgot to mention this for completeness, but this depends on #89386
Code generation is simplified, modularized and kept consistent by breaking ILV down into VPlan/Region/Block/Recipe::execute(), in a gradual process that still uses ILV methods via VPTransformState. Similarly, could cost computation be driven by VPlan/Region/Block/Recipe::computeCost, initially using CM methods internally where needed by passing a CM* in VPCostContext? That should help keep code generation and its cost aligned and consistent at each scope.
The patch intentionally avoided making CM part of VPCostContext. This keeps a clear separation between VPlan-based and legacy costs, avoids leaking information from the legacy cost model, and avoids introducing new uses of the legacy cost model at this point.
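The hierarchical structure proposed above (plan cost = sum of block costs = sum of recipe costs), combined with a context that carries only target-level information, can be sketched in a few lines. All types below are hypothetical. `VectorOpCost` is a stand-in for a target cost query, and the context deliberately has no pointer to a legacy cost model, which reflects the separation described in the reply.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical cost context: target-level information only, with no
// pointer to the legacy cost model.
struct CostContext {
  unsigned VF;
  unsigned (*VectorOpCost)(unsigned VF); // stand-in for a target query
};

struct Recipe {
  virtual ~Recipe() = default;
  virtual unsigned computeCost(const CostContext &Ctx) const = 0;
};

struct WidenRecipe : Recipe {
  unsigned computeCost(const CostContext &Ctx) const override {
    return Ctx.VectorOpCost(Ctx.VF);
  }
};

// Each level sums the costs of the level below it.
struct BasicBlock {
  std::vector<std::unique_ptr<Recipe>> Recipes;
  unsigned computeCost(const CostContext &Ctx) const {
    unsigned Cost = 0;
    for (const auto &R : Recipes)
      Cost += R->computeCost(Ctx);
    return Cost;
  }
};

struct Plan {
  std::vector<BasicBlock> Blocks;
  unsigned computeCost(const CostContext &Ctx) const {
    unsigned Cost = 0;
    for (const BasicBlock &BB : Blocks)
      Cost += BB.computeCost(Ctx);
    return Cost;
  }
};
```

With this layout, the cost of each scope sits next to the code that generates it, which is the consistency argument made in the question.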
@@ -2071,6 +2085,8 @@ class VPInterleaveRecipe : public VPRecipeBase {
"Op must be an operand of the recipe");
return Op == getAddr() && !llvm::is_contained(getStoredValues(), Op);
}

Instruction *getInsertPos() const { return IG->getInsertPos(); }
It's used in computeCostForRecipe at the moment.
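The insert position matters for costing because, as I understand the legacy model, an interleave group's cost is charged once to the member at the insert position, and the other members are charged zero. Summing per-instruction costs then counts the wide access exactly once. The sketch below illustrates that convention with hypothetical types; it is not the LoopVectorize implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical interleave group: its members and the member at which the
// single wide access is emitted (the insert position).
struct InterleaveGroup {
  std::vector<std::string> Members;
  std::string InsertPos;
};

// Attribute the whole group's cost to the insert position and zero to the
// other members, so that summing per-instruction costs counts it once.
std::map<std::string, unsigned> distributeGroupCost(const InterleaveGroup &G,
                                                    unsigned GroupCost) {
  std::map<std::string, unsigned> Costs;
  for (const std::string &M : G.Members)
    Costs[M] = (M == G.InsertPos) ? GroupCost : 0;
  return Costs;
}
```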
@@ -3182,6 +3198,10 @@ class VPlan {
return any_of(VFs, [](ElementCount VF) { return VF.isScalable(); });
}

iterator_range<SmallSetVector<ElementCount, 2>::iterator> vectorFactors() {
Done, thanks!
@@ -1349,6 +1363,8 @@ class VPWidenRecipe : public VPRecipeWithIRFlags {

unsigned getOpcode() const { return Opcode; }

InstructionCost computeCost(ElementCount VF, VPCostContext &Ctx) override;
Moved, thanks!
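The PR description states that recipes without their own `computeCost` use the legacy cost of their underlying instruction. That maps naturally onto a virtual method whose base implementation is the fallback. The sketch below shows this pattern. The legacy table, `CostCtx`, and the one-unit-per-lane formula in the override are all hypothetical placeholders, not real cost numbers or LLVM APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical legacy cost table keyed by underlying instruction name.
using LegacyCostTable = std::map<std::string, unsigned>;

struct CostCtx {
  const LegacyCostTable &Legacy;
  unsigned VF;
};

struct RecipeBase {
  std::string Underlying; // empty if the recipe has no IR counterpart
  explicit RecipeBase(std::string U) : Underlying(std::move(U)) {}
  virtual ~RecipeBase() = default;
  // Default: fall back to the legacy cost of the underlying instruction.
  virtual unsigned computeCost(const CostCtx &Ctx) const {
    auto It = Ctx.Legacy.find(Underlying);
    return It == Ctx.Legacy.end() ? 0 : It->second;
  }
};

// A recipe that has been migrated computes its own cost. The formula is a
// made-up placeholder (one unit per lane) standing in for a target query.
struct WidenRecipe : RecipeBase {
  using RecipeBase::RecipeBase;
  unsigned computeCost(const CostCtx &Ctx) const override { return Ctx.VF; }
};
```

Migrating a recipe then amounts to adding an override. Callers that sum recipe costs do not change.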
@@ -699,6 +700,14 @@ class VPLiveOut : public VPUser {
#endif
};

struct VPCostContext {
Done, thanks!
// If there is a single VPlan with a single VF, return it directly.
if (VPlans.size() == 1 && size(VPlans[0]->vectorFactors()) == 1) {
  ElementCount VF = *VPlans[0]->vectorFactors().begin();
  return {*VPlans[0], VF};
}

VPlan *BestPlan = &*VPlans[0];
Done, thanks!
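Beyond the single-plan early exit, getBestPlan has to compare (plan, VF) pairs whose costs cover different numbers of lanes. The sketch below picks the pair with the lowest cost per lane and cross-multiplies instead of dividing, similar in spirit to the legacy profitability comparison. `Candidate`, `isMoreProfitable` and `pickBest` are hypothetical names, and only fixed-width VFs are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: a plan index, a fixed VF, and the plan's cost
// for one vector iteration at that VF.
struct Candidate {
  std::size_t PlanIdx;
  unsigned VF;
  unsigned Cost;
};

// A beats B if A.Cost / A.VF < B.Cost / B.VF. Cross-multiplying avoids
// truncating integer division.
bool isMoreProfitable(const Candidate &A, const Candidate &B) {
  return static_cast<unsigned long long>(A.Cost) * B.VF <
         static_cast<unsigned long long>(B.Cost) * A.VF;
}

// Requires at least one candidate. A single candidate is returned as-is,
// matching the early exit for one plan with one VF.
Candidate pickBest(const std::vector<Candidate> &Candidates) {
  assert(!Candidates.empty() && "need at least one plan/VF pair");
  Candidate Best = Candidates.front();
  for (const Candidate &C : Candidates)
    if (isMoreProfitable(C, Best))
      Best = C;
  return Best;
}
```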
@@ -1371,8 +1387,6 @@ class VPWidenCastRecipe : public VPRecipeWithIRFlags {
ResultTy(ResultTy) {
assert(UI.getOpcode() == Opcode &&
"opcode of underlying cast doesn't match");
assert(UI.getType() == ResultTy &&
"result type of underlying cast doesn't match");
No, as we retain the underlying instruction in a narrower version of the cast, so we can still query the cost model for the underlying instruction, even after VP2VP narrowing. This is needed until we handle cast-costs completely in VPlan.
VPCostContext &Ctx) {
VPWidenRecipe *Cur = this;
// Check if the recipe is used in a reduction chain. Let the legacy cost-model
// handle that case for now.
Code removed, as reduction chain costs are pre-computed
if (auto *Next = dyn_cast<VPWidenRecipe>(*Cur->user_begin())) {
  Cur = Next;
  continue;
}
if (isa<VPReductionRecipe>(*Cur->user_begin()))
  return InstructionCost::getInvalid();
break;
Code removed as per comment above, thanks!
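For reference, the check that was removed walked forward through single users. It followed chains of widen recipes and gave up with an invalid cost if the chain fed a reduction recipe. The sketch below restates that walk on a toy graph, with `std::nullopt` standing in for `InstructionCost::getInvalid()`. `RecipeKind`, `Node` and `costUnlessInReductionChain` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class RecipeKind { Widen, Reduction, Other };

struct Node {
  RecipeKind Kind;
  std::vector<const Node *> Users;
};

// Follows the chain of single users starting at a widen recipe. If the
// chain of widen recipes feeds a reduction recipe, return std::nullopt
// ("invalid") so the caller defers to the legacy model; otherwise Cost.
std::optional<unsigned> costUnlessInReductionChain(const Node &Start,
                                                   unsigned Cost) {
  const Node *Cur = &Start;
  while (Cur->Users.size() == 1) {
    const Node *User = Cur->Users.front();
    if (User->Kind == RecipeKind::Widen) {
      Cur = User;
      continue;
    }
    if (User->Kind == RecipeKind::Reduction)
      return std::nullopt;
    break;
  }
  return Cost;
}
```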
I put up an alternative version, with most of the logic moved to the ::computeCost functions in VPlan, VPBasicBlock and VPRegionBlock, in #92555.
Ping. All pending patches have landed now, and I just updated this PR to current main, as well as the one with the alternative structure (#92555).
It sounds like the slightly stripped-down version (no cost for VPWidenRecipe for now) in #92555 is the preferred one. Closing this one.
This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
This reverts commit 46080ab. Extra tests have been added in 52d29eb. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
This reverts commit 6f538f6. Extra tests for crashes discovered when building Chromium have been added in fb86cb7, 3be7312. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
This reverts commit 6f538f6. A number of crashes have been fixed by separate fixes, including #96622. This version of the PR also pre-computes the costs for branches (except the latch) instead of computing their costs as part of costing of replicate regions, as there may not be a direct correspondence between original branches and number of replicate regions. Original message: This adds a new interface to compute the cost of recipes, VPBasicBlocks, VPRegionBlocks and VPlan, initially falling back to the legacy cost model for all recipes. Follow-up patches will gradually migrate recipes to compute their own costs step-by-step. It also adds getBestPlan function to LVP which computes the cost of all VPlans and picks the most profitable one together with the most profitable VF. The VPlan selected by the VPlan cost model is executed and there is an assert to catch cases where the VPlan cost model and the legacy cost model disagree. Even though I checked a number of different build configurations on AArch64 and X86, there may be some differences that have been missed. Additional discussions and context can be found in @arcbbb's #67647 and #67934 which is an earlier version of the current PR. PR: #92555
This adds a new computeCost interface to VPRecipeBase and implements it
for VPWidenRecipe and VPWidenIntOrFpInductionRecipe.
It also adds getBestPlan function to LVP which computes the cost of all
VPlans and picks the most profitable one together with the most
profitable VF. For recipes that do not yet implement computeCost, the
legacy cost for the underlying instruction is used.
The VPlan selected by the VPlan cost model is executed and there is an
assert to catch cases where the VPlan cost model and the legacy cost
model disagree.
Builds on VPlan type inference (included in this PR as separate commit).
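The safety net described above runs both cost models, executes the VPlan-based choice, and asserts that the legacy model would have chosen the same VF. The two models may disagree on absolute costs without tripping the assert, as long as both select the same VF. The sketch below shows that check standalone. `VFCost`, `selectVF` and `selectAndCrossCheck` are hypothetical, and selection is by lowest cost per lane.

```cpp
#include <cassert>
#include <vector>

struct VFCost {
  unsigned VF;
  unsigned Cost;
};

// Picks the VF with the lowest cost per lane (cross-multiplied comparison).
unsigned selectVF(const std::vector<VFCost> &Costs) {
  assert(!Costs.empty());
  VFCost Best = Costs.front();
  for (const VFCost &C : Costs)
    if (static_cast<unsigned long long>(C.Cost) * Best.VF <
        static_cast<unsigned long long>(Best.Cost) * C.VF)
      Best = C;
  return Best.VF;
}

// Runs both models, asserts that their choices agree, and returns the
// VPlan-based choice, which is the one that gets executed.
unsigned selectAndCrossCheck(const std::vector<VFCost> &VPlanCosts,
                             const std::vector<VFCost> &LegacyCosts) {
  unsigned VPlanVF = selectVF(VPlanCosts);
  unsigned LegacyVF = selectVF(LegacyCosts);
  assert(VPlanVF == LegacyVF &&
         "VPlan cost model and legacy cost model disagreed");
  (void)LegacyVF; // avoid an unused-variable warning when asserts are off
  return VPlanVF;
}
```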