[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. #94760
@llvm/pr-subscribers-backend-risc-v
Author: Florian Hahn (fhahn)
Changes
This patch introduces a new ExitPhi VPInstruction which creates a phi in …
The VPInstruction takes 2 operands: 1) the incoming value from the …
In follow-up changes, it will also be used to create phis for reduction and …
Depends on #92651
Patch is 177.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94760.diff
38 Files Affected:
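The description is truncated, but the code further down in the diff shows the first use of the new instruction. For each first-order recurrence, an `ExtractFromEnd` in the middle block produces the last vector lane (`vector.recur.extract`). An `ExitPhi` named `scalar.recur.init` in the scalar preheader then picks between that value and the recurrence's original start value. The C++ sketch below models only the semantics of that two-operand phi, using the toy recurrence `out[i] = a[i] + a[i-1]`. `scalarRecurrence` and `vectorizedRecurrence` are hypothetical helpers, not LLVM APIs, and the "vector" loop is simulated lane by lane.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference loop: Out[I] = A[I] + Prev, where Prev is A[I - 1], or
// Init for the first iteration (a first-order recurrence).
std::vector<int> scalarRecurrence(const std::vector<int> &A, int Init) {
  std::vector<int> Out(A.size());
  int Prev = Init;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Out[I] = A[I] + Prev;
    Prev = A[I];
  }
  return Out;
}

// Hypothetical model of the vectorized loop plus scalar remainder.
// SkipVectorLoop models reaching the scalar preheader from a bypass/check
// block instead of from the middle block.
std::vector<int> vectorizedRecurrence(const std::vector<int> &A, int Init,
                                      std::size_t VF, bool SkipVectorLoop) {
  assert(VF > 0 && "VF must be non-zero");
  std::vector<int> Out(A.size());
  std::size_t VectorTripCount = SkipVectorLoop ? 0 : A.size() - A.size() % VF;
  int Prev = Init; // recurrence value carried across vector iterations
  for (std::size_t I = 0; I < VectorTripCount; I += VF) {
    // One vector iteration: lane 0 uses the previous vector's last lane,
    // the other lanes use their left neighbour in the current vector.
    for (std::size_t L = 0; L < VF; ++L)
      Out[I + L] = A[I + L] + (L == 0 ? Prev : A[I + L - 1]);
    Prev = A[I + VF - 1];
  }
  // "Middle block": extract the last lane (models ExtractFromEnd, offset 1).
  int Extracted = Prev;
  // "scalar.recur.init": the resume phi with two incoming values, one per
  // predecessor of the scalar preheader.
  int ResumeInit = SkipVectorLoop ? Init : Extracted;
  int P = ResumeInit;
  for (std::size_t I = VectorTripCount; I < A.size(); ++I) {
    Out[I] = A[I] + P;
    P = A[I];
  }
  return Out;
}
```

Whichever edge reaches the scalar loop, it produces the same results as the purely scalar loop. That equivalence is exactly what the resume phi has to guarantee.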
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index c7c19ef456c7c..ae62df3aed207 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -607,10 +607,6 @@ class InnerLoopVectorizer {
BasicBlock *MiddleBlock, BasicBlock *VectorHeader,
VPlan &Plan, VPTransformState &State);
- /// Create the phi node for the resume value of first order recurrences in the
- /// scalar preheader and update the users in the scalar loop.
- void fixFixedOrderRecurrence(VPLiveOut *LO, VPTransformState &State);
-
/// Iteratively sink the scalarized operands of a predicated instruction into
/// the block that was created for it.
void sinkScalarOperands(Instruction *PredInst);
@@ -2972,22 +2968,7 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
nullptr, Twine(Prefix) + "scalar.ph");
- auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
-
- // Set up the middle block terminator. Two cases:
- // 1) If we know that we must execute the scalar epilogue, emit an
- // unconditional branch.
- // 2) Otherwise, we must have a single unique exit block (due to how we
- // implement the multiple exit case). In this case, set up a conditional
- // branch from the middle block to the loop scalar preheader, and the
- // exit block. completeLoopSkeleton will update the condition to use an
- // iteration check, if required to decide whether to execute the remainder.
- BranchInst *BrInst =
- Cost->requiresScalarEpilogue(VF.isVector())
- ? BranchInst::Create(LoopScalarPreHeader)
- : BranchInst::Create(LoopExitBlock, LoopScalarPreHeader,
- Builder.getTrue());
- BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
+ auto *BrInst = new UnreachableInst(LoopMiddleBlock->getContext());
ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);
// Update dominator for loop exit. During skeleton creation, only the vector
@@ -3094,51 +3075,6 @@ void InnerLoopVectorizer::createInductionResumeValues(
}
}
-BasicBlock *InnerLoopVectorizer::completeLoopSkeleton() {
- // The trip counts should be cached by now.
- Value *Count = getTripCount();
- Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
-
- auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
-
- // Add a check in the middle block to see if we have completed
- // all of the iterations in the first vector loop. Three cases:
- // 1) If we require a scalar epilogue, there is no conditional branch as
- // we unconditionally branch to the scalar preheader. Do nothing.
- // 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
- // Thus if tail is to be folded, we know we don't need to run the
- // remainder and we can use the previous value for the condition (true).
- // 3) Otherwise, construct a runtime check.
- if (!Cost->requiresScalarEpilogue(VF.isVector()) &&
- !Cost->foldTailByMasking()) {
- // Here we use the same DebugLoc as the scalar loop latch terminator instead
- // of the corresponding compare because they may have ended up with
- // different line numbers and we want to avoid awkward line stepping while
- // debugging. Eg. if the compare has got a line number inside the loop.
- // TODO: At the moment, CreateICmpEQ will simplify conditions with constant
- // operands. Perform simplification directly on VPlan once the branch is
- // modeled there.
- IRBuilder<> B(LoopMiddleBlock->getTerminator());
- B.SetCurrentDebugLocation(ScalarLatchTerm->getDebugLoc());
- Value *CmpN = B.CreateICmpEQ(Count, VectorTripCount, "cmp.n");
- BranchInst &BI = *cast<BranchInst>(LoopMiddleBlock->getTerminator());
- BI.setCondition(CmpN);
- if (hasBranchWeightMD(*ScalarLatchTerm)) {
- // Assume that `Count % VectorTripCount` is equally distributed.
- unsigned TripCount = UF * VF.getKnownMinValue();
- assert(TripCount > 0 && "trip count should not be zero");
- const uint32_t Weights[] = {1, TripCount - 1};
- setBranchWeights(BI, Weights);
- }
- }
-
-#ifdef EXPENSIVE_CHECKS
- assert(DT->verify(DominatorTree::VerificationLevel::Fast));
-#endif
-
- return LoopVectorPreHeader;
-}
-
std::pair<BasicBlock *, Value *>
InnerLoopVectorizer::createVectorizedLoopSkeleton(
const SCEV2ValueTy &ExpandedSCEVs) {
@@ -3198,7 +3134,7 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
// Emit phis for the new starting index of the scalar loop.
createInductionResumeValues(ExpandedSCEVs);
- return {completeLoopSkeleton(), nullptr};
+ return {LoopVectorPreHeader, nullptr};
}
// Fix up external users of the induction variable. At this point, we are
@@ -3399,8 +3335,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
for (const auto &[_, LO] : to_vector(Plan.getLiveOuts())) {
if (!Legal->isFixedOrderRecurrence(LO->getPhi()))
continue;
- fixFixedOrderRecurrence(LO, State);
- Plan.removeLiveOut(LO->getPhi());
}
// Forget the original basic block.
@@ -3470,31 +3404,16 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
VF.getKnownMinValue() * UF);
}
-void InnerLoopVectorizer::fixFixedOrderRecurrence(VPLiveOut *LO,
- VPTransformState &State) {
- // Extract the last vector element in the middle block. This will be the
- // initial value for the recurrence when jumping to the scalar loop.
- VPValue *VPExtract = LO->getOperand(0);
- using namespace llvm::VPlanPatternMatch;
- assert(match(VPExtract, m_VPInstruction<VPInstruction::ExtractFromEnd>(
- m_VPValue(), m_VPValue())) &&
- "FOR LiveOut expects to use an extract from end.");
- Value *ResumeScalarFOR = State.get(VPExtract, UF - 1, true);
-
- // Fix the initial value of the original recurrence in the scalar loop.
- PHINode *ScalarHeaderPhi = LO->getPhi();
- auto *InitScalarFOR =
- ScalarHeaderPhi->getIncomingValueForBlock(LoopScalarPreHeader);
- Builder.SetInsertPoint(LoopScalarPreHeader, LoopScalarPreHeader->begin());
- auto *ScalarPreheaderPhi =
- Builder.CreatePHI(ScalarHeaderPhi->getType(), 2, "scalar.recur.init");
- for (auto *BB : predecessors(LoopScalarPreHeader)) {
- auto *Incoming = BB == LoopMiddleBlock ? ResumeScalarFOR : InitScalarFOR;
- ScalarPreheaderPhi->addIncoming(Incoming, BB);
- }
- ScalarHeaderPhi->setIncomingValueForBlock(LoopScalarPreHeader,
- ScalarPreheaderPhi);
- ScalarHeaderPhi->setName("scalar.recur");
+// Helper to reorder blocks so they match the original order even after the
+// order of the predecessors changes. This is only used to avoid a number of
+// test changes due to reordering of incoming blocks in phi nodes and should be
+// removed soon, with the tests being updated.
+static void reorderIncomingBlocks(SmallVectorImpl<BasicBlock *> &Blocks,
+ BasicBlock *LoopMiddleBlock) {
+ if (Blocks.front() == LoopMiddleBlock)
+ std::swap(Blocks.front(), Blocks.back());
+ if (Blocks.size() == 3)
+ std::swap(Blocks[0], Blocks[1]);
}
void InnerLoopVectorizer::sinkScalarOperands(Instruction *PredInst) {
@@ -7388,7 +7307,9 @@ static void createAndCollectMergePhiForReduction(
// If we are fixing reductions in the epilogue loop then we should already
// have created a bc.merge.rdx Phi after the main vector body. Ensure that
// we carry over the incoming values correctly.
- for (auto *Incoming : predecessors(LoopScalarPreHeader)) {
+ SmallVector<BasicBlock *> Blocks(predecessors(LoopScalarPreHeader));
+ reorderIncomingBlocks(Blocks, LoopMiddleBlock);
+ for (auto *Incoming : Blocks) {
if (Incoming == LoopMiddleBlock)
BCBlockPhi->addIncoming(FinalValue, Incoming);
else if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
@@ -7459,6 +7380,21 @@ LoopVectorizationPlanner::executePlan(
std::tie(State.CFG.PrevBB, CanonicalIVStartValue) =
ILV.createVectorizedLoopSkeleton(ExpandedSCEVs ? *ExpandedSCEVs
: State.ExpandedSCEVs);
+#ifdef EXPENSIVE_CHECKS
+ assert(DT->verify(DominatorTree::VerificationLevel::Fast));
+#endif
+
+ VPBasicBlock *MiddleVPBB =
+ cast<VPBasicBlock>(BestVPlan.getVectorLoopRegion()->getSingleSuccessor());
+
+ using namespace llvm::VPlanPatternMatch;
+ if (MiddleVPBB->begin() != MiddleVPBB->end() &&
+ match(&MiddleVPBB->back(), m_BranchOnCond(m_VPValue()))) {
+ cast<VPIRBasicBlock>(MiddleVPBB->getSuccessors()[1])
+ ->resetBlock(OrigLoop->getLoopPreheader());
+ } else
+ cast<VPIRBasicBlock>(MiddleVPBB->getSuccessors()[0])
+ ->resetBlock(OrigLoop->getLoopPreheader());
// Only use noalias metadata when using memory checks guaranteeing no overlap
// across all iterations.
@@ -7539,6 +7475,18 @@ LoopVectorizationPlanner::executePlan(
ILV.printDebugTracesAtEnd();
+ // Adjust branch weight of the branch in the middle block.
+ auto *MiddleTerm =
+ cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator());
+ if (MiddleTerm->isConditional() &&
+ hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {
+ // Assume that `Count % VectorTripCount` is equally distributed.
+ unsigned TripCount = State.UF * State.VF.getKnownMinValue();
+ assert(TripCount > 0 && "trip count should not be zero");
+ const uint32_t Weights[] = {1, TripCount - 1};
+ setBranchWeights(*MiddleTerm, Weights);
+ }
+
return {State.ExpandedSCEVs, ReductionResumeValues};
}
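The branch-weight heuristic now lives in `executePlan` but keeps the arithmetic of the removed `completeLoopSkeleton` code. The middle-block branch is `cmp.n ? exit : scalar.ph`. If the original trip count is uniformly distributed modulo `VF * UF`, only 1 in `VF * UF` trip counts leaves no remainder, which gives weights `{1, VF * UF - 1}`. The sketch below checks that reasoning. `middleBlockWeights` and `fractionWithoutRemainder` are hypothetical names. For scalable vectors, the real code uses only the known minimum VF.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the heuristic: weight 1 for successor 0 (exit,
// taken when cmp.n is true) and Step - 1 for successor 1 (scalar.ph).
std::array<uint32_t, 2> middleBlockWeights(unsigned VF, unsigned UF) {
  uint32_t Step = UF * VF;
  assert(Step > 0 && "trip count should not be zero");
  return {1, Step - 1};
}

// Empirical check of the "equally distributed" assumption: the fraction of
// trip counts N in [1, Limit] that leave no scalar remainder.
double fractionWithoutRemainder(unsigned Step, unsigned Limit) {
  unsigned NoRemainder = 0;
  for (unsigned N = 1; N <= Limit; ++N)
    if (N % Step == 0)
      ++NoRemainder;
  return static_cast<double>(NoRemainder) / Limit;
}
```

With VF = 4 and UF = 2, the weights are `{1, 7}`. That is a 1/8 probability of skipping the remainder, which matches the fraction counted over 1..800.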
@@ -7595,7 +7543,7 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// inductions in the epilogue loop are created before executing the plan for
// the epilogue loop.
- return {completeLoopSkeleton(), nullptr};
+ return {LoopVectorPreHeader, nullptr};
}
void EpilogueVectorizerMainLoop::printDebugTracesAtStart() {
@@ -7719,8 +7667,11 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
VecEpilogueIterationCountCheck,
VecEpilogueIterationCountCheck->getSinglePredecessor());
- DT->changeImmediateDominator(LoopScalarPreHeader,
- EPI.EpilogueIterationCountCheck);
+ if (auto *N = DT->getNode(LoopScalarPreHeader))
+ DT->changeImmediateDominator(LoopScalarPreHeader,
+ EPI.EpilogueIterationCountCheck);
+ else
+ DT->addNewBlock(LoopScalarPreHeader, EPI.EpilogueIterationCountCheck);
if (!Cost->requiresScalarEpilogue(EPI.EpilogueVF.isVector()))
// If there is an epilogue which must run, there's no edge from the
// middle block to exit blocks and thus no need to update the immediate
@@ -7784,7 +7735,7 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
{VecEpilogueIterationCountCheck,
EPI.VectorTripCount} /* AdditionalBypass */);
- return {completeLoopSkeleton(), EPResumeVal};
+ return {LoopVectorPreHeader, EPResumeVal};
}
BasicBlock *
@@ -8515,7 +8466,9 @@ static void addUsersInExitBlock(VPBasicBlock *HeaderVPBB, Loop *OrigLoop,
Value *IncomingValue =
ExitPhi.getIncomingValueForBlock(ExitingBB);
VPValue *V = Builder.getVPValueOrAddLiveIn(IncomingValue, Plan);
- Plan.addLiveOut(&ExitPhi, V);
+ Plan.addLiveOut(
+ &ExitPhi, V,
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor()));
}
}
@@ -8534,9 +8487,25 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// modified; a basic block for the vector pre-header, followed by a region for
// the vector loop, followed by the middle basic block. The skeleton vector
// loop region contains a header and latch basic blocks.
+
+ // Add a check in the middle block to see if we have completed
+ // all of the iterations in the first vector loop. Three cases:
+ // 1) If we require a scalar epilogue, there is no conditional branch as
+ // we unconditionally branch to the scalar preheader. Do nothing.
+ // 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
+ // Thus if tail is to be folded, we know we don't need to run the
+ // remainder and we can use the previous value for the condition (true).
+ // 3) Otherwise, construct a runtime check.
+ bool RequiresScalarEpilogueCheck =
+ LoopVectorizationPlanner::getDecisionAndClampRange(
+ [this](ElementCount VF) {
+ return !CM.requiresScalarEpilogue(VF.isVector());
+ },
+ Range);
VPlanPtr Plan = VPlan::createInitialVPlan(
createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
- *PSE.getSE(), OrigLoop->getLoopPreheader());
+ *PSE.getSE(), RequiresScalarEpilogueCheck, CM.foldTailByMasking(),
+ OrigLoop);
VPBasicBlock *HeaderVPBB = new VPBasicBlock("vector.body");
VPBasicBlock *LatchVPBB = new VPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
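The three cases in the comment above amount to a small decision function. The sketch below is a hypothetical model: the enum and function names are illustrative, not LLVM APIs. It computes which terminator the middle block gets and whether the scalar remainder then runs for a trip count `N`. It treats the fold-tail case as a branch on constant true to the exit block, which is what the removed `BranchInst::Create(..., Builder.getTrue())` code did.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the middle-block terminator.
enum class MiddleTerm {
  UncondToScalarPH, // 1) scalar epilogue required: always run the remainder
  CondOnTrue,       // 2) tail folded: (N - N % VF) == N, branch to exit
  CondOnCmpN        // 3) runtime check: cmp.n = (N == VectorTripCount)
};

MiddleTerm chooseMiddleTerm(bool RequiresScalarEpilogue, bool FoldTail) {
  if (RequiresScalarEpilogue)
    return MiddleTerm::UncondToScalarPH;
  if (FoldTail)
    return MiddleTerm::CondOnTrue;
  return MiddleTerm::CondOnCmpN;
}

// Returns true if control reaches the scalar remainder loop for trip count N
// when the vector loop steps by VF * UF.
bool runsScalarRemainder(MiddleTerm T, uint64_t N, uint64_t VF, uint64_t UF) {
  const uint64_t Step = VF * UF;
  assert(Step > 0 && "step must be non-zero");
  if (T == MiddleTerm::UncondToScalarPH)
    return true;
  if (T == MiddleTerm::CondOnTrue)
    return false;
  const uint64_t VectorTripCount = N - N % Step;
  const bool CmpN = (N == VectorTripCount);
  return !CmpN; // cmp.n true -> exit block, false -> scalar preheader
}
```

In the patch, the first input corresponds to the negation of `RequiresScalarEpilogueCheck` and the second to `CM.foldTailByMasking()`. Both are passed to `VPlan::createInitialVPlan`.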
@@ -8679,6 +8648,49 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
"VPBasicBlock");
RecipeBuilder.fixHeaderPhis();
+ auto *MiddleVPBB =
+ cast<VPBasicBlock>(Plan->getVectorLoopRegion()->getSingleSuccessor());
+
+ VPBasicBlock *ScalarPH = nullptr;
+ for (VPBlockBase *Succ : MiddleVPBB->getSuccessors()) {
+ auto *VPIRBB = dyn_cast<VPIRBasicBlock>(Succ);
+ if (VPIRBB && VPIRBB->getIRBasicBlock() == OrigLoop->getHeader()) {
+ ScalarPH = VPIRBB;
+ break;
+ }
+ }
+
+ if (ScalarPH) {
+ for (auto &H : HeaderVPBB->phis()) {
+ auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&H);
+ if (!FOR)
+ continue;
+ VPBuilder B(ScalarPH);
+ VPBuilder MiddleBuilder;
+ // Set insert point so new recipes are inserted before terminator and
+ // condition, if there is either the former or both.
+ if (MiddleVPBB->getNumSuccessors() != 2)
+ MiddleBuilder.setInsertPoint(MiddleVPBB);
+ else if (isa<VPInstruction>(MiddleVPBB->getTerminator()->getOperand(0)))
+ MiddleBuilder.setInsertPoint(
+ &*std::prev(MiddleVPBB->getTerminator()->getIterator()));
+ else
+ MiddleBuilder.setInsertPoint(MiddleVPBB->getTerminator());
+
+ // Extract the resume value and create a new VPLiveOut for it.
+ auto *Resume = MiddleBuilder.createNaryOp(
+ VPInstruction::ExtractFromEnd,
+ {FOR->getBackedgeValue(),
+ Plan->getOrAddLiveIn(
+ ConstantInt::get(Plan->getCanonicalIV()->getScalarType(), 1))},
+ {}, "vector.recur.extract");
+ auto *R =
+ B.createNaryOp(VPInstruction::ExitPhi, {Resume, FOR->getStartValue()},
+ {}, "scalar.recur.init");
+ Plan->addLiveOut(cast<PHINode>(FOR->getUnderlyingInstr()), R, ScalarPH);
+ }
+ }
+
// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
@@ -8784,7 +8796,7 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
// Create new empty VPlan
auto Plan = VPlan::createInitialVPlan(
createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
- *PSE.getSE(), OrigLoop->getLoopPreheader());
+ *PSE.getSE(), true, false, OrigLoop);
// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
@@ -8993,6 +9005,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
}
}
Builder.setInsertPoint(&*LatchVPBB->begin());
+ VPBasicBlock *MiddleVPBB =
+ cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor());
+ VPBasicBlock::iterator IP = MiddleVPBB->begin();
for (VPRecipeBase &R :
Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
VPReductionPHIRecipe *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);
@@ -9101,8 +9116,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
// also modeled in VPlan.
auto *FinalReductionResult = new VPInstruction(
VPInstruction::ComputeReductionResult, {PhiR, NewExitingVPV}, ExitDL);
- cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor())
- ->appendRecipe(FinalReductionResult);
+ FinalReductionResult->insertBefore(*MiddleVPBB, IP);
+ IP = std::next(FinalReductionResult->getIterator());
OrigExitingVPV->replaceUsesWithIf(
FinalReductionResult,
[](VPUser &User, unsigned) { return isa<VPLiveOut>(&User); });
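In the hunk above, `ComputeReductionResult` recipes are no longer appended to the end of the middle block. Instead, each one is inserted before an insertion point that starts at the block's beginning and then advances past the new recipe. This appears intended to keep the results ahead of any compare/branch recipes already in the middle block, while preserving the order in which reductions are processed. The `std::list` model below shows the same insert-then-advance pattern. The recipe names are placeholders, and `insertAtFrontInOrder` is a hypothetical helper.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Inserts NewRecipes at the front of Block, keeping their relative order,
// using the same "insert before IP, then advance IP" pattern as the diff.
std::list<std::string>
insertAtFrontInOrder(std::list<std::string> Block,
                     const std::vector<std::string> &NewRecipes) {
  auto IP = Block.begin();
  for (const std::string &R : NewRecipes) {
    auto Inserted = Block.insert(IP, R); // lands immediately before IP
    IP = std::next(Inserted);            // next recipe goes after this one
  }
  return Block;
}
```

For `std::list`, `std::next(Inserted)` is the element `IP` already pointed at, so the reassignment mainly documents intent. Appending instead would place the reduction results after the branch.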
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index f17be451e6846..4138bee310ece 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -443,11 +443,29 @@ VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
}
void VPIRBasicBlock::execute(VPTransformState *State) {
- assert(getHierarchicalPredecessors().empty() &&
- "VPIRBasicBlock cannot have predecessors at the moment");
assert(getHierarchicalSuccessors().empty() &&
"VPIRBasicBlock cannot have successors at the moment");
+ for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
+ VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();
+ auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();
+ BasicBlock *PredBB = State->CFG.VPBB2IRBB[PredVPBB];
+
+ assert(PredBB && "Predecessor basic-block not found building successor.");
+ auto *PredBBTerminator = PredBB->getTerminator();
+ LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');
+
+ auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator);
+ if (TermBr) {
+ // Set each forward successor here when it is created, excluding
+ // backedges. A backward successor is set when the branch is created.
+ unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
+ assert(!TermBr->getSuccessor(idx) &&
+ "Trying to reset an existing successor block.");
+ TermBr->setSuccessor(idx, IRBB);
+ }
+ }
+
State->Builder.SetInsertPoint(getIRBasicBlock()->getTerminator());
executeRecipes(State, getIRBasicBlock());
}
@@ -479,6 +497,14 @@ void VPBasicBlock::execute(VPTransformState *State) {
// The Exit block of a loop is always set to be successor 0 of the Exiting
// block.
cast<BranchInst>(ExitingBB->getTerminator())->setSuccessor(0, NewBB);
+ // Set the insert point for recipe execution in the block.
+ State->Builder.SetInsertPoint(NewBB->getTerminator());
+ if (getSuccessors().size() == 1) {
+ BranchInst *Br = State->Builder.CreateBr(NewBB);
+ Br->setSuccessor(0, nullptr);
+ NewBB->getTerminator()->eraseFromParent();
+ State->Builder.SetInsertPoint(NewBB->getTerminator());
+ }
State->CFG.DTU.applyUpdates({{DominatorTree::Insert, ExitingBB, NewBB}});
} else if (PrevVPBB && /* A */
!((SingleHPred = getSingleHierarchicalPredecessor()) &&
@@ -639,6 +665,7 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
printSuccessors(O, Indent);
}
+
#endif
static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry);
@@ -654,12 +681,23 @@ static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry) {
Entry);
for (VPBlockBase *BB : RPOT) {
VPBlockBase *NewBB = BB->clone();
- for (VPBlockBase *Pred : BB->getPredecessors())
- VPBlockUtils::connectBlocks(Old2NewVPBlocks[Pred], NewBB);
-
Old2NewVPBlocks[BB] = NewBB;
}
+ for (VPBlockBase *BB : RPOT) {
+ VPBlockBase *NewBB = Old2NewVPBlocks[BB];
+ SmallVector<VPBlockBase *> NewPreds;
+ for (VPBlockBase *Pred : BB->getPredecessors()) {
+ NewPreds.push_back(Old2NewVPBlocks[Pred]);
+ }
+ NewBB->setPredecessors(NewPreds);
+ SmallVector<VPBlockBase *> NewSuccs;
+ for (VPBlockBase *Succ : BB->successors()) {
+ NewSuccs.push_back(Old2NewVPBlocks[Succ]);
+ }
+ NewBB->setSuccessors(NewSuccs);
+ }
+
#if !defined(NDEBUG)
// Verif...
[truncated]
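The `cloneSESE` change splits cloning into two passes. First, every block is cloned and recorded in `Old2NewVPBlocks`. Then the predecessor and successor lists are rebuilt through that map, in their original order. Previously, edges were added with `connectBlocks` while walking in reverse post-order, so a block's successor order followed the visit order. This patch gives the middle block two successors whose positions matter: with a conditional branch, successor 0 is the exit and successor 1 the scalar preheader. Preserving that order therefore looks like the motivation. The sketch below shows the two-pass scheme on a toy `Block` type, which is hypothetical and not `VPBlockBase`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy CFG node (hypothetical, for illustration only).
struct Block {
  std::string Name;
  std::vector<Block *> Preds, Succs;
};

// Two-pass clone of a block graph given in reverse post-order (RPO).
// Pass 1 clones every block and records Old -> New.
// Pass 2 rewires edges through the map, copying predecessor and successor
// lists in their original order, independent of the RPO visit order.
std::vector<std::unique_ptr<Block>>
cloneGraph(const std::vector<Block *> &RPO) {
  std::map<const Block *, Block *> Old2New;
  std::vector<std::unique_ptr<Block>> Clones;
  for (const Block *B : RPO) {
    Clones.push_back(std::make_unique<Block>());
    Clones.back()->Name = B->Name;
    Old2New[B] = Clones.back().get();
  }
  for (const Block *B : RPO) {
    Block *NewB = Old2New.at(B);
    for (Block *P : B->Preds)
      NewB->Preds.push_back(Old2New.at(P));
    for (Block *S : B->Succs)
      NewB->Succs.push_back(Old2New.at(S));
  }
  return Clones;
}
```

Take a diamond `A -> {B, C} -> D` visited as `A, C, B, D`. Adding edges during the walk would give the clone of `A` the successors `[C, B]`. The two-pass version keeps `[B, C]`.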
+ }
+ }
+
// ---------------------------------------------------------------------------
// Transform initial VPlan: Apply previously taken decisions, in order, to
// bring the VPlan to its final state.
@@ -8784,7 +8796,7 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
// Create new empty VPlan
auto Plan = VPlan::createInitialVPlan(
createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
- *PSE.getSE(), OrigLoop->getLoopPreheader());
+ *PSE.getSE(), true, false, OrigLoop);
// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
@@ -8993,6 +9005,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
}
}
Builder.setInsertPoint(&*LatchVPBB->begin());
+ VPBasicBlock *MiddleVPBB =
+ cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor());
+ VPBasicBlock::iterator IP = MiddleVPBB->begin();
for (VPRecipeBase &R :
Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
VPReductionPHIRecipe *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);
@@ -9101,8 +9116,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
// also modeled in VPlan.
auto *FinalReductionResult = new VPInstruction(
VPInstruction::ComputeReductionResult, {PhiR, NewExitingVPV}, ExitDL);
- cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor())
- ->appendRecipe(FinalReductionResult);
+ FinalReductionResult->insertBefore(*MiddleVPBB, IP);
+ IP = std::next(FinalReductionResult->getIterator());
OrigExitingVPV->replaceUsesWithIf(
FinalReductionResult,
[](VPUser &User, unsigned) { return isa<VPLiveOut>(&User); });
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index f17be451e6846..4138bee310ece 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -443,11 +443,29 @@ VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
}
void VPIRBasicBlock::execute(VPTransformState *State) {
- assert(getHierarchicalPredecessors().empty() &&
- "VPIRBasicBlock cannot have predecessors at the moment");
assert(getHierarchicalSuccessors().empty() &&
"VPIRBasicBlock cannot have successors at the moment");
+ for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
+ VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();
+ auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();
+ BasicBlock *PredBB = State->CFG.VPBB2IRBB[PredVPBB];
+
+ assert(PredBB && "Predecessor basic-block not found building successor.");
+ auto *PredBBTerminator = PredBB->getTerminator();
+ LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');
+
+ auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator);
+ if (TermBr) {
+ // Set each forward successor here when it is created, excluding
+ // backedges. A backward successor is set when the branch is created.
+ unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
+ assert(!TermBr->getSuccessor(idx) &&
+ "Trying to reset an existing successor block.");
+ TermBr->setSuccessor(idx, IRBB);
+ }
+ }
+
State->Builder.SetInsertPoint(getIRBasicBlock()->getTerminator());
executeRecipes(State, getIRBasicBlock());
}
@@ -479,6 +497,14 @@ void VPBasicBlock::execute(VPTransformState *State) {
// The Exit block of a loop is always set to be successor 0 of the Exiting
// block.
cast<BranchInst>(ExitingBB->getTerminator())->setSuccessor(0, NewBB);
+ // Set the insert point for recipe execution in the block.
+ State->Builder.SetInsertPoint(NewBB->getTerminator());
+ if (getSuccessors().size() == 1) {
+ BranchInst *Br = State->Builder.CreateBr(NewBB);
+ Br->setSuccessor(0, nullptr);
+ NewBB->getTerminator()->eraseFromParent();
+ State->Builder.SetInsertPoint(NewBB->getTerminator());
+ }
State->CFG.DTU.applyUpdates({{DominatorTree::Insert, ExitingBB, NewBB}});
} else if (PrevVPBB && /* A */
!((SingleHPred = getSingleHierarchicalPredecessor()) &&
@@ -639,6 +665,7 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
printSuccessors(O, Indent);
}
+
#endif
static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry);
@@ -654,12 +681,23 @@ static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry) {
Entry);
for (VPBlockBase *BB : RPOT) {
VPBlockBase *NewBB = BB->clone();
- for (VPBlockBase *Pred : BB->getPredecessors())
- VPBlockUtils::connectBlocks(Old2NewVPBlocks[Pred], NewBB);
-
Old2NewVPBlocks[BB] = NewBB;
}
+ for (VPBlockBase *BB : RPOT) {
+ VPBlockBase *NewBB = Old2NewVPBlocks[BB];
+ SmallVector<VPBlockBase *> NewPreds;
+ for (VPBlockBase *Pred : BB->getPredecessors()) {
+ NewPreds.push_back(Old2NewVPBlocks[Pred]);
+ }
+ NewBB->setPredecessors(NewPreds);
+ SmallVector<VPBlockBase *> NewSuccs;
+ for (VPBlockBase *Succ : BB->successors()) {
+ NewSuccs.push_back(Old2NewVPBlocks[Succ]);
+ }
+ NewBB->setSuccessors(NewSuccs);
+ }
+
#if !defined(NDEBUG)
// Verif...
[truncated]
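The middle-block comment added to `tryToBuildVPlanWithVPRecipes` above describes three cases. Below is a rough, self-contained model of that decision, plus the `{1, TripCount - 1}` branch weights used for the conditional middle-block branch. It replaces VPlan and IR types with plain integers. `MiddleBlockExit`, `classifyMiddleBlock`, `vectorTripCount`, `cmpN` and `middleBranchWeights` are hypothetical names, not LLVM APIs. The sketch also assumes a fixed (non-scalable) VF.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of how the middle block leaves the vector loop.
enum class MiddleBlockExit {
  UnconditionalToScalarPH, // Case 1: a scalar epilogue is required.
  ExitAlwaysTaken,         // Case 2: tail folded, branch condition is `true`.
  RuntimeCheck             // Case 3: branch on N == VectorTripCount.
};

MiddleBlockExit classifyMiddleBlock(bool RequiresScalarEpilogue,
                                    bool FoldTailByMasking) {
  if (RequiresScalarEpilogue)
    return MiddleBlockExit::UnconditionalToScalarPH;
  if (FoldTailByMasking)
    return MiddleBlockExit::ExitAlwaysTaken;
  return MiddleBlockExit::RuntimeCheck;
}

// Without tail folding (fixed VF assumed), the vector loop covers
// N - N % (VF * UF) iterations.
std::uint64_t vectorTripCount(std::uint64_t N, std::uint64_t VF,
                              std::uint64_t UF) {
  const std::uint64_t Step = VF * UF;
  assert(Step > 0 && "step must be non-zero");
  return N - N % Step;
}

// Case 3 runtime check (the "cmp.n" compare): true means no scalar
// remainder is needed and the middle block branches to the exit block.
bool cmpN(std::uint64_t N, std::uint64_t VF, std::uint64_t UF) {
  return N == vectorTripCount(N, VF, UF);
}

// Branch weights for the {exit, scalar.ph} successors, assuming
// N % (VF * UF) is uniformly distributed.
std::array<std::uint32_t, 2> middleBranchWeights(std::uint32_t VF,
                                                 std::uint32_t UF) {
  const std::uint32_t Step = VF * UF;
  assert(Step > 0 && "trip count should not be zero");
  return {{1, Step - 1}};
}
```

For example, with VF = 4 and UF = 2, the exit edge gets weight 1 and the scalar preheader edge gets weight 7. Only one of the eight possible remainders is zero.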
@llvm/pr-subscribers-backend-systemz Author: Florian Hahn (fhahn)
Updated from 232fac0 to 984e005.
This patch introduces a new ExitPhi VPInstruction, which creates a phi in a leaf block of a VPlan. Its first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle block and 2) a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values.
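The two-operand semantics can be illustrated with a minimal standalone model that uses strings for blocks and integers for values instead of LLVM IR. `buildResumePhi` is a hypothetical helper. It mirrors the loop in the removed `fixFixedOrderRecurrence`: the middle-block edge gets the extracted value, and every other predecessor of the scalar preheader gets the original start value.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for an IR phi: (incoming block name, value) pairs.
using IncomingList = std::vector<std::pair<std::string, int>>;

// Hypothetical helper modelling the resume phi in the scalar preheader:
// operand 0 flows in from the middle block, operand 1 (the default) from
// every other predecessor, e.g. blocks that bypass the vector loop.
IncomingList buildResumePhi(const std::vector<std::string> &ScalarPHPreds,
                            const std::string &MiddleBlock,
                            int ValueFromMiddle, int DefaultValue) {
  assert(std::find(ScalarPHPreds.begin(), ScalarPHPreds.end(), MiddleBlock) !=
             ScalarPHPreds.end() &&
         "the middle block should be a predecessor of the scalar preheader");
  IncomingList Phi;
  Phi.reserve(ScalarPHPreds.size());
  for (const std::string &Pred : ScalarPHPreds)
    Phi.emplace_back(Pred,
                     Pred == MiddleBlock ? ValueFromMiddle : DefaultValue);
  return Phi;
}
```

For a fixed-order recurrence, `ValueFromMiddle` plays the role of `vector.recur.extract` (the last element of the recurrence's backedge value). `DefaultValue` plays the role of the recurrence's start value.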
Updated from 984e005 to 1ea4a7c.
Nice extension! Adding various comments.
fixFixedOrderRecurrence(LO, State);
Plan.removeLiveOut(LO->getPhi());
There's more to remove here...
Of course, removed, thanks!
- Plan.addLiveOut(&ExitPhi, V);
+ Plan.addLiveOut(
+ &ExitPhi, V,
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSingleSuccessor()));
Could V's parent be used instead of recording MiddleVPBB, when it's not a live-in?
Can V be a live-in, where ExitPhi uses multiple predecessors to select between invariant values?
Could V's parent be used instead of recording MiddleVPBB, when it's not a live-in?
Not at the moment, I think: V may be defined in the vector loop region, because the required extracts aren't modeled explicitly yet (they would be placed in MiddleVPBB).
Can V be a live-in, where ExitPhi uses multiple predecessors to select between invariant values?
V can be a live-in.
auto *MiddleVPBB =
cast<VPBasicBlock>(Plan->getVectorLoopRegion()->getSingleSuccessor());

VPBasicBlock *ScalarPH = nullptr;
for (VPBlockBase *Succ : MiddleVPBB->getSuccessors()) {
auto *VPBB = dyn_cast<VPBasicBlock>(Succ);
if (VPBB && !isa<VPIRBasicBlock>(VPBB)) {
ScalarPH = VPBB;
break;
}
}

if (ScalarPH) {
for (auto &H : HeaderVPBB->phis()) {
auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&H);
if (!FOR)
continue;
VPBuilder B(ScalarPH);
VPBuilder MiddleBuilder;
// Set insert point so new recipes are inserted before terminator and
// condition, if there is either the former or both.
if (MiddleVPBB->getNumSuccessors() != 2)
MiddleBuilder.setInsertPoint(MiddleVPBB);
else if (isa<VPInstruction>(MiddleVPBB->getTerminator()->getOperand(0)))
MiddleBuilder.setInsertPoint(
&*std::prev(MiddleVPBB->getTerminator()->getIterator()));
else
MiddleBuilder.setInsertPoint(MiddleVPBB->getTerminator());

// Extract the resume value and create a new VPLiveOut for it.
auto *Resume = MiddleBuilder.createNaryOp(
VPInstruction::ExtractFromEnd,
{FOR->getBackedgeValue(),
Plan->getOrAddLiveIn(
ConstantInt::get(Plan->getCanonicalIV()->getScalarType(), 1))},
{}, "vector.recur.extract");
auto *R =
B.createNaryOp(VPInstruction::ExitPhi, {Resume, FOR->getStartValue()},
{}, "scalar.recur.init");
Plan->addLiveOut(cast<PHINode>(FOR->getUnderlyingInstr()), R, ScalarPH);
}
}
This extends an already too-long function; it now exceeds 280 LOC...
auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&H);
if (!FOR)
continue;
VPBuilder B(ScalarPH);
Suggested change:
- VPBuilder B(ScalarPH);
+ VPBuilder ScalarPHBuilder(ScalarPH);
VPBuilder MiddleBuilder;
// Set insert point so new recipes are inserted before terminator and
// condition, if there is either the former or both.
if (MiddleVPBB->getNumSuccessors() != 2)
MiddleBuilder.setInsertPoint(MiddleVPBB);
else if (isa<VPInstruction>(MiddleVPBB->getTerminator()->getOperand(0)))
MiddleBuilder.setInsertPoint(
&*std::prev(MiddleVPBB->getTerminator()->getIterator()));
else
MiddleBuilder.setInsertPoint(MiddleVPBB->getTerminator());
Suggested change:
- VPBuilder MiddleBuilder;
- // Set insert point so new recipes are inserted before terminator and
- // condition, if there is either the former or both.
- if (MiddleVPBB->getNumSuccessors() != 2)
- MiddleBuilder.setInsertPoint(MiddleVPBB);
- else if (isa<VPInstruction>(MiddleVPBB->getTerminator()->getOperand(0)))
- MiddleBuilder.setInsertPoint(
- &*std::prev(MiddleVPBB->getTerminator()->getIterator()));
- else
- MiddleBuilder.setInsertPoint(MiddleVPBB->getTerminator());
+ VPBuilder MiddleBuilder(MiddleVPBB);
+ // Reset insert point so new recipes are inserted before terminator and
+ // condition, if there is either the former or both.
+ if (auto *Terminator = MiddleVPBB->getTerminator()) {
+ auto *Condition = dyn_cast<VPInstruction>(Terminator->getOperand(0));
+ assert((!Condition || Condition->getParent() == MiddleVPBB) &&
+ "Condition expected in MiddleVPBB");
+ MiddleBuilder.setInsertPoint(Condition ? Condition : Terminator);
+ }
Simplified, thanks!
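To make the insertion-point rule from the suggestion above concrete, here is a small standalone sketch. It does not use the VPlan API: `ToyInst` and `chooseInsertIndex` are hypothetical stand-ins, and the sketch assumes a block's terminator (if any) is its last instruction and names its condition operand. New instructions go before the condition when it is defined in the same block, otherwise before the terminator, otherwise at the end of the block.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction (not a VPlan class). A terminator may name the
// instruction it uses as its branch condition.
struct ToyInst {
  std::string Name;
  bool IsTerminator = false;
  std::string Condition; // empty if there is no condition operand
};

// Returns the index at which new instructions should be inserted so they end
// up before both the terminator and its condition, if the condition is
// defined in the same block. Without a terminator, insert at the end.
std::size_t chooseInsertIndex(const std::vector<ToyInst> &Block) {
  if (Block.empty() || !Block.back().IsTerminator)
    return Block.size();
  const std::size_t TermIdx = Block.size() - 1;
  const std::string &Cond = Block[TermIdx].Condition;
  if (!Cond.empty())
    for (std::size_t I = 0; I < TermIdx; ++I)
      if (Block[I].Name == Cond)
        return I; // insert before the condition
  return TermIdx; // insert before the terminator
}
```

Searching for the condition, rather than assuming it sits directly before the terminator as the `std::prev` version does, is the property the assert in the suggestion guards.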
}
ScalarHeaderPhi->setIncomingValueForBlock(LoopScalarPreHeader,
                                          ScalarPreheaderPhi);
ScalarHeaderPhi->setName("scalar.recur");
Is this setting of scalar header phi's name to "scalar.recur" retained?
No, at the moment it is left as is. It would need to be done as post-processing, but adjusting the name of the phi in the scalar loop already seems a bit inconsistent (we don't rename any other phis or values in the scalar loop), so IMO removing it makes sense.
OK, better to remove it in a separate preparatory patch?
Will do, thanks!
Done in 88e9c56
ScalarHeaderPhi->setIncomingValueForBlock(LoopScalarPreHeader,
                                          ScalarPreheaderPhi);
Note that this incoming value is set here, rather than by VPLiveOut's fixPhi.
Yep
ScalarPreheaderPhi is now generated by ResumePhi instead, and is set here as the incoming value of ScalarHeaderPhi without any vector-to-scalar extraction. So can ResumePhi (in its current use) entail a vector-to-scalar extraction (as opposed to reusing it for feeding exit block phis)?
At the moment, ResumePhi won't generate any extracts, as it is fed by the extracted scalar resume value of the FOR. It is possibly more accurate to mark ResumePhi as only having its first lane used, instead of as vector-to-scalar. (Done in latest version)
ok. Would be good to simply indicate somehow that it's dealing with a single scalar, coming in (from each pred) and going out.
Added an isSingleScalar helper to handle this.
@@ -8635,6 +8604,49 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
"VPBasicBlock");
RecipeBuilder.fixHeaderPhis();

auto *MiddleVPBB =
Explain what is about to happen next, possibly directing to VPlanTransforms::adjustFixedOrderRecurrences() for further explanation.
// Feed a resume value for every FOR from the vector loop to the scalar loop, if the middle block branches to the scalar preheader, by introducing ExtractFromEnd and ExitPhi recipes in each, respectively, and a VPLiveOut which uses the latter and corresponds to the scalar header.
// Start by finding out if middle block branches to scalar preheader.
Moved to separate function, with the explanation as comment, thanks!
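As a rough illustration of why every FOR needs a resume value, the following standalone sketch emulates a vector loop of width `VF` followed by the scalar remainder for `Out[i] = In[i] + In[i-1]`. It is a plain C++ model under simplifying assumptions (UF = 1, no tail folding), not LoopVectorize code; `scalarLoop` and `vectorThenScalar` are hypothetical names. The value carried into the remainder corresponds to extracting the last lane in the middle block, and the choice between it and the original start value corresponds to the phi created in the scalar preheader.

```cpp
#include <cstddef>
#include <vector>

// Reference scalar loop: Out[i] = In[i] + Prev, where Prev holds In[i-1]
// (or Start in the first iteration). Prev is a first-order recurrence.
std::vector<int> scalarLoop(const std::vector<int> &In, int Start) {
  std::vector<int> Out(In.size());
  int Prev = Start;
  for (std::size_t I = 0; I < In.size(); ++I) {
    Out[I] = In[I] + Prev;
    Prev = In[I];
  }
  return Out;
}

// Emulates a vector loop of width VF (VF > 0) followed by the scalar
// remainder loop.
std::vector<int> vectorThenScalar(const std::vector<int> &In, int Start,
                                  std::size_t VF) {
  std::vector<int> Out(In.size());
  const std::size_t VecTC = In.size() / VF * VF; // iterations in vector loop
  int LastLane = Start;
  for (std::size_t I = 0; I < VecTC; I += VF) {
    for (std::size_t L = 0; L < VF; ++L) {
      // Lane 0 uses the recurrence carried from the previous vector
      // iteration; the other lanes use their left neighbour.
      const int Prev = (L == 0) ? LastLane : In[I + L - 1];
      Out[I + L] = In[I + L] + Prev;
    }
    LastLane = In[I + VF - 1]; // like ExtractFromEnd: last lane of the vector
  }
  // Like the resume phi in the scalar preheader: the extracted value when
  // coming from the middle block, the start value when the vector loop was
  // skipped.
  int Prev = (VecTC > 0) ? LastLane : Start;
  for (std::size_t I = VecTC; I < In.size(); ++I) {
    Out[I] = In[I] + Prev;
    Prev = In[I];
  }
  return Out;
}
```

Replacing the resume selection with `Prev = Start` would make the first remainder iteration read a stale value whenever the vector loop ran and `Start` differs from the last processed element.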
auto *NewPhi = Builder.CreatePHI(IncomingFromOtherPred->getType(), 2, Name);
BasicBlock *VPlanPred =
    State.CFG
        .VPBB2IRBB[cast<VPBasicBlock>(getParent()->getSinglePredecessor())];
Note that since middle block is a VPIRBasicBlock, its corresponding IRBB can be obtained directly via getIRBasicBlock().
@@ -3386,33 +3380,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
VF.getKnownMinValue() * UF);
}

void InnerLoopVectorizer::fixFixedOrderRecurrence(VPLiveOut *LO,
nit (independent): is the second part of the following comment from above, along with the setting of the insertion point, still relevant?
// Fix LCSSA phis not already fixed earlier. Extracts may need to be generated
// in the exit block, so update the builder.
State.Builder.SetInsertPoint(State.CFG.ExitBB,
State.CFG.ExitBB->getFirstNonPHIIt());
Moved setting the insert point to VPLiveOut based on the predecessor.
The HeaderVPBB is retrieved in a similar fashion already. This is in preparation for splitting up the very large tryToBuildVPlanWithVPRecipes into several distinct functions, as suggested multiple times, including in #94760
The empty header and latch blocks can be created together with the vector loop region. This is in preparation for splitting up the very large tryToBuildVPlanWithVPRecipes into several distinct functions, as suggested multiple times, including in #94760
Split off from #94760, clarify as suggested.
✅ With the latest revision this PR passed the C/C++ code formatter.
Updated from 09411ed to f028ab3.
@@ -3343,8 +3326,6 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
    VectorLoop->getHeader(), Plan, State);
}

// Fix LCSSA phis not already fixed earlier. Extracts may need to be generated
// in the exit block, so update the builder.
Good to retain the first sentence, as "// Fix live-out phis not already fixed earlier." (LCSSA holds only for exit block phis; live-outs are now also fixed for non-LCSSA header phis in the scalar header), and remove the following lines?
State.Builder.SetInsertPoint(State.CFG.ExitBB,
                             State.CFG.ExitBB->getFirstNonPHIIt());
(also good to remove the subsequent loop sinking scalar operands, when ready :-)
Done, thanks!
static void addLiveOutsForFirstOrderRecurrences(VPlan &Plan) {
  VPRegionBlock *VectorRegion = Plan.getVectorLoopRegion();

  // Start by finding out if middle block branches to scalar preheader.
Suggested change:
- // Start by finding out if middle block branches to scalar preheader.
+ // Start by finding out if the middle block branches to the scalar preheader,
+ // which, unlike the exit block (the other possible successor of the middle
+ // block), is not a VPIRBasicBlock.
Added, thanks!
for (auto &H : VectorRegion->getEntryBasicBlock()->phis()) {
  auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&H);
Suggested change:
- for (auto &H : VectorRegion->getEntryBasicBlock()->phis()) {
-   auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&H);
+ for (auto &HeaderPhi : VectorRegion->getEntryBasicBlock()->phis()) {
+   auto *FOR = dyn_cast<VPFirstOrderRecurrencePHIRecipe>(&HeaderPhi);
Renamed, thanks!
  if (!FOR)
    continue;

  VPBuilder B(ScalarPHVPBB);
Suggested change:
- VPBuilder B(ScalarPHVPBB);
+ VPBuilder ScalarPHBuilder(ScalarPHVPBB);
Renamed thanks! Also hoisted out defining the builders.
Plan.getOrAddLiveIn(
    ConstantInt::get(Plan.getCanonicalIV()->getScalarType(), 1))},
May be clearer to pre-generate a VPValue of 1 and reuse it.
Done, thanks!
/// Fixup the wrapped LCSSA phi node. This means we need to add the
/// appropriate incoming value from the predecessor Pred.
Suggested change:
- /// Fixup the wrapped LCSSA phi node. This means we need to add the
- /// appropriate incoming value from the predecessor Pred.
+ /// Fix the wrapped phi node. This means adding an incoming value to exit
+ /// block phis from the vector loop via the middle block (values from the
+ /// scalar loop already reach these phis), and updating the value to scalar
+ /// header phis from the scalar preheader.
Note that a phi (ResumePhi?) could be added inside the middle block for the former case, thereby having VPlan maintain LCSSA form and ensuring that the parent of the incoming value is always the desired predecessor. Can be done as a follow-up (or preparatory?) patch if preferred.
Updated the wording.
Note that a phi (ResumePhi?) could be added inside middle block for the former case, thereby having VPlan maintain LCSSA form, and ensuring parent of incoming value is always the desired predecessor. Can be done as a follow-up (or preparatory?) patch if preferred.
Would be good as follow-up
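The two cases in the reworded fixPhi comment can be modeled with a toy phi: an exit block phi gains a new incoming entry for the middle block, while a scalar header phi already has an entry for the scalar preheader whose value is overwritten. `ToyPhi` and the block names used with it are hypothetical; the method names only echo LLVM's `PHINode::addIncoming` and `PHINode::setIncomingValueForBlock`, and this is not that class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy phi: a list of (incoming block, incoming value) pairs.
struct ToyPhi {
  std::vector<std::pair<std::string, int>> Incoming;

  // Exit block case: append an entry for a new predecessor.
  void addIncoming(int V, const std::string &BB) {
    Incoming.emplace_back(BB, V);
  }

  // Scalar header case: overwrite the value of an existing entry. Returns
  // false if BB is not already an incoming block.
  bool setIncomingValueForBlock(const std::string &BB, int V) {
    for (auto &Entry : Incoming)
      if (Entry.first == BB) {
        Entry.second = V;
        return true;
      }
    return false;
  }

  // Returns a pointer to the value for BB, or nullptr if BB is not present.
  const int *getIncomingValueForBlock(const std::string &BB) const {
    for (const auto &Entry : Incoming)
      if (Entry.first == BB)
        return &Entry.second;
    return nullptr;
  }
};
```

The asymmetry is the point: the exit phi's entry count grows by one, whereas the header phi keeps its two entries (preheader and latch) and only the preheader value changes.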
/// The first operand is the incoming value from the predecessor in VPlan,
/// the second operand is the incoming value for all other predecessors
/// (which are currently not modeled in VPlan).
ResumePhi,
Thanks for renaming, worth updating the title and summary of the patch.
Title + description should be updated now
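A minimal model of the ResumePhi semantics in the opcode comment above: operand 0 flows in from the predecessor modeled in VPlan (the middle block), and operand 1 flows in from every other predecessor. This is a standalone sketch, not the VPlan implementation; `ToyPhi`, `createResumePhi` and the block names are hypothetical. If `OtherPreds` is empty, the phi ends up with a single incoming value and `Default` is unused.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy phi: (incoming block, incoming value) pairs.
using ToyPhi = std::vector<std::pair<std::string, int>>;

// FromVPlanPred models operand 0 (from the VPlan predecessor); Default models
// operand 1 (for all predecessors not modeled in VPlan, e.g. bypass blocks).
ToyPhi createResumePhi(const std::string &VPlanPred, int FromVPlanPred,
                       const std::vector<std::string> &OtherPreds,
                       int Default) {
  ToyPhi Phi;
  Phi.emplace_back(VPlanPred, FromVPlanPred);
  for (const std::string &Pred : OtherPreds) {
    assert(Pred != VPlanPred && "VPlan predecessor must not be listed twice");
    Phi.emplace_back(Pred, Default);
  }
  return Phi;
}
```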
Value *IncomingFromVPlanPred =
    State.get(getOperand(0), Part, /* IsScalar */ true);
Value *IncomingFromOtherPreds =
    State.get(getOperand(1), Part, /* IsScalar */ true);
Is it conceivable for the scalar preheader to have a single predecessor, namely VPlanPred/middle-block? I.e., when the trip count is known to be larger than VF x UF and no (other) runtime checks are needed to bypass the vector loop?
Yes that should be possible, but it should be handled correctly (generating a phi with a single incoming value from the VPlan predecessor)
ok, the second operand (FOR->getStartValue()) is redundant in this case, but exists - so getOperand(1) works.
Yep
@@ -729,6 +759,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
case VPInstruction::ActiveLaneMask:
  O << "active lane mask";
  break;
case VPInstruction::ResumePhi:
  O << "exit-phi";
Suggested change:
- O << "exit-phi";
+ O << "resume-phi";
Updated, thanks!
Adjusting the name of the recurrence phi in the scalar loop is a bit inconsistent, as we do not adjust any other names in the scalar loops (including other phis). Remove this adjustment in preparation for #94760 and as discussed there.
This LGTM, thanks!
Adding last couple of comments and thoughts.
// in the exit block, so update the builder.
State.Builder.SetInsertPoint(State.CFG.ExitBB,
                             State.CFG.ExitBB->getFirstNonPHIIt());
// Fix LCSSA phis not already fixed earlier.
Suggested change:
- // Fix LCSSA phis not already fixed earlier.
+ // Fix live-out phis not already fixed earlier.
VPRecipeBase *DefRecipe = ExitValue->getDefiningRecipe();
auto *ExitingVPBB = DefRecipe ? DefRecipe->getParent() : nullptr;
Suggested change:
- VPRecipeBase *DefRecipe = ExitValue->getDefiningRecipe();
- auto *ExitingVPBB = DefRecipe ? DefRecipe->getParent() : nullptr;
+ VPRecipeBase *ExitingRecipe = ExitValue->getDefiningRecipe();
+ auto *ExitingVPBB = ExitingRecipe ? ExitingRecipe->getParent() : nullptr;
Adjusted, thanks!
NewPhi->addIncoming(IncomingFromVPlanPred, VPlanPred);
for (auto *OtherPred : predecessors(Builder.GetInsertBlock())) {
  assert(OtherPred != VPlanPred &&
         "VPlan predecessors should not be connected yet");
  NewPhi->addIncoming(IncomingFromOtherPreds, OtherPred);
}
Sure
default:
  llvm_unreachable("Unsupported opcode for instruction");
}
}

bool VPInstruction::isVectorToScalar() const {
  return getOpcode() == VPInstruction::ExtractFromEnd ||
         getOpcode() == VPInstruction::ResumePhi ||
still needed?
Removed, superseded by isSingleScalar.
; CHECK-NEXT: Live-out i16 %for.1 = vp<[[RESUME_1_P]]>
; CHECK-NEXT: Live-out i16 %for.2 = vp<[[RESUME_2_P]]>
Continuing the follow-up thought above: live-outs were initially introduced only in the exit block, with the middle block as its implicit predecessor, so there was no need to specify the parent VPBB/VPIRBB, which was probably introduced later anyhow.
Now that live-outs correspond to both the scalar header and the exit block, how about promoting them from a distinct VPLiveOut derivative of VPUser to a recipe/VPInstruction? Say, VPIRRecipe/VPIRPhi/VPIRLiveOut, which wraps an original phi in an original basic block, the latter wrapped in a VPIRBasicBlock? Then live-outs would appear as recipes of their ir-bb when printing, one could reason about their parent VPBB and its predecessor(s), and their fixPhis() would fold into the standard execute(). Note that the scalar header block is not (yet) represented as a VPIRBasicBlock (inside a VPIRLoopRegion..).
BasicBlock *MiddleBB = State.CFG.VPBB2IRBB[MiddleVPBB];
Phi->addIncoming(State.get(ExitValue, VPIteration(State.UF - 1, Lane)),
                 MiddleBB);
VPRecipeBase *DefRecipe = ExitValue->getDefiningRecipe();
ok, perhaps a live-in feeding a live-out can be useful, e.g., to indicate if the loop was entered(?).
- State.VF.isScalar()) &&
+ State.VF.isSingleScalar()) &&
Checking if VF is 1 should stay isScalar()?
Ah yes, should be fixed, thanks!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/603 Here is the relevant piece of the build log for the reference:
…llvm#94760) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and 2) a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on llvm#92651 PR: llvm#94760
This patch introduces a new ResumePhi VPInstruction which creates a phi in
a leaf block of a VPlan. The first use is to create the phi node for
fixed-order recurrence resume values in the scalar preheader.
The VPInstruction takes 2 operands: 1) the incoming value from the
middle-block and 2) a default value to be used for all other incoming
blocks.
In follow-up changes, it will also be used to create phis for reduction and
induction resume values.
Depends on #92651