-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Put all CSEs into SSA #106637
JIT: Put all CSEs into SSA #106637
Conversation
This adds an SSA updater that can incrementally put locals that were added to the IR into SSA form and starts using this infrastructure from CSE. It updates CSE to use this SSA updater to put all CSEs into SSA. Previously only single-def CSEs were put into SSA, which can cause e.g. IV opts to miss out on them. The SSA updater requires all uses and definitions to be supplied. Based on the definitions it is then possible to compute the candidate blocks for phi nodes in the usual way as the iterated dominance frontier. Since we do not have liveness for the local we do not insert the phi definitions eagerly; instead, we recursively compute the reaching def for each use by walking its dominators. Walking the dominators we either expect to find a real definition, or to hit a block that is part of the iterated dominance frontier of the definitions, in which case we know we have a live phi definition that we can insert. Inserting phi definitions needs to recursively do the same thing for the phi arguments. To make this faster we could memoize the reaching SSA numbers for each block, though this is not currently done. This also requires computing the DFS tree, dominators, and dominance frontiers after RBO (which invalidates those). The DFS tree and dominators should be reusable by IV opts in most cases, though currently we invalidate them after CSE because assertion prop sometimes will change the flow graph and does not know how to invalidate the DFS tree. This can also be fixed to improve throughput. We currently do not get much use out of this because IV opts also need liveness for the IVs. More specifically it needs to know whether the IVs are live out of the loop; it should be possible to keep some breadcrumbs around for inserted locals to cheaply know this, I think.
This comment was marked as outdated.
This comment was marked as outdated.
TP diffs for benchmarks.run_pgo now look like: Base: 120833007777, Diff: 120935018667, +0.0844%
317810259 : NA : 25.72% : +0.2630% : public: void __cdecl FlowGraphDominanceFrontiers::ComputeIteratedDominanceFrontier(struct BasicBlock *, class jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>> *)
162099515 : NA : 13.12% : +0.1342% : public: static class FlowGraphDominanceFrontiers * __cdecl FlowGraphDominanceFrontiers::Build(class FlowGraphDominatorTree *)
39047720 : NA : 3.16% : +0.0323% : private: static void __cdecl SsaBuilder::AddNewPhiArg(class Compiler *, struct BasicBlock *, struct Statement *, struct GenTreePhi *, unsigned int, unsigned int, struct BasicBlock *)
38346141 : NA : 3.10% : +0.0317% : public: bool __cdecl Compiler::fgMorphBlockStmt(struct BasicBlock *, struct Statement *, bool)
18905583 : +52.64% : 1.53% : +0.0156% : public: virtual void __cdecl CSE_HeuristicCommon::PerformCSE(class CSE_Candidate *)
13835238 : NA : 1.12% : +0.0114% : private: static struct Statement * __cdecl SsaBuilder::InsertPhi(class Compiler *, struct BasicBlock *, unsigned int)
8753827 : +0.85% : 0.71% : +0.0072% : public: __cdecl `public: unsigned int __cdecl Compiler::fgRunDfs<class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_1>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_2>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_3>, 0>(class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_1>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_2>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_3>)'::`2'::<lambda_1>::operator()(struct BasicBlock *) const
7221209 : +0.28% : 0.58% : +0.0060% : public: void * __cdecl ArenaAllocator::allocateMemory(unsigned __int64)
7119522 : +2.54% : 0.58% : +0.0059% : public: static class FlowGraphDominatorTree * __cdecl FlowGraphDominatorTree::Build(class FlowGraphDfsTree const *)
6499831 : +7.25% : 0.53% : +0.0054% : private: bool __cdecl jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>>::ensure_capacity(unsigned __int64)
4424130 : +12.56% : 0.36% : +0.0037% : private: void __cdecl JitHashTable<struct BasicBlock *, struct JitPtrKeyFuncs<struct BasicBlock>, struct FlowEdge *, class CompAllocator, class JitHashTableBehavior>::Grow(void)
3705888 : NA : 0.30% : +0.0031% : public: static void __cdecl SsaBuilder::InsertInSsa(class Compiler *, unsigned int, class ArrayStack<struct UseDefLocation> &, class ArrayStack<struct UseDefLocation> &)
3361499 : NA : 0.27% : +0.0028% : private: struct UseDefLocation __cdecl IncrementalSsaBuilder::FindOrCreateReachingDef(struct UseDefLocation const &)
2892505 : +0.42% : 0.23% : +0.0024% : public: unsigned short __cdecl Compiler::optAddAssertion(struct Compiler::AssertionDsc *)
2651134 : +0.69% : 0.21% : +0.0022% : BasicBlock::VisitAllSuccs<`AllSuccessorEnumerator::AllSuccessorEnumerator'::`2'::<lambda_1> >
2535856 : +4.24% : 0.21% : +0.0021% : public: struct FlowEdge * __cdecl Compiler::BlockPredsWithEH(struct BasicBlock *)
1743270 : NA : 0.14% : +0.0014% : public: void __cdecl IncrementalSsaBuilder::Insert(void)
1693606 : +0.58% : 0.14% : +0.0014% : public: void __cdecl RangeCheck::MergeEdgeAssertions(unsigned int, unsigned __int64 *const &, struct Range *)
1583199 : NA : 0.13% : +0.0013% : private: bool __cdecl IncrementalSsaBuilder::FindReachingDefInBlock(struct UseDefLocation const &, struct BasicBlock *, struct UseDefLocation *)
1557652 : +1.89% : 0.13% : +0.0013% : public: void __cdecl Compiler::fgValueNumberPhiDef(struct GenTreeLclVar *, struct BasicBlock *, bool)
1349037 : +0.75% : 0.11% : +0.0011% : VisitEHSuccs<0,`AllSuccessorEnumerator::AllSuccessorEnumerator'::`2'::<lambda_1> >
1261895 : +0.82% : 0.10% : +0.0010% : `Compiler::optReachable'::`12'::<lambda_1>::operator()
-5032126 : -0.49% : 0.41% : -0.0042% : memset
-26356661 : -50.03% : 2.13% : -0.0218% : private: void __cdecl SsaBuilder::AddPhiArg(struct BasicBlock *, struct Statement *, struct GenTreePhi *, unsigned int, unsigned int, struct BasicBlock *)
-30940686 : -15.02% : 2.50% : -0.0256% : private: void __cdecl SsaBuilder::InsertPhiFunctions(void)
-36618036 : -100.00% : 2.96% : -0.0303% : public: bool __cdecl Compiler::fgMorphBlockStmt(struct BasicBlock *, struct Statement *)
-139925973 : -100.00% : 11.32% : -0.1158% : private: void __cdecl SsaBuilder::ComputeDominanceFrontiers(struct BasicBlock **, int, class JitHashTable<struct BasicBlock *, struct JitPtrKeyFuncs<struct BasicBlock>, class jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>>, class CompAllocator, class JitHashTableBehavior> *)
-322436319 : -100.00% : 26.09% : -0.2668% : private: void __cdecl SsaBuilder::ComputeIteratedDominanceFrontier(struct BasicBlock *, class JitHashTable<struct BasicBlock *, struct JitPtrKeyFuncs<struct BasicBlock>, class jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>>, class CompAllocator, class JitHashTableBehavior> const *, class jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>> *) So looks small enough that we can take this change without much more consideration. One thing is whether we need to memoize reaching defs in some way to avoid quadratic (or even potentially exponential) behavior. Need to think about that a bit. |
benchmarks.run_pgo seems to be one of the least affected collections. The most affected one is benchmarks.run. With some manual reconciliation of renamed functions in the diff, the detailed TP diff for it looks like: Base: 40469192366, Diff: 40551805661, +0.2041%
16221652 : +33.98% : 13.58% : +0.0401% : private: void __cdecl SsaBuilder::ComputeDominanceFrontiers(struct BasicBlock **, int, class JitHashTable<struct BasicBlock *, struct JitPtrKeyFuncs<struct BasicBlock>, class jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>>, class CompAllocator, class JitHashTableBehavior> *)
14025772 : +48.54% : 11.74% : +0.0347% : public: virtual void __cdecl CSE_HeuristicCommon::PerformCSE(class CSE_Candidate *)
11802368 : NA : 9.88% : +0.0292% : private: static void __cdecl SsaBuilder::AddNewPhiArg(class Compiler *, struct BasicBlock *, struct Statement *, struct GenTreePhi *, unsigned int, unsigned int, struct BasicBlock *)
6227765 : +2.88% : 5.21% : +0.0154% : public: __cdecl `public: unsigned int __cdecl Compiler::fgRunDfs<class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_1>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_2>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_3>, 0>(class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_1>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_2>, class `public: class FlowGraphDfsTree * __cdecl Compiler::fgComputeDfs<0>(void)'::`2'::<lambda_3>)'::`2'::<lambda_1>::operator()(struct BasicBlock *) const
4914583 : +0.60% : 4.11% : +0.0121% : public: void * __cdecl ArenaAllocator::allocateMemory(unsigned __int64)
4651869 : +6.52% : 3.89% : +0.0115% : public: static class FlowGraphDominatorTree * __cdecl FlowGraphDominatorTree::Build(class FlowGraphDfsTree const *)
4369172 : +17.68% : 3.66% : +0.0108% : private: bool __cdecl jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>>::ensure_capacity(unsigned __int64)
3469086 : NA : 2.90% : +0.0086% : private: static struct Statement * __cdecl SsaBuilder::InsertPhi(class Compiler *, struct BasicBlock *, unsigned int)
2793726 : NA : 2.34% : +0.0069% : public: static void __cdecl SsaBuilder::InsertInSsa(class Compiler *, unsigned int, class ArrayStack<struct UseDefLocation> &, class ArrayStack<struct UseDefLocation> &)
2747882 : NA : 2.30% : +0.0068% : private: struct UseDefLocation __cdecl IncrementalSsaBuilder::FindOrCreateReachingDef(struct UseDefLocation const &)
2745600 : +25.93% : 2.30% : +0.0068% : private: void __cdecl JitHashTable<struct BasicBlock *, struct JitPtrKeyFuncs<struct BasicBlock>, struct FlowEdge *, class CompAllocator, class JitHashTableBehavior>::Grow(void)
2538778 : +2.19% : 2.13% : +0.0063% : public: void __cdecl RangeCheck::MergeEdgeAssertions(unsigned int, unsigned __int64 *const &, struct Range *)
2129497 : +12.45% : 1.78% : +0.0053% : public: struct FlowEdge * __cdecl Compiler::BlockPredsWithEH(struct BasicBlock *)
1869746 : +2.27% : 1.57% : +0.0046% : BasicBlock::VisitAllSuccs<`AllSuccessorEnumerator::AllSuccessorEnumerator'::`2'::<lambda_1> >
1822996 : NA : 1.53% : +0.0045% : private: bool __cdecl IncrementalSsaBuilder::FindReachingDefInBlock(struct UseDefLocation const &, struct BasicBlock *, struct UseDefLocation *)
1790224 : NA : 1.50% : +0.0044% : public: void __cdecl IncrementalSsaBuilder::Insert(void)
1522909 : +0.56% : 1.28% : +0.0038% : public: unsigned short __cdecl Compiler::optAddAssertion(struct Compiler::AssertionDsc *)
1018306 : +2.48% : 0.85% : +0.0025% : VisitEHSuccs<0,`AllSuccessorEnumerator::AllSuccessorEnumerator'::`2'::<lambda_1> >
891863 : +2.70% : 0.75% : +0.0022% : public: struct Range __cdecl RangeCheck::ComputeRange(struct BasicBlock *, struct GenTree *, bool)
790041 : +4.09% : 0.66% : +0.0020% : public: bool __cdecl Compiler::fgMorphBlockStmt(struct BasicBlock *, struct Statement *)
717239 : +6.03% : 0.60% : +0.0018% : public: struct FlowEdge * __cdecl Compiler::BlockDominancePreds(struct BasicBlock *)
595655 : +0.18% : 0.50% : +0.0015% : memset
573612 : +2.48% : 0.48% : +0.0014% : public: void __cdecl Compiler::fgValueNumberPhiDef(struct GenTreeLclVar *, struct BasicBlock *, bool)
551275 : +31.48% : 0.46% : +0.0014% : public: struct EHblkDsc * __cdecl Compiler::ehGetBlockExnFlowDsc(struct BasicBlock *)
383941 : +0.44% : 0.32% : +0.0009% : public: void __cdecl Compiler::compInit(class ArenaAllocator *, struct CORINFO_METHOD_STRUCT_*, class ICorJitInfo *, struct CORINFO_METHOD_INFO *, struct InlineInfo *)
374654 : +2.29% : 0.31% : +0.0009% : public: struct Scev * __cdecl ScalarEvolutionContext::Simplify(struct Scev *, struct SimplificationAssumptions const &)
367471 : +0.29% : 0.31% : +0.0009% : private: void __cdecl SsaBuilder::BlockRenameVariables(struct BasicBlock *)
312764 : +2.04% : 0.26% : +0.0008% : public: bool __cdecl JitHashTable<struct GenTree *, struct JitPtrKeyFuncs<struct GenTree>, struct Scev *, class CompAllocator, class JitHashTableBehavior>::Set(struct GenTree *, struct Scev *, enum JitHashTable<struct GenTree *, struct JitPtrKeyFuncs<struct GenTree>, struct Scev *, class CompAllocator, class JitHashTableBehavior>::SetKind)
304582 : +0.07% : 0.26% : +0.0008% : public: struct GenTree * __cdecl Compiler::optAssertionProp_LclVar(unsigned __int64 *const &, struct GenTreeLclVarCommon *, struct Statement *)
297432 : +4.41% : 0.25% : +0.0007% : public: enum RelopEvaluationResult __cdecl ScalarEvolutionContext::EvaluateRelop(unsigned int)
281935 : +0.55% : 0.24% : +0.0007% : private: void __cdecl JitHashTable<struct CORINFO_FIELD_STRUCT_*, struct JitPtrKeyFuncs<struct CORINFO_FIELD_STRUCT_>, class FieldSeq, class CompAllocator, class JitHashTableBehavior>::Grow(void)
254204 : +2.33% : 0.21% : +0.0006% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, struct BitVecTraits *, struct BitVecTraits>::ClearD(struct BitVecTraits *, unsigned __int64 *&)
221830 : +1.38% : 0.19% : +0.0005% : private: void __cdecl SsaDefArray<class LclSsaVarDsc>::GrowArray(class CompAllocator)
220372 : +0.14% : 0.18% : +0.0005% : public: enum PhaseStatus __cdecl Compiler::optAssertionPropMain(void)
197369 : +3.85% : 0.17% : +0.0005% : public: bool __cdecl RangeCheck::ComputeDoesOverflow(struct BasicBlock *, struct GenTree *, struct Range const &)
194313 : +0.26% : 0.16% : +0.0005% : public: void __cdecl Compiler::optImpliedAssertions(unsigned short, unsigned __int64 *&)
186496 : +2.80% : 0.16% : +0.0005% : public: struct Range __cdecl RangeCheck::GetRange(struct BasicBlock *, struct GenTree *, bool)
178680 : +0.22% : 0.15% : +0.0004% : private: void __cdecl SsaBuilder::ComputeIteratedDominanceFrontier(struct BasicBlock *, class JitHashTable<struct BasicBlock *, struct JitPtrKeyFuncs<struct BasicBlock>, class jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>>, class CompAllocator, class JitHashTableBehavior> const *, class jitstd::vector<struct BasicBlock *, class jitstd::allocator<struct BasicBlock *>> *)
175067 : +0.24% : 0.15% : +0.0004% : public: void __cdecl DataFlow::ForwardAnalysis<class AssertionPropFlowCallback>(class AssertionPropFlowCallback &)
174825 : +0.88% : 0.15% : +0.0004% : public: void __cdecl ValueNumStore::GetConstantBoundInfo(unsigned int, struct ValueNumStore::ConstantBoundInfo *)
172806 : +0.31% : 0.14% : +0.0004% : protected: void __cdecl Compiler::compCompile(void **, unsigned int *, class JitFlags *)
172736 : +2.73% : 0.14% : +0.0004% : public: void __cdecl RangeCheck::MergeEdgeAssertions(struct GenTreeLclVarCommon *, unsigned __int64 *const &, struct Range *)
164132 : +3.16% : 0.14% : +0.0004% : public: static struct Range __cdecl RangeOps::Merge(struct Range const &, struct Range const &, bool)
162139 : +3.63% : 0.14% : +0.0004% : public: struct Range __cdecl RangeCheck::ComputeRangeForBinOp(struct BasicBlock *, struct GenTreeOp *, bool)
155473 : +0.24% : 0.13% : +0.0004% : public: void __cdecl Compiler::optImpliedByTypeOfAssertions(unsigned __int64 *&)
150968 : +0.33% : 0.13% : +0.0004% : BasicBlock::VisitAllSuccs<`Compiler::optReachable'::`12'::<lambda_1> >
147163 : +0.33% : 0.12% : +0.0004% : public: unsigned short __cdecl Compiler::optGlobalAssertionIsEqualOrNotEqualZero(unsigned __int64 *const &, struct GenTree *)
146382 : +0.76% : 0.12% : +0.0004% : public: enum PhaseStatus __cdecl Compiler::optVNBasedDeadStoreRemoval(void)
139613 : +0.29% : 0.12% : +0.0003% : `Compiler::optReachable'::`12'::<lambda_1>::operator()
138149 : +0.06% : 0.12% : +0.0003% : public: unsigned int __cdecl ValueNumStore::VNForFunc(enum var_types, enum VNFunc, unsigned int, unsigned int)
136733 : +0.65% : 0.11% : +0.0003% : `LoopLocalOccurrences::GetOrCreateMap'::`2'::<lambda_1>::operator()
130909 : +0.04% : 0.11% : +0.0003% : public: unsigned int * __cdecl JitHashTable<struct ValueNumStore::VNDefFuncApp<2>, struct ValueNumStore::VNDefFuncAppKeyFuncs<2>, unsigned int, class CompAllocator, class JitHashTableBehavior>::LookupPointerOrAdd(struct ValueNumStore::VNDefFuncApp<2>, unsigned int)
124058 : +0.16% : 0.10% : +0.0003% : public: static void __cdecl BitSetOps<unsigned __int64 *, 1, struct BitVecTraits *, struct BitVecTraits>::IntersectionD(struct BitVecTraits *, unsigned __int64 *&, unsigned __int64 *)
-374413 : -3.47% : 0.31% : -0.0009% : public: enum PhaseStatus __cdecl Compiler::fgSsaBuild(void)
-7939897 : -26.44% : 6.65% : -0.0196% : private: void __cdecl SsaBuilder::AddPhiArg(struct BasicBlock *, struct Statement *, struct GenTreePhi *, unsigned int, unsigned int, struct BasicBlock *)
-9406160 : -14.99% : 7.88% : -0.0232% : private: void __cdecl SsaBuilder::InsertPhiFunctions(void) |
src/coreclr/jit/flowgraph.cpp
Outdated
// Remarks: | ||
// Recall that the dominance frontier of a block B is the set of blocks | ||
// B3 such that there exists some B2 s.t. B3 is a successor of B2, and | ||
// B dominates B2. Note that this dominance need not be strict -- B2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// B dominates B2. Note that this dominance need not be strict -- B2 | |
// B dominates B2 but not B3. Note that this dominance need not be strict -- B2 |
return reachingDef; | ||
} | ||
|
||
if (BitVecOps::IsMember(&m_poTraits, m_iteratedDominanceFrontiers, dom->bbPostorderNum)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like if you are worried about worst-case costs and can be allowed to fail to put a local into SSA, you could see if there is any particularly large IDF and just punt if there is one (maybe?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also maybe only handle the two-def case (not sure we ever really see more than two defs for a CSE)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like if you are worried about worst-case costs and can be allowed to fail to put a local into SSA, you could see if there is any particularly large IDF and just punt if there is one (maybe?).
Yeah that's a good point, the number of recursive calls is bounded by the IDF size. I just dislike making the updater partial in this way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with your suggestion and skip the insertion into SSA if the IDF size is > 100. However, I didn't see a need to restrict the number of defs we allow in the inserter.
The problem here was the fact that I added the Still have some other regressions to investigate. |
cc @dotnet/jit-contrib PTAL @AndyAyersMS -- this now computes liveness for SSA-inserted locals, see my comment above. Can you review those new parts? I went through a bunch of ASP.NET regressions and they fell into two categories:
Sometimes strength reduction is also a size regression while being a perfscore regression, but there is nothing new there. Some stats for number of strength reductions in win-x64 collections with this PR: |
This seems to be more expensive for arm64, likely because there are more CSEs that are now having their liveness computed. |
m_queue.Reset(); | ||
m_queue.Push(use.Block); | ||
|
||
while (!m_queue.Empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a check that you actually do find the def block, and/or don't try to search past the first block? (that is, def dominates use)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we necessarily find the reaching def (we might hit a block marked live-in already due to another use), but makes sense to add an assert that we don't get to the first block. I added that assert into AddInsertedSsaLiveIn
since it never makes sense to mark these added locals as live into the first BB.
This adds an SSA updater that can incrementally put locals that were added to the IR into SSA form and starts using this infrastructure from CSE. Previously only single-def CSEs were put into SSA, which can cause e.g. IV opts to miss out on them.
The SSA updater requires all uses and definitions to be supplied. Based on the definitions it is then possible to compute the candidate blocks for phi nodes in the usual way as the iterated dominance frontier. Since we do not have liveness for the local we do not insert the phi definitions eagerly; instead, we recursively compute the reaching def for each use by walking its dominators.
Walking the dominators we either expect to find a real definition, or to hit a block that is part of the iterated dominance frontier of the definitions, in which case we know we have a live phi definition that we can insert.
Inserting phi definitions needs to recursively do the same thing for the phi arguments. To make this faster we could memoize the reaching SSA numbers for each block, though this is not currently done.
This also requires computing the DFS tree, dominators, and dominance frontiers after RBO (which invalidates those). Since the DFS tree and dominator tree is usually computed anyway by IV opts, this is not that costly.
The cost is further reduced by avoiding any of these computations in the single-def case (which CSE already handled). This special case is also moved into the SSA updater.
Finally, since IV opts need liveness to work, the SSA inserter also computes liveness for the inserted locals. This is done by marking all paths from each use back to its reaching definition as having that local live-in.
Fix #109412