JIT: use reverse post-order (RPO) traversal for morph #93246

AndyAyersMS · 2023-10-09T17:54:49Z

Morph, like many JIT phases, visits all the blocks in a method by following the bbNext chain. There's a missed opportunity here to cheaply propagate some information from block to block.

For "forward" phases like morph it is often preferable to visit the blocks in reverse post-order (RPO). An RPO ensures that for most blocks all the predecessors of the block have been visited before the block itself.

Currently value numbering implements an RPO visit. It's also possible to create an RPO using fgDfsReversePostorder.

The initial goal of this work is to modify morph to rely on RPO, and then to enable a simple form of global assertion prop for Morph (aka "cross-block local assertion prop") that can push facts forward across block boundaries. #86822 has a prototype implementation. The main remaining challenges there are to make the RPO efficient and to properly handle cases where control flow is altered.

(Q4 23) Implement RPO for Morph, and enable cross-block assertion prop
- Stop allocating new basic blocks in morph
  - JIT: defer creating throw helper blocks until simple lower #93371
    - (follow-on fix) JIT: fix bug introduced by recent throw helper block changes #93897
    - (follow-on TP win) JIT: remove simple lowering #93704
    - (follow-on cleanup): JIT: clean up unused throw helpers #93948
  - JIT: create recursive tail call scratch block early in morph #93764
    - (follow-on fix) JIT: add missing recursive tail call detection logic #93892
  - JIT: stop creating gc poll blocks in morph #93816
- Come up with a strategy to ensure that the remaining control flow changes in morph are "safe" in that they do not invalidate assumptions made earlier about the flow into a block
  - The idea here is to forbid or curtail fgAddRefPred. Removing preds should be OK as it won't invalidate the RPO, but adding preds is problematic. Trying this out, one violation (given the above PRs to stop adding new blocks) is to add edges to genReturnBB. Along with those edges, IR is added to copy return values and that IR is morphed "out of sequence" -- meaning it would also be problematic for assertion prop. Seems like all that flow merging and copying could happen earlier, say in fgAddInternal; in fact there may be some benefit as perhaps some struct copies can also be merged.
    - this mostly works out, except for tail calls. If we merge all returns early, we end up creating space after the tail call for phases to add new IR; this is an attractive nuisance. Tail calls may or may not end up as BBJ_RETURN (plus BBF_HAS_JMP) or BBJ_THROW; a failed tail call will end up as BBJ_ALWAYS, and a tail call helper dispatched call likewise. So we can't merge early and we can't rely on not changing flow later.
    - The current plan is to merge all non-tail call returns early (when there is a genReturnBB) and then during morph, if there are tail calls, treat the genReturnBB as a flagged block (similar to what we propose for the tail to loop below).
    - Then enhance tail merge to merge tails for identical BBJ_RETURN blocks -- this would just fall out if we could merge all returns earlier, but since we can't, we need to do extra work here.
    - ~~Work in progress: JIT: move return merging earlier #93997~~ no longer needed?
    - Related cleanup: JIT: update tail call IR validity checks #94130
  - With that out of the way, the only other flow edge addition comes from the tail call to loop transformation. This introduces new edges into the second block of the flow graph. Since this is a back-edge no facts are going to propagate forward in a one-pass RPO, so the plan here is to flag that block as special during RPO and have the RPO walker pass the "has unvisited pred" flag to the visitor (like it would for any other loop entry). The RPO walker can then safely allow new edges to this block to appear as it is visiting the graph.
- Revise morph to visit in RPO. If we can get to the point where we're guaranteed not to have any problematic flow updates then either a static (precompute the RPO) or dynamic (evolve the RPO, ala ValueNumberState) approach should work; the latter may be preferable.
  - JIT: make global morph its own phase #94185
  - upon further analysis the dynamic approach can't work in morph, as it relies on the loop table, which is not available.
  - JIT: morph blocks in RPO #94247
- Changes to local assertion prop (see notes below for more details)
  - Revise local assertion prop to track live assertions via bit vectors:
    - JIT: revise local assertion prop to use bit vectors #94322
  - Revise local assertion prop to not clear the assertion table at start of each block, implement the assertion merging algorithm, allow cross-block assertion prop. See notes below. This will go in as off by default.
  - Set up perf lab runs and look at performance
    - Experiment: Cross Block Local Assertion Prop performance#3482
  - Work on minimizing the (bad) diffs
    - JIT: consistently handle no return calls in qmarks #94690
  - Work in reducing the TP impact.
    - JIT: Improve local assertion prop throughput #94597
    - it looks like there are some places in morph where we can consult assertions before emitting IR, in particular in places we seem to emit explicit null checks and then almost immediately remove them.
  - Enable cross-block assertion prop by default.
    - JIT: enable cross-block local assertion prop #94689
- Jakob points out in JIT: morph blocks in RPO #94247 that we can likely make the RPO more efficient in various ways, eg we likely don't need the up-front enter blocks or dom start nodes, and we can (eventually) prune away unreachable blocks.
  - JIT: stop computing dom start nodes for morph RPO #94497
- (stretch) review parts of assertion prop that are currently disabled for local assertion prop, and perhaps enable some of them
  - In particular the "edge" EQ/NEQ assertions from JTRUE are likely worth enabling. In our current setup we'd need to have two assertion out sets for BBJ_COND blocks... since we're not using bbAssertionIn or bbAssertionGen we could hijack one of those slots to hold the data. Then when looking at pred info block would have to know the pred has two sets and pick the right one. Seems quite doable.
    - JIT: enable complementary jtrue assertions for cross-block local ap #94741
    - JIT: avoid local jtrue assertions involving floating types #94935
      - Tanner points out in the above we can be a bit more aggressive if we know the values being compared are not zeros or nans.
  - is there any value in propagating addresses of locals? Jakob says it's too late already...
- (stretch) avoid visiting unreachable blocks, and drop them from the RPO (perhaps delete them eagerly)
  - I did some prototyping where morph could remove statically (pre-morph) unreachable blocks; this wasn't that impactful, and some blocks are difficult to remove (notably try entries). So maybe not worth the trouble.
  - Another idea along those lines is to just disable local assertion prop for those blocks, since it takes time and possibly generates useless assertions that steal space
- (stretch) refine RPO on the fly to avoid visiting blocks that become unreachable because of cross-block assertion prop
  - Similar to the idea above, but now for blocks that become unreachable mid-morph, because we removed edges. If a block was reachable in the original RPO, but no longer is, we can remove it rather than morph it. Similar complications ensue. I have not prototyped this.
- (stretch) do QMARK expansion before morph, remove bespoke handling in morph for QMARK assertions (see JIT: expand qmarks earlier #86778)
  - see also JIT: consistently handle no return calls in qmarks #94690 (review)
(stretch) Implement an efficient, reusable RPO traversal; look for other phases that can benefit from RPO
- JIT: Factor SSA's DFS and profile synthesis's loop finding #95251 has a reusable RPO (covering reachable blocks)
(stretch) Rework value numbering to use the reusable traversal
- Instead of this VN is now re-using the RPO generated by SSA building: JIT: Use post order computed by SSA in VN #94623

There will be extra TP cost from the RPO and from the assertion changes; the hope is that these are paid back by improved optimizations and that this entire change can be zero-cost.

Note for "backwards" phases post-order provides similar benefits (all successors likely visited before a block is visited).

cc @dotnet/jit-contrib

The text was updated successfully, but these errors were encountered:

ghost · 2023-10-09T17:54:58Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Morph, like many JIT phases, visits all the blocks in a method by following the bbNext chain. There's a missed opportunity here to cheaply propagate some information from block to block.

For "forward" phases like morph it is often preferable to visit the blocks in reverse post-order (RPO). An RPO ensures that for most blocks all the predecessors of the block have been visited before the block itself.

Currently value numbering implements an RPO visit. It's also possible to create an RPO using fgDfsReversePostorder.

The initial goal of this work is to modify morph to rely on RPO, and then to enable a simple form of global assertion prop for Morph (aka "cross-block local assertion prop") that can push facts forward across block boundaries. #86822 has a prototype implementation. The main remaining challenges there are to make the RPO efficient and to properly handle cases where control flow is altered.

There will be extra TP cost from the RPO and from the assertion changes; the hope is that these are paid back by improved optimizations and that this entire change can be zero-cost.

Note for "backwards" phases post-order provides similar benefits (all successors likely visited before a block is visited).

cc @dotnet/jit-contrib

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	9.0.0

Instead of merging returns to the common return block in morph, do all the merging in `fgAddInternal` (where we already did some merging). This removes a case where morph would add a control flow edge in a way that might disrupt an ongoing RPO. Earlier merging also opens up the possibility of tail merging some of the copies into the canonical return local, and possibly even some of the computations that feed the copies. Modify the flow alterations done by morph. Previously if a tail call was expressed via a call to a `CORINFO_TAILCALL_HELPER`, morph would change the block kind to `BBJ_RETURN` and then merge the return, changing the block kind to `BBJ_ALWAYS`. Since merging now happens before moprh, morph needs to leave the block kind alone. Generalize the post-tail-call sanity check in morph to recognize one new case that can come up. Contributes to dotnet#93246.

Adapt `fgValidateIRForTailCall` to use as a utility to verify that the JIT has not added IR that would make a tail call invalid. Currently all our cases pass this check so no tail calls are invalidated. Remove `fgCheckStmtAfterTailCall` as it did less thorough and less correct checking. Contributes to dotnet#93246.

Remove `fgCheckStmtAfterTailCall` as it did less thorough and less correct checking. Contributes to #93246.

Contributes to dotnet#93246.

AndyAyersMS · 2023-10-30T21:58:08Z

For the "dynamic RPO" -- implementing this for morph seems problematic. Value Numbering (VN) has one, but it relies on the loop table, which (roughly speaking) encodes the results of a prior DFS traversal.

VN keeps track of two sets of blocks, those not yet visited with all preds visited, and those not yet visited with some preds visited. As a visit finishes it walks the pred lists of all successors, looking to see if any of those are unvisited and have all preds visited. If so, the block is added to the first set; if not, it is added to the second.

Blocks are preferentially processed from the "all preds visited" set, until this is empty. At that point, if the "some preds visited" set is not empty, VN needs to choose a block from this set, and to do so it relies on loop structure to find an outermost loop block, to maximize the odds of subsequent visits being all preds visited cases.

Without this hint there's no guarantee that the visitation order for VN resembles an RPO.

(Aside: it also seems like VN's approach could be streamlined somewhat by keeping track of the number of unvisited preds; when a visit finishes and we're enumerating the block successors we can just decrement their waiting numbers, moving them to the zero list where appropriate, rather than scanning all their preds. But I don't recall the value number state manipulation being especially costly, so perhaps not worth the trouble?)

For morph there's no loop structure around to consult; we'd need a DFS or equivalent to establish one, at which point we may as well just use the DFS itself to establish an RPO.

AndyAyersMS · 2023-10-30T23:16:47Z

In the "dynamic" case in the absence of a loop table, you could rely on the evolving spanning tree to figure out which of the "some preds visited" set you should visit first; preferentially choose based on (a) fewer unvisited preds; then (b) closest to root visited pred (or maybe just closest to root for the pred that was visited first).

Say in a doubly nested loop both loop entries have one unvisited pred: the outer loop entry would have had preds visited before any inner loop pred, so it should go first (the outer loop entry should be a spanning tree ancestor of the inner loop entry).

This would require a bit of extra state or you could perhaps reconstruct the spanning tree from the existing data (or leverage the slots in the blocks with pre/post order numbers and do it that way, and/or manipulate state on edges or give edges an ID and track which ones are in the spanning tree via a bit vector or something).

AndyAyersMS · 2023-10-31T15:18:16Z

Thinking about it some more, it seems difficult to get the dynamic case to work properly without some kind of prior analysis of the graph structure. EG for

Once A has been processed, a dynamic RPO needs to decide if B or C should come next. The right answer is B, but both B and C are at the same depth in the spanning tree, and have the same spanning tree parent, so there's no way to use the spanning tree to decide which node should come next.

Here B has an obvious self-loop, and C has no successors, but straightforward elaborations of the graph above preserve the problem of choosing B or C without having those features.

It seems plausible that replacing the "dynamic PGO" done in value numbering with an actual RPO might both improve throughput and give better results, since not all loops are in the loop table (eg cold clones). Worth trying someday.

Contributes to #93246.

When optimizing, process blocks in RPO. Disallow creation of new blocks and new flow edges (the latter with certain preapproved exceptions). Morph does not yet take advantage of the RPO to enable more optimization. Contributes to dotnet#93246.

When optimizing, process blocks in RPO. Disallow creation of new blocks and new flow edges (the latter with certain preapproved exceptions). Morph does not yet take advantage of the RPO to enable more optimization. Contributes to #93246.

AndyAyersMS · 2023-11-02T02:21:34Z

Challenges in modifying local assertion prop:

Currently assumes any assertion in the table is valid. Instead, it needs to consult the appropriate bit vector.
In places where it does consult a bit vector, has extra checks to ensure bit vector indices remain in bounds, this is needed as we use apFull for local prop, so it can have bits set that are beyond the actual current assertion count. This will no longer be needed.
Table size is 64, refreshed for each block.
- Some blocks have more than 64 live assertion gens and we're losing them now.
- Globally some methods have more than 64 assertion gens, so when we stop clearing the table it will sometimes need to be larger than 64 (in asp.net there are cases that need 960 or so assertions). This will have a TP and memory cost. Empirically it seems like the number of local vars is a decent predictor of the table size, eg # assertions ~ 0.6 * # locals.
- In the current scheme assertions are forcibly removed from the table when killed, so current scheme can actually handle blocks with > 64 assertions, provided they're not all live at once... new scheme will not be able to remove things from the table like this as (ultimately) other blocks may be keeping that assertion live
- We need to guess the table size up front before we morph anything, and if we guess too small we'll lose some assertions. Unlike the current scheme where assertions are only lost within large basic blocks, in the new scheme we'll start to lose them no matter the block size when we reach the tail of the RPO. This may be undesirable.
- We could try and curtail the number of assertions per block like we do now and see if that helps keep from running out of table space late in the RPO, but that seems problematic given the current compaction scheme. Also note eventually we may want to enable more assertion gen (say the JTRUE edge cases...) so we need to be able to handle lots of assertions somehow.
Initial swag on TP impact doesn't look good; need to drill in here.

The plan for now:

get things correct and (near) zero diff with an apLocal bit vector tracking live assertions, reset along with the assertion table at every block boundary, and fix any serious TP issues (JIT: revise local assertion prop to use bit vectors #94322)
introduce a global table (remove the per-block table reset, but keep the per-block live assertion reset). This will introduce diffs from table sizing issues as noted above, and may also need some TP work.
record the out assertion sets on each block, and implement a block merging algorithm taking into account the special blocks that must have empty live in sets... but still clear apLocal on entry. Fix TP issues.
stop resetting the apLocal on entry, instead use the merged pred set... hopefully wins in code size and TP!

…n tracking Track the set of active local assertions via a bit vector, rather than assuming all entries in the table are live. Doing so required a number of changes in assertion prop to ensure the vector is consulted before deciding an assertion is valid. This will (eventually) allow us to propagate assertions cross-block. For now we reset the bit vector and assertion table back to empty at the start of each block so nothing propagates past the end of a block. The table can fill and cause the JIT to miss assertions in very large blocks as morph will no longer remove assertions while processing a block. Previously this would happen if there were more than 64 live assertions in a block, and now it can happen if there are more than 64 assertions in block (so somewhat more frequently). Contributes to dotnet#93246.

…n tracking (#94322) Track the set of active local assertions via a bit vector, rather than assuming all entries in the table are live. Doing so required a number of changes in assertion prop to ensure the vector is consulted before deciding an assertion is valid. This will (eventually) allow us to propagate assertions cross-block. For now we reset the bit vector and assertion table back to empty at the start of each block so nothing propagates past the end of a block. The table can fill and cause the JIT to miss assertions in very large blocks as morph will no longer remove assertions while processing a block. Previously this would happen if there were more than 64 live assertions in a block, and now it can happen if there are more than 64 assertions in block (so somewhat more frequently). Contributes to #93246.

During global morph, allow assertions to propagate to a block from the block's predecessors. Handle special cases where we can't allow this to happen: * block has preds that have not yet been morphed * block has no preds * block is specially flagged as one that might gain new preds during morph * block is an EH handler entry Contributes to dotnet#93246.

AndyAyersMS · 2023-11-07T19:03:13Z

Thoughts about how to get this to the point where it can be merged

(copied from #94363 (comment))

Throughput

I think I can claw some of the TP back by

(1) optimizing search loops in assertion prop. We should generally never search the entire table (save when adding an assertion) and instead walk the live assertion set, and for local prop we should intersect that set with the dep vector.

The bit vector operations aren't free so it might make sense to do this only when there are sufficient numbers of assertions, though that makes the code uglier. Perhaps it can all be hidden behind a suitably smart enumerator.

(2) stop morphing blocks that are unreachable or become unreachable because of local assertion prop.

Aside from removing all the morph related overhead, I suspect sometimes this dead block morphing creates issues for live blocks, eg either useless assertions that cause the table to fill faster, or else global actions (say DNER) that are hard to undo.

Currently the RPO strategy won't allow for removal of dead EH and may or may not make it easy to remove unreachable cyclic flow.

Both of these can be spun off as preliminary checkins, though the full benefits may not be seen without this change, as main has small bit vectors and isn't able to propagate constants nearly as aggressively.

At the end of the day though I expect there will still be a TP impact.

Code Size / Code Quality

With the massive numbers of diffs any sort of manual assessment is going to be a random sampling at best. I will try and look at some of the bigger regressions, possibly suppressing cloning to help remove that as a factor.

The more aggressive copy prop that this enables puts pressure on LSRA, as copies to temps provide natural split points. I am not sure we can find effective heuristics to counterbalance this.

I think it makes sense to check all this in initially as off-by-default, and enable runs in the perf lab, and use that data to both identify CQ regressions and to decide if the CQ improvements justify the additional TP costs.

AndyAyersMS · 2023-11-07T21:07:55Z

One other note here for posterity -- I spent some time trying to see if there was a way to remove assertions to try and keep the index set more compact. For example, if an assertion born in some block (created for the first time) is no longer live at the end of the block, it can be erased with somewhat minimal effort: scrub it from any dep vector, remove it from the table, and adjust the bits in the apLocal to compensate (easy if this removed assertion has a higher number than any live new assertion, a bit of work if not).

I gathered some data on this and it did not look promising. Most times assertions don't get killed within blocks; they die at merge points, and most of the time when they do get killed they're not numbered higher than any new live one. So I did not pursue this.

There is also one test case in coreclr_run that generates upwards of 40K assertions, none of which are killed. With a global table this method was regressing by 100K bytes as those assertions are useful. The fix for this was to cap the table size to the max tracked, and if there are more locals than this when morph runs, just run a non-cross block version with 64 entry table like we've been doing all along. The problematic case also has 40K locals.

It's still possible to fill the table with junk assertions; I will need to look at some data on how often we lose assertions with the new table resizing. But some of this is inevitable with a simplistic forward push of facts. One can imagine trying to prioritize locals (we have RCS_EARLY counts we could use say), but I don't know yet if something like that is worth the trouble.

We don't need quite this broad of a start node set, as the DFS will find unreachable blocks on its own. Also lay the groundwork for removing unreachable blocks, by tracking the postorder number of the last reachable block. Contributes to dotnet#93246.

During global morph, allow assertions to propagate to a block from the block's predecessors. Handle special cases where we can't allow this to happen: * block has preds that have not yet been morphed * block has no preds * block is specially flagged as one that might gain new preds during morph * block is an EH handler entry Contributes to #93246. When enabled, size the assertion table based on the number of locals, up to the tracked limit. Disabled by default; use 'DOTNET_JitEnableCrossBlockLocalAssertionProp=1` to enable.

Remove the up-front computation of enter blocks and dom start nodes from the DFS computations used for RPO. Also lay the groundwork for removing unreachable blocks, by tracking the postorder number of the last reachable block. Contributes to #93246.

Leverage the "dep vectors" to avoid the search the assertion table during local assertion prop. Helps the current (small table) behavior some, helps the future cross-block (larger table) behavior more. Similar tricks may be possible for global AP, though the set of assertions there is more varied. Avoid merging assertions from statically unreachable preds. For OSR methods ensure the original method entry is considered reachable (as it may be). Contributes to #93246.

Now that we can propagate assertions across block boundaries we can generate assertions for true/false branch conditions and propagate them along the corresponding edges. Contributes to dotnet#93246.

…94741) Now that we can propagate assertions across block boundaries we can generate assertions for true/false branch conditions and propagate them along the corresponding edges. Contributes to #93246.

Fix condition under which we can share a pred's assertion out vector. Add the ability to disable cross-block local assertion prop via range. Contributes to dotnet#93246.

Fix condition under which we can share a pred's assertion out vector. Add the ability to disable cross-block local assertion prop via range. Contributes to #93246.

Float relop equality does not imply bitwise equality. So skip making local jtrue assertions about floating types. Contributes to dotnet#93246.

Float relop equality does not imply bitwise equality. So skip making local jtrue assertions about floating types. Contributes to #93246.

JulieLeeMSFT · 2024-04-15T16:39:05Z

Completed work items for .NET 9.

AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 9, 2023

AndyAyersMS added this to the 9.0.0 milestone Oct 9, 2023

AndyAyersMS self-assigned this Oct 9, 2023

JulieLeeMSFT added the User Story A single user-facing feature. Can be grouped under an epic. label Oct 9, 2023

BruceForstall mentioned this issue Oct 10, 2023

JIT: Substitute constant into loop testing variable where possible #90622

Closed

AndyAyersMS mentioned this issue Oct 25, 2023

JIT: move return merging earlier #93997

Closed

AndyAyersMS mentioned this issue Oct 28, 2023

JIT: update tail call IR validity checks #94130

Merged

AndyAyersMS added a commit that referenced this issue Oct 30, 2023

JIT: update tail call IR validity checks (#94130)

7dba9af

Remove `fgCheckStmtAfterTailCall` as it did less thorough and less correct checking. Contributes to #93246.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Oct 30, 2023

JIT: make global morph its own phase

afd7275

Contributes to dotnet#93246.

AndyAyersMS mentioned this issue Oct 30, 2023

JIT: make global morph its own phase #94185

Merged

AndyAyersMS added a commit that referenced this issue Oct 31, 2023

JIT: make global morph its own phase (#94185)

a7b0249

Contributes to #93246.

AndyAyersMS mentioned this issue Oct 31, 2023

JIT: morph blocks in RPO #94247

Merged

AndyAyersMS mentioned this issue Nov 2, 2023

JIT: revise local assertion prop to use bit vectors #94322

Merged

AndyAyersMS mentioned this issue Nov 3, 2023

JIT: cross-block local assertion prop in morph #94363

Merged

AndyAyersMS mentioned this issue Nov 7, 2023

JIT: stop computing dom start nodes for morph RPO #94497

Merged

jakobbotsch mentioned this issue Nov 11, 2023

JIT: Use post order computed by SSA in VN #94623

Merged

AndyAyersMS mentioned this issue Nov 11, 2023

JIT: Improve local assertion prop throughput #94597

Merged

AndyAyersMS mentioned this issue Nov 15, 2023

JIT: enable complementary jtrue assertions for cross-block local ap #94741

Merged

AndyAyersMS mentioned this issue Nov 16, 2023

JIT: fix cross-block local assertion prop bug; add range enable #94885

Merged

AndyAyersMS mentioned this issue Nov 17, 2023

JIT: avoid local jtrue assertions involving floating types #94935

Merged

AndyAyersMS added a commit that referenced this issue Nov 18, 2023

JIT: avoid local jtrue assertions involving floating types (#94935)

8b75cf2

Float relop equality does not imply bitwise equality. So skip making local jtrue assertions about floating types. Contributes to #93246.

JulieLeeMSFT closed this as completed Apr 15, 2024

github-actions bot locked and limited conversation to collaborators May 16, 2024

JulieLeeMSFT added this to .NET Core CodeGen Jun 5, 2024

JulieLeeMSFT moved this to Done in .NET Core CodeGen Jun 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: use reverse post-order (RPO) traversal for morph #93246

JIT: use reverse post-order (RPO) traversal for morph #93246

AndyAyersMS commented Oct 9, 2023 •

edited by jakobbotsch

Loading

ghost commented Oct 9, 2023

AndyAyersMS commented Oct 30, 2023

AndyAyersMS commented Oct 30, 2023 •

edited

Loading

AndyAyersMS commented Oct 31, 2023 •

edited

Loading

AndyAyersMS commented Nov 2, 2023 •

edited

Loading

AndyAyersMS commented Nov 7, 2023

AndyAyersMS commented Nov 7, 2023

JulieLeeMSFT commented Apr 15, 2024

JIT: use reverse post-order (RPO) traversal for morph #93246

JIT: use reverse post-order (RPO) traversal for morph #93246

Comments

AndyAyersMS commented Oct 9, 2023 • edited by jakobbotsch Loading

ghost commented Oct 9, 2023

AndyAyersMS commented Oct 30, 2023

AndyAyersMS commented Oct 30, 2023 • edited Loading

AndyAyersMS commented Oct 31, 2023 • edited Loading

AndyAyersMS commented Nov 2, 2023 • edited Loading

AndyAyersMS commented Nov 7, 2023

Thoughts about how to get this to the point where it can be merged

Throughput

Code Size / Code Quality

AndyAyersMS commented Nov 7, 2023

JulieLeeMSFT commented Apr 15, 2024

AndyAyersMS commented Oct 9, 2023 •

edited by jakobbotsch

Loading

AndyAyersMS commented Oct 30, 2023 •

edited

Loading

AndyAyersMS commented Oct 31, 2023 •

edited

Loading

AndyAyersMS commented Nov 2, 2023 •

edited

Loading