[Wasm RyuJIT] Wasm control flow basics #121417
base: main
Conversation
Determine how to emit Wasm control flow from the JIT's control flow graph.

Relies on loop-aware RPO to determine the block order. Currently only handles the main method. Assumes irreducible loops have been fixed upstream (which is not yet guaranteed; bails out if not). Doesn't actually do any emission yet; just prints a textual description in the JIT dump (along with a dot markup version). Uses only LOOP and BLOCK, and tries to limit the extent of BLOCK. Run for now as an optional phase even if not targeting Wasm, to do some stress testing.

Contributes to dotnet#121178
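For context on the two constructs involved: Wasm has no arbitrary goto. A backward branch must target an enclosing LOOP (jumping to its start), and a forward branch must target an enclosing BLOCK (jumping past its END); `br n` names the n'th enclosing construct, counting outward from the innermost. A minimal illustrative sketch (not this PR's actual dump format), shown as comments:

```cpp
// Illustrative only: a loop with an early exit, shaped with BLOCK/LOOP.
//
//   block          ;; "br_if 1" below jumps past this construct's "end"
//     loop         ;; "br_if 0" below jumps back to this point
//       ...        ;; loop body
//       br_if 1    ;; early exit: leave the enclosing block
//       ...        ;; more loop body
//       br_if 0    ;; back edge: restart the loop
//     end
//   end            ;; the early exit lands here
```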
FYI @dotnet/jit-contrib @SingleAccretion: a first cut at Wasm control flow. Eventually the text emission will end up being part of emitwasm (and in binary, with text disasm) and not part of this phase.

Left: sample control flow graph; black are Wasm blocks, red are Wasm loops. Right: sample Wasm instruction stream (block contents not shown, just …)
The FALLTHROUGH-inv above is a reminder that we need to invert the branch condition (not yet done).
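Concretely (a hypothetical example, since emission is not implemented yet): for a BBJ_COND block whose taken target ends up as the lexical successor in the chosen order, the out-of-line branch must follow the other edge, so the emitted br_if needs the opposite condition:

```cpp
// Hypothetical example of the inversion:
//
//   JIT:  if (x < y) goto T; else fall through to F
//
// If the Wasm block order places T immediately next, emit the inverse
// test and branch out of line to F instead:
//
//   (x >= y)  br_if <label whose "end" precedes F>
//   ;; fall through into T
```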
Think this is ready for review. There could still be bugs in the control flow, but it's hard to be sure until we can actually generate code. The errors all seem to be known ones or timeouts.
Pull Request Overview
This PR adds a new experimental Wasm control flow simulation phase to the JIT compiler. The phase analyzes the control flow graph and simulates how Wasm BLOCK/END and LOOP/END control structures would be emitted for the method being compiled.
- Introduces a new `fgWasmControlFlow` phase that transforms the control flow graph into Wasm-style nested control structures
- Adds a `JitWasmControlFlow` configuration option to enable/disable this experimental feature (DEBUG builds only)
- Implements an interval-based algorithm to properly nest blocks and loops according to Wasm requirements (sketched below)
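A rough mental model for the interval idea (the names below are illustrative; the PR's actual WasmInterval class carries more state, such as the chaining discussed later in this review): each LOOP or BLOCK corresponds to a range over the linear block order, and emission reduces to making all ranges properly nest.

```cpp
// Sketch only. The key invariant: after conflicts are resolved, any two
// intervals are either disjoint or nested, so LOOP/BLOCK ... END pairs can
// be emitted in a single pass over the block order.
struct IntervalSketch
{
    unsigned start;  // first covered position; may move earlier to enable nesting
    unsigned end;    // position just past the range; for a BLOCK this is the
                     // branch target and stays fixed
    bool     isLoop; // LOOP (back-edge target) vs. BLOCK (forward-branch target)
};
```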
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/coreclr/jit/jitconfigvalues.h | Adds the JitWasmControlFlow configuration integer to control the new phase |
| src/coreclr/jit/fgwasm.cpp | New file implementing the Wasm control flow analysis and simulation algorithm |
| src/coreclr/jit/compphases.h | Registers the PHASE_WASM_CONTROL_FLOW phase |
| src/coreclr/jit/compmemkind.h | Adds CMK_Wasm memory kind for allocations |
| src/coreclr/jit/compiler.h | Declares the fgWasmControlFlow method |
| src/coreclr/jit/compiler.cpp | Invokes the new phase conditionally in DEBUG builds after loop alignment |
| src/coreclr/jit/CMakeLists.txt | Adds fgwasm.cpp to the build sources |
@dotnet/jit-contrib PTAL

Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara

@dotnet/jit-contrib ping
```cpp
CompMemKindMacro(TryRegionClone)
CompMemKindMacro(Async)
CompMemKindMacro(RangeCheckCloning)
CompMemKindMacro(Wasm)
```
Suggested change:
```diff
-CompMemKindMacro(Wasm)
+CompMemKindMacro(WasmCfgLowering)
```
Wasm seems like too broad a category.
```cpp
// b must remain fixed but we can increase a as needed to accomplish nesting.
// For switches we will create multiple [a,b0], [a, b1]...
//
// If a forward interval ends on a block that already has an interval, we can ignore it.
```
Suggested change:
```diff
-// If a forward interval ends on a block that already has an interval, we can ignore it.
+// If a forward interval ends on a block that already has an interval ending at it, we can ignore it.
```
```cpp
// tree for the main method and each funclet, and the layout below will properly transform them all into
// Wasm control flow.
//
unsigned numHotBlocks = 0;
```
Nit:
```diff
-unsigned numHotBlocks = 0;
+unsigned numDfsBlocks = 0;
```
The blocks we're missing here are the throw helper ones, which represent implicit flow. I think we don't usually refer to them as simply "cold" (as in hot/cold splitting, or zero weights).
It may prove easier to emit the branches to them 'inline' in codegen.
Ah, good point... let me make a note about properly handling throw helpers.
This gets renamed in the sequel, so I'll hold off renaming it here.
```cpp
// Because we're walking front to back, we will have already recorded an interval that starts
// earlier.
//
// We then scan the in non-decreasing start order, finding earlier intervals that contain the start
```
Suggested change:
```diff
-// We then scan the in non-decreasing start order, finding earlier intervals that contain the start
+// We then scan the intervals in non-decreasing start order, finding earlier intervals that contain the start
```
```cpp
//
// We then scan the in non-decreasing start order, finding earlier intervals that contain the start
// of the current interval but not the end. When we find one, the start of the current interval will
// need to increase so the earlier interval can nest inside. That is, if have a:[0, 4] and b:[2,6] we
```
Suggested change:
```diff
-// need to increase so the earlier interval can nest inside. That is, if have a:[0, 4] and b:[2,6] we
+// need to decrease so the earlier interval can nest inside. That is, if have a:[0, 4] and b:[2,6] we
```
?
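For what it's worth, here is the a/b example worked through (illustrative). The crossing pair cannot be expressed with nested constructs, and since b's end is a branch target it must stay fixed, so b's start moves earlier, from 2 to 0:

```cpp
// before:  a:[0,4]   b:[2,6]    -- a contains b's start but not b's end
// after:   a:[0,4]   b:[0,6]    -- a now nests cleanly inside b
//
// emitted shape:  block(b)  block(a)  ...  end(a)  ...  end(b)
```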
```cpp
// merged with (2) above.
//
auto resolve = [&intervals](WasmInterval* const iv) {
    for (WasmInterval* ip : intervals)
```
Since this has quadratic complexity, does it need another TODO for some sort of cutoff, or some other way to avoid pathological cases (maybe the constant factor is small enough)?
We have to find all the conflicts, so there's no safe way to bail out.
If we maintain a data structure with all intervals ordered by end, we could perhaps search it efficiently to minimize where to check for conflicts, but maintaining that ordering seems like it might be more expensive overall.
I suspect the loop-aware RPO helps us avoid pathological cases, but I don't have any actual proof.
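A sketch of that ordered-by-end idea, purely as an assumption (the names `Iv` and `findConflicts` are hypothetical; this is not code from the PR): with resolved intervals kept in a multimap keyed by end point, the conflict candidates for an interval [s, e] are exactly the entries whose end falls strictly between s and e.

```cpp
#include <map>
#include <vector>

struct Iv
{
    unsigned start;
    unsigned end;
};

// Hypothetical helper: scan only intervals whose end lies strictly inside
// (iv->start, iv->end), rather than every interval seen so far.
std::vector<Iv*> findConflicts(const std::multimap<unsigned, Iv*>& byEnd, const Iv* iv)
{
    std::vector<Iv*> conflicts;
    auto lo = byEnd.upper_bound(iv->start); // first entry ending after our start
    auto hi = byEnd.lower_bound(iv->end);   // first entry ending at/after our end
    for (auto it = lo; it != hi; ++it)
    {
        Iv* const other = it->second;
        if (other->start <= iv->start)
        {
            // 'other' contains our start but not our end: a crossing pair
            // that must be fixed up for proper nesting.
            conflicts.push_back(other);
        }
    }
    return conflicts;
}
```

Whether the bookkeeping pays for itself is exactly the trade-off noted above.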
```cpp
{
    // We only need to consider intervals that start at the same point or earlier.
    //
    if (ip == iv)
```
Nit: it is surprisingly difficult on the eye to distinguish between ip/iv/ic. They may benefit from longer names.
```cpp
// We ignore these and treat them as if they fall through to the tail (if there is a tail).
// Since the tail cannot be a join we don't need a block.
//
```
Shouldn't the successor of these be the BBJ_CALLFINALLYRET to get the behavior above? I was a bit surprised to see that the logic below doesn't consider that block the successor.
It could be handled that way (and is, more cleanly, in the follow-up PR), but since the BBJ_CALLFINALLYRET is guaranteed to also be Next() and is logically single-pred, there is no need for a block here, so ignoring it works out the same way.
```cpp
fgVisitBlocksInLoopAwareRPO(m_dfsTree, m_loops, addToSequence);

// Splice in a fake BB0
//
BasicBlock bb0;
INDEBUG(bb0.bbNum = 0;);
bb0.bbPreorderNum  = numHotBlocks;
bb0.bbPostorderNum = m_dfsTree->GetPostOrderCount();
initialLayout[numHotBlocks] = &bb0;

// -----------------------------------------------
// (2) Build the intervals
//
// Allocate interval and scratch vectors. We'll use the scratch vector to keep track of
// block intervals that end at a certain point.
//
jitstd::vector<WasmInterval*> intervals(getAllocator(CMK_Wasm));
jitstd::vector<WasmInterval*> scratch(numHotBlocks, nullptr, getAllocator(CMK_Wasm));

for (unsigned int cursor = 0; cursor < numHotBlocks; cursor++)
{
    BasicBlock* const block = initialLayout[cursor];

    // See if we entered any loops
    //
    FlowGraphNaturalLoop* const loop = m_loops->GetLoopByHeader(block);

    if (loop != nullptr)
    {
        // Find the loop's lexical extent given our ordering
        // (maybe memoize this during loop finding...)
        //
        // Note that cursor may end up pointing at BB0
        //
        unsigned endCursor = cursor;
        while ((endCursor < numHotBlocks) && loop->ContainsBlock(initialLayout[endCursor]))
        {
            endCursor++;
        }

        WasmInterval* const loopInterval = WasmInterval::NewLoop(this, block, initialLayout[endCursor]);

        // We assume here that a block is only the header of one loop.
        //
        intervals.push_back(loopInterval);
    }

    // Now see where block branches to...
    //
    if (block->KindIs(BBJ_CALLFINALLY))
    {
        // We ignore these and treat them as if they fall through to the tail (if there is a tail).
        // Since the tail cannot be a join we don't need a block.
        //
        continue;
    }

    for (BasicBlock* const succ : block->Succs())
    {
        unsigned const succNum = succ->bbPreorderNum;

        // We ignore back edges; they don't inspire blocks.
        //
        if (succNum <= cursor)
        {
            JITDUMP("Backedge " FMT_BB "[%u] -> " FMT_BB "[%u]\n", block->bbNum, cursor, succ->bbNum, succNum);

            // The backedge target should be a loop header.
            // (TODO: scan loop stack to verify the loop is on the stack?)
            //
            // Note we currently bail out way above if there are any irreducible loops.
            //
            assert(m_loops->GetLoopByHeader(succ) != nullptr);
            continue;
        }

        // Branch to next needs no block, unless this is a switch
        // (eventually when we leave the default on the switch we can remove this).
        //
        if ((succNum == (cursor + 1)) && !block->KindIs(BBJ_SWITCH))
        {
            continue;
        }

        // Branch to cold block needs no block (presumably something EH related).
        // Eventually we need to case these out and handle them better.
        //
        if (succNum >= numHotBlocks)
        {
            continue;
        }

        // See if we already have a block that ends at this point and starts before.
        //
        WasmInterval* const existingBlock = scratch[succNum];

        if (existingBlock != nullptr)
        {
            // If so we don't need to track this branch.
            //
            JITDUMP("Subsumed " FMT_BB "[%u] -> " FMT_BB "[%u]\n", block->bbNum, cursor, succ->bbNum, succNum);
            assert(existingBlock->Start() <= cursor);
            continue;
        }

        // Non-contiguous, non-subsumed forward branch
        //
        WasmInterval* const branch = WasmInterval::NewBlock(this, block, initialLayout[succNum]);
        intervals.push_back(branch);

        // Remember an interval end here
        //
        scratch[succNum] = branch;
    }
}
```
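To make the interval-building loop concrete, a small worked trace (hypothetical block numbers and layout; not from an actual dump):

```cpp
// Loop-aware RPO order for a hypothetical while-loop with an early exit:
//
//   [0] BB1  loop header, BBJ_COND: falls through to [1], early exit to [3]
//   [1] BB2  BBJ_COND: back edge to [0], falls through to [2]
//   [2] BB3  falls through to [3]
//   [3] BB4  return
//
// cursor 0: BB1 heads the loop {BB1, BB2} -> LOOP interval [0,2];
//           its branch to [3] is non-contiguous -> BLOCK interval [0,3],
//           and scratch[3] remembers that a block already ends there
// cursor 1: back edge to [0] ignored; successor [2] == cursor + 1, skipped
// cursor 2: successor [3] == cursor + 1, skipped
//
// LOOP [0,2] shares its start with BLOCK [0,3] and ends earlier, so it
// nests inside with no adjustment, giving roughly:
//
//   block          ;; ends just before BB4
//     loop         ;; back-edge target: BB1
//       BB1        ;; br_if 1: early exit to BB4
//       BB2        ;; br_if 0: back to BB1, else fall through
//     end
//     BB3
//   end
//   BB4
```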
I'm curious whether merging steps 1 and 2 wouldn't simplify things. Step 1 does a recursive visit of the loop structure to create the linear range, but then, if I'm understanding right, step 2 recovers the nesting relationship from the linear structure.
Can't we track the nesting structure directly during the recursive visit?
There isn't true recursion here, just an inner loop walking over the same block ordering.
I'm planning on remembering the loop extent during the loop-aware RPO build so that loop body traversal code should go away, we will just know where the loop ends in the order.
We can still merge (1) and (2) since the only intervals we need to scan for overlap are the ones already created, and we create them in non-decreasing start order.
There is recursion in fgVisitBlocksInLoopAwareRPO. That's the recursion I meant.
Doesn't the interval structure fall naturally out from that recursive visit of the loop tree?
Maybe that's what you meant.
```cpp
unsigned ChainEnd() const
{
    return m_chainEnd;
}

WasmInterval* Chain()
{
    if (m_chain == this)
    {
        return this;
    }

    WasmInterval* chain = m_chain->Chain();
    m_chain = chain;
```
I dislike that Start/End/ChainEnd are accessors but Chain mutates. It would be nice if Chain's name communicated that it mutates the instance, even though in this case it seems like the mutation is harmless.
Any suggestions? `GetAndUpdateChain`?
Sounds good.
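For reference, a sketch of how the renamed helper might read (a hypothetical final form; the actual rename happens in a later change): the same union-find-style path compression, with a name that signals the mutation.

```cpp
WasmInterval* GetAndUpdateChain()
{
    if (m_chain == this)
    {
        return this;
    }

    // Recursively find the representative at the end of the chain, then
    // compress the path so subsequent lookups are O(1).
    WasmInterval* const chain = m_chain->GetAndUpdateChain();
    m_chain = chain;
    return chain;
}
```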
LGTM, but I'll leave the green check to someone who knows the JIT better.