-
Notifications
You must be signed in to change notification settings - Fork 2.7k
JIT: add pass to merge common throw helper calls #27113
Conversation
@dotnet/jit-contrib PTAL jit-diffs shows this hits modestly often...
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What's the throughput impact of this optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you see regressions?
public: | ||
static bool Equals(const ThrowHelperKey& x, const ThrowHelperKey& y) | ||
{ | ||
return BasicBlock::sameEHRegion(x.m_block, y.m_block) && GenTreeCall::Equals(x.m_call, y.m_call); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware that the GenTree
equality check ignores gtCallMoreFlags
, most of gtFlags
and has at least one bug (I ran into it a few days ago - it ignores the size of GT_BLK
nodes). I'm going to fix the bug soon but it also turns out that GenTree::Compare
is not widely used in the JIT so it's possible to run into more issues when a new use is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if you were perhaps considering to use gtHashValue
to deal with the more general case - that one also has some issues and it's only used by debug functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Hadn't really looked closely at the compare code.
For the more general case where we try and merge throws (see linked issue) I wasn't relying on tree hashes.
The flow updates in this PR aren't sufficiently general yet. We need to ensure that the throw helper blocks can't be reached both by fall through and jump or else handle that case properly. For instance in
where Need to think more about how to fix flow this early in the jit without requiring full pred list info (computing even cheap preds is probably more costly than this entire phase -- I think we can instead leverage the jump target bb flag we set up early). A couple other things to sort through:
Back to the drawing board, for now. |
Hmm, seemingly no good way to fix the flow graph that early. We don't propagate branch target bits to blocks during initial flow graph construction, we don't have good discipline maintaining these flags during early edits, ref counts are not reliable early (among other things we leave excess counts for partially imported methods), and so on. So it seems like this transformation must be done later, once ref counts are reliable and we have pred lists; that cuts out some of the (admittedly small) potentential TP win, but we'll should still get the code size wins. |
Moving it later in the phase list looks promising (now have it just before Speaking of variety, one odd case that's come up is that we may decide to tail call a throw helper. My merging code doesn't currently cope with this case. When that happens we don't mark the block as rare and it doesn't get moved out of line like a called throw helper does. So we can end up with notreturn blocks in the middle of loops and such. We could probably fix that part easily enough. But aside from that, it seems likes we should either never tail call throw helpers (if we tail call, we lose info about which method may have really caused the throw) or almost always tail call them (when we can). Never tail calling them allows even more merging but also a number of regressions in methods where merging isn't possible. Hmmm... |
Running via pin on crossgen of SPC, throughput impact is basically zero:
|
Going to reopen this -- main outstanding issue is how much we can trust tree comparison. The framework throw helper calls tend to have very simple argument trees, but that may not be the case elsewhere. Added a simple heuristic to not tail call throw helpers if there's more than one such call in a method, and instead hope merging kicks in. Still need to add some test cases for odd flow patterns. Latest jit-diffs:
|
Going to bounce this to trigger retest. |
Ah, looks like I need to rebase... |
Look for blocks with single statement noreturn calls, and try to reroute flow so there's just one block call that all predecessors target. Resolves #14770. Note this impairs debuggability of optimized code a bit, as it can change which line of code apparently invokes a throw helper in a backtrace. But since we're already commoning jit-inserted throw helpers (like array index OOB) this is not breaking any new ground. We could also handle commoning BBJ_THROW blocks, with some extra effort, but prototyping indicates duplicate throws are pretty rare.
* Unify helper classes * Stack allocate map headers * Phasify * Implement an earlyout scheme
Revise this phase to run just before `optOptimizeFlow`, so that we can leverage the ref counts and predecessor lists to ensure we make correct flow updates. Don't bother trying to clean out IR, let that happen naturally from as blocks become unreferenced.
Suppress tail calling noreturn methods if there is more than one such call site in the method, hoping that instead we can merge the calls.
8f8d388
to
9d44dc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mostly questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks for the explanations
Looking at
Not sure what to do about this yet. Some of the information is partially redundant so current comparisons may be sufficient (eg return type will be determined by the call target). |
Diff summary for the test case methods. Note not using tail calls for throw helpers hurts code size slightly, in some cases.
|
@BruceForstall want to take another look? There is still some concern about the potential for bugs in tree comparison (mainly false matches from insufficiently detailed compares). If we think this is a deal breaker I'll probably shelve this and try and address that separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, but LGTM
if (updateCount > 0) | ||
{ | ||
assert(fgModified); | ||
fgModified = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a little worrisome, but ok... Wouldn't (Shouldn't) the first code to generate "flow-dependent side data" reset this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to logic in fgCreateFuncletPrologBlocks
.
The fgModified
concept should probably be updated to just directly invalidate the dependent analyses, if any (in this case and the funclet case, there are none).
See also dotnet/runtime#35135. |
Look for blocks with single statement noreturn calls, and try to reroute
flow so there's just one block call that all predecessors target.
Resolves #14770.
Note this impairs debuggability of optimized code a bit, as it can change which
line of code apparently invokes a throw helper in a backtrace. But since we're
already commoning jit-inserted throw helpers (like array index OOB) this is not
breaking any new ground.
We could also handle commoning BBJ_THROW blocks, with some extra effort,
but prototyping indicates duplicate throws are pretty rare.