-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: handle interaction of OSR, PGO, and tail calls #62263
Conversation
When both OSR and PGO are enabled, the jit will add PGO probes to OSR methods. And if the OSR method also has a tail call, the jit must take care to not add block probes to any return block reachable from possible tail call blocks. Instead, instrumentation should create copies of the return block probe in each return block predecessor (possibly splitting critical edges to make this viable). Because all this happens early on, there are no pred lists. The analysis leverages cheap preds instead, which means it needs to handle cases where a given pred has multiple pred list entries. And it must also be aware that the OSR method's actual flowgraph is a subgraph of the full initial graph. This came up while scouting what it would take to enable OSR by default. See dotnet#61934.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsWhen both OSR and PGO are enabled, the jit will add PGO probes to OSR methods. Instead, instrumentation should create copies of the return block probe in each Because all this happens early on, there are no pred lists. The analysis leverages This came up while scouting what it would take to enable OSR by default.
|
cc @dotnet/jit-contrib. Should be no SPMI diff since it only impacts OSR + PGO, and we don't have any collections with that combination today. |
@@ -552,7 +552,8 @@ enum BasicBlockFlags : unsigned __int64 | |||
BBF_PATCHPOINT = MAKE_BBFLAG(36), // Block is a patchpoint | |||
BBF_HAS_CLASS_PROFILE = MAKE_BBFLAG(37), // BB contains a call needing a class profile | |||
BBF_PARTIAL_COMPILATION_PATCHPOINT = MAKE_BBFLAG(38), // Block is a partial compilation patchpoint | |||
BBF_HAS_ALIGN = MAKE_BBFLAG(39), // BB ends with 'align' instruction | |||
BBF_HAS_ALIGN = MAKE_BBFLAG(39), // BB ends with 'align' instruction | |||
BBF_TAILCALL_SUCCESSOR = MAKE_BBFLAG(40), // BB has pred that has potential tail call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you meant BB has successor
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No -- it marks blocks that come after tail calls.
Maybe a picture will help? Here's a fragment of an OSR method flow graph before we add instrumentation. We want to count how often R
is executed, but we can't put probes in R
because it is marked with BBF_TAILCALL_SUCCESSOR
-- it needs to remain empty since the tail call preds won't execute R
.
Also pictured are some non-tail call blocks A
and B
that conditionally share the return, and an OSR-unreachable block Z
. And the blue edge is a fall-through edge. A
has degenerate flow, which is rare, but possible.
To handle this we need to put copies of R
's probes in the tail call blocks, and create an intermediary block that all the other preds flow through to get to R
. So we end up with 3 separate copies of R's pgo probe that collectively give us the right count for R
, and R
remains empty so the tail calls work as expected.
We also take pains not to instrument Z
, since there are debug checks that verify that un-imported blocks remain empty and can be removed. And we take pains not to double-count A
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah it makes sense now for me, thanks for detailed response 🙂 not going to mark it as resolved to keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A couple comments on comments.
@@ -579,7 +579,6 @@ void Compiler::fgReplaceJumpTarget(BasicBlock* block, BasicBlock* newTarget, Bas | |||
if (jumpTab[i] == oldTarget) | |||
{ | |||
jumpTab[i] = newTarget; | |||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, this is safe and won't cause diffs. However, the header comment specifically says:
// 2. Only the first target found is updated. If there are multiple ways for a block
// to reach 'oldTarget' (e.g., multiple arms of a switch), only the first one found is changed.
so that should be updated.
One caller, fgNormalizeEHCase2() specifically expects the old behavior:
// Now change the branch. If it was a BBJ_NONE fall-through to the top block, this will
// do nothing. Since cheap preds contains dups (for switch duplicates), we will call
// this once per dup.
but it's ok, because subsequent calls will just do nothing.
Unrelated, I also note the comment says:
4. The switch table "unique successor" cache is invalidated.
Although we don't call InvalidateUniqueSwitchSuccMap()
, so that comment is a little misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, let me update this documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the invalidation and updated the comments.
src/coreclr/jit/fgprofile.cpp
Outdated
// Build cheap preds. | ||
// | ||
m_comp->fgComputeCheapPreds(); | ||
m_comp->NewBasicBlockEpoch(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would EnsureBasicBlockEpoch
be sufficient? If not, it's useful to comment why. E.g., see the long comment in fgRenumberBlocks
about one versus the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters this early. This is going to be the first time we've done anything epoch related. But happy to change it (there's one other use like this nearby, for edge instrumentation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it over.
There is a subtle issue here using BlockSet
and similar this early, if you're in an inlinee compiler you need to make sure to base all these on the root compiler instance, as we share the flow graph across the two. So, added a comment and an assert that we're not in an inlinee compiler.
When both OSR and PGO are enabled, the jit will add PGO probes to OSR methods.
And if the OSR method also has a tail call, the jit must take care to not add
block probes to any return block reachable from possible tail call blocks.
Instead, instrumentation should create copies of the return block probe in each
return block predecessor (possibly splitting critical edges to make this viable).
Because all this happens early on, there are no pred lists. The analysis leverages
cheap preds instead, which means it needs to handle cases where a given pred has
multiple pred list entries. And it must also be aware that the OSR method's actual
flowgraph is a subgraph of the full initial graph.
This came up while scouting what it would take to enable OSR by default.
See #61934.