JIT: handle interaction of OSR, PGO, and tail calls #62263

AndyAyersMS · 2021-12-02T02:37:08Z

When both OSR and PGO are enabled, the jit will add PGO probes to OSR methods.
And if the OSR method also has a tail call, the jit must take care to not add
block probes to any return block reachable from possible tail call blocks.

Instead, instrumentation should create copies of the return block probe in each
return block predecessor (possibly splitting critical edges to make this viable).
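Below is a minimal sketch of the "copy the probe into each predecessor" idea on a hypothetical toy graph. `ToyGraph` and `placeProbeForPred` are illustrative names, not JIT APIs. A pred whose only successor is the return block can hold the probe directly. A pred that can also branch elsewhere gets a new block on that edge, so the probe runs only when control actually goes to the return. This is the edge-splitting case, and the edge is typically critical because the return block has several preds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy flow graph (not the JIT's BasicBlock API): blocks are
// indices, succs[b] lists b's successor edges, probes[b] counts how many
// probe copies block b carries.
struct ToyGraph {
    std::vector<std::vector<int>> succs;
    std::vector<int> probes;

    int addBlock() {
        succs.emplace_back();
        probes.push_back(0);
        return static_cast<int>(succs.size()) - 1;
    }

    int predEdgeCount(int target) const {
        int n = 0;
        for (const auto& s : succs)
            for (int t : s)
                if (t == target) ++n;
        return n;
    }

    // Critical edge: the source has several successor edges and the
    // target has several predecessor edges.
    bool isCritical(int from, int to) const {
        return succs[from].size() > 1 && predEdgeCount(to) > 1;
    }

    // Route every from -> to edge through a fresh block; return that block.
    int splitEdge(int from, int to) {
        int mid = addBlock();
        succs[mid].push_back(to);
        for (int& t : succs[from])
            if (t == to) t = mid;
        return mid;
    }
};

// Place one copy of target's probe so it runs exactly when control goes
// pred -> target, without adding anything to target itself.
int placeProbeForPred(ToyGraph& g, int pred, int target) {
    bool onlyTarget = true;
    for (int s : g.succs[pred])
        if (s != target) onlyTarget = false;
    int where = onlyTarget ? pred : g.splitEdge(pred, target);
    g.probes[where]++;
    return where;
}
```

Splitting only when the pred has other successors keeps the number of new blocks small: an unconditional pred is already an exact proxy for the edge.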

Because all this happens early on, there are no pred lists. The analysis leverages
cheap preds instead, which means it needs to handle cases where a given pred has
multiple pred list entries. And it must also be aware that the OSR method's actual
flowgraph is a subgraph of the full initial graph.
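The cheap-preds wrinkle can be sketched as a filter over a per-edge predecessor list. It skips duplicate entries for the same pred and skips preds that are not part of the OSR method's flowgraph. `distinctOsrPreds` and the `imported` flag (standing in for OSR-subgraph membership) are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <set>
#include <vector>

// cheapPreds[b] has one entry per incoming edge, so a pred that reaches b
// along two edges appears twice. imported[b] is a stand-in for "b is part
// of the OSR method's flowgraph" (hypothetical flag for this sketch).
std::vector<int> distinctOsrPreds(const std::vector<std::vector<int>>& cheapPreds,
                                  const std::vector<bool>& imported,
                                  int block) {
    std::vector<int> result;
    std::set<int> seen;
    for (int pred : cheapPreds[block]) {
        if (!imported[pred]) continue;            // outside the OSR subgraph
        if (!seen.insert(pred).second) continue;  // duplicate entry, same pred
        result.push_back(pred);
    }
    return result;
}
```

Order of first appearance is preserved, so the result is deterministic for a given pred list.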

This came up while scouting what it would take to enable OSR by default.
See #61934.


ghost · 2021-12-02T02:37:16Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2021-12-02T02:38:19Z

cc @dotnet/jit-contrib.

Should be no SPMI diff since it only impacts OSR + PGO, and we don't have any collections with that combination today.

EgorBo · 2021-12-02T15:03:05Z

src/coreclr/jit/block.h

@@ -552,7 +552,8 @@ enum BasicBlockFlags : unsigned __int64
    BBF_PATCHPOINT           = MAKE_BBFLAG(36), // Block is a patchpoint
    BBF_HAS_CLASS_PROFILE    = MAKE_BBFLAG(37), // BB contains a call needing a class profile
    BBF_PARTIAL_COMPILATION_PATCHPOINT  = MAKE_BBFLAG(38), // Block is a partial compilation patchpoint
-    BBF_HAS_ALIGN          = MAKE_BBFLAG(39), // BB ends with 'align' instruction
+    BBF_HAS_ALIGN            = MAKE_BBFLAG(39), // BB ends with 'align' instruction
+    BBF_TAILCALL_SUCCESSOR   = MAKE_BBFLAG(40), // BB has pred that has potential tail call


I assume you meant "BB has successor"?

No -- it marks blocks that come after tail calls.

Maybe a picture will help? Here's a fragment of an OSR method flow graph before we add instrumentation. We want to count how often R is executed, but we can't put probes in R because it is marked with BBF_TAILCALL_SUCCESSOR -- it needs to remain empty since the tail call preds won't execute R.

Also pictured are some non-tail call blocks A and B that conditionally share the return, and an OSR-unreachable block Z. And the blue edge is a fall-through edge. A has degenerate flow, which is rare, but possible.

To handle this, we need to put copies of R's probes in the tail call blocks, and create an intermediary block that all the other preds flow through to get to R. So we end up with 3 separate copies of R's PGO probe that collectively give us the right count for R, and R remains empty so the tail calls work as expected.

We also take pains not to instrument Z, since there are debug checks that verify that un-imported blocks remain empty and can be removed. And we take pains not to double-count A.
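The transformation described above can be sketched on a toy graph under some stated assumptions. Each tail call block's only successor is R. `Block` and `relocateReturnProbe` are illustrative names rather than JIT code. A plain `inOsrGraph` flag stands in for "part of the OSR flowgraph." Tail call preds each get a probe copy. The remaining preds are redirected through one new intermediary that carries the third copy. Z is skipped. Collecting preds into sets means A's two edges yield a single probe.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy block: successors, whether it ends in a potential tail
// call, whether it belongs to the OSR flowgraph, and how many copies of
// R's probe it holds.
struct Block {
    std::vector<int> succs;
    bool tailCall = false;
    bool inOsrGraph = true;
    int probes = 0;
};

// Give R's probe to its preds instead of R. Returns the index of the new
// intermediary block, or -1 if every pred was a tail call block.
int relocateReturnProbe(std::vector<Block>& blocks, int r) {
    std::set<int> tailPreds, otherPreds;  // sets: each pred handled once
    for (int b = 0; b < static_cast<int>(blocks.size()); ++b) {
        if (!blocks[b].inOsrGraph) continue;  // leave Z untouched
        for (int s : blocks[b].succs)
            if (s == r) (blocks[b].tailCall ? tailPreds : otherPreds).insert(b);
    }
    for (int t : tailPreds) blocks[t].probes++;  // copy into each tail call block
    if (otherPreds.empty()) return -1;

    // One intermediary that all non-tail-call preds flow through to reach R.
    blocks.push_back(Block{});
    int mid = static_cast<int>(blocks.size()) - 1;
    blocks[mid].succs.push_back(r);
    blocks[mid].probes++;
    for (int p : otherPreds)
        for (int& s : blocks[p].succs)
            if (s == r) s = mid;  // both of A's edges move; A gets no probe
    return mid;
}
```

With this setup, executions of R equal the sum of the three probe counts, because every OSR-reachable path into R passes through exactly one of the two tail call blocks or the intermediary, and R itself stays empty.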

Oh, yeah, it makes sense to me now. Thanks for the detailed response 🙂 Not going to mark it as resolved, to keep it visible.


BruceForstall

LGTM. A couple comments on comments.

BruceForstall · 2021-12-02T18:33:58Z

src/coreclr/jit/fgbasic.cpp

@@ -579,7 +579,6 @@ void Compiler::fgReplaceJumpTarget(BasicBlock* block, BasicBlock* newTarget, Bas
                if (jumpTab[i] == oldTarget)
                {
                    jumpTab[i] = newTarget;
-                    break;


AFAICT, this is safe and won't cause diffs. However, the header comment specifically says:

// 2. Only the first target found is updated. If there are multiple ways for a block
// to reach 'oldTarget' (e.g., multiple arms of a switch), only the first one found is changed.

so that should be updated.

One caller, fgNormalizeEHCase2() specifically expects the old behavior:

// Now change the branch. If it was a BBJ_NONE fall-through to the top block, this will
// do nothing. Since cheap preds contains dups (for switch duplicates), we will call
// this once per dup.

but it's ok, because subsequent calls will just do nothing.

Unrelated, I also note the comment says:

4. The switch table "unique successor" cache is invalidated.

However, we don't call InvalidateUniqueSwitchSuccMap(), so that comment is a little misleading.

Thanks, let me update this documentation.

Added the invalidation and updated the comments.
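The effect of dropping the `break` plus the added cache invalidation can be modeled on a hypothetical toy switch. `ToySwitch` is an illustration, not the JIT's switch descriptor or its unique-successor map. Every arm that targets `oldTarget` is retargeted in one call, and the unique-successor cache is rebuilt on next use. A repeated call, like the per-dup calls from `fgNormalizeEHCase2()`, finds nothing left to change.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy switch block: a jump table plus a cached list of its
// distinct targets, standing in for a "unique successor" cache.
struct ToySwitch {
    std::vector<int> jumpTab;
    std::vector<int> uniqueSuccCache;
    bool cacheValid = false;

    // Rebuild the distinct-target list lazily, in first-seen order.
    const std::vector<int>& uniqueSuccs() {
        if (!cacheValid) {
            uniqueSuccCache.clear();
            for (int t : jumpTab) {
                bool dup = false;
                for (int u : uniqueSuccCache) dup = dup || (u == t);
                if (!dup) uniqueSuccCache.push_back(t);
            }
            cacheValid = true;
        }
        return uniqueSuccCache;
    }

    // Retarget every arm that jumps to oldTarget (no early break) and
    // invalidate the cache if anything changed. Returns the number of
    // arms changed, so a second identical call returns 0.
    int replaceJumpTarget(int newTarget, int oldTarget) {
        int changed = 0;
        for (int& t : jumpTab) {
            if (t == oldTarget) {
                t = newTarget;
                ++changed;
            }
        }
        if (changed != 0) cacheValid = false;
        return changed;
    }
};
```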

BruceForstall · 2021-12-02T18:37:56Z

src/coreclr/jit/fgprofile.cpp

+        // Build cheap preds.
+        //
+        m_comp->fgComputeCheapPreds();
+        m_comp->NewBasicBlockEpoch();


Would EnsureBasicBlockEpoch be sufficient? If not, it's useful to comment why. E.g., see the long comment in fgRenumberBlocks about one versus the other.

I don't think it matters this early. This is going to be the first time we've done anything epoch related. But happy to change it (there's one other use like this nearby, for edge instrumentation).

Changed it over.

There is a subtle issue with using BlockSet and similar this early: if you're in an inlinee compiler, you need to make sure to base all of these on the root compiler instance, since the two share the flow graph. So I added a comment and an assert that we're not in an inlinee compiler.
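The following is a toy model of the "new" versus "ensure" epoch distinction. It assumes, as we understand the JIT's behavior, that the "ensure" variant starts a new epoch only when the block count no longer matches the size the current epoch was created for. All names here (`ToyFlowGraph`, `ToyBlockSet`, `newEpoch`, `ensureEpoch`) are hypothetical. Because a set is only meaningful for the graph and epoch it was created against, any compiler instance sharing that graph has to consult the same epoch state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flow graph bookkeeping: block sets are sized by the block
// count captured when the current epoch started.
struct ToyFlowGraph {
    int bbNumMax = 0;   // highest block number in use
    unsigned epoch = 0;
    int epochSize = 0;  // bbNumMax + 1 as of the current epoch

    // Unconditionally start a new epoch.
    void newEpoch() {
        ++epoch;
        epochSize = bbNumMax + 1;
    }

    // Start a new epoch only if the block count has changed.
    void ensureEpoch() {
        if (epochSize != bbNumMax + 1) newEpoch();
    }
};

// A block set remembers the epoch it was built in; using it after the
// epoch changes would mix bit vectors of different sizes.
struct ToyBlockSet {
    unsigned epoch;
    std::vector<bool> bits;

    explicit ToyBlockSet(const ToyFlowGraph& g)
        : epoch(g.epoch), bits(g.epochSize, false) {}

    bool validFor(const ToyFlowGraph& g) const { return epoch == g.epoch; }
};
```

When nothing epoch-related has happened yet, as in the early instrumentation phase discussed here, both variants end up starting a fresh epoch, which matches the observation that the choice does not matter this early.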

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Dec 2, 2021

EgorBo reviewed Dec 2, 2021


Be careful not to have multiple pred list entries referring to the intermediary. (f8efe04)

BruceForstall approved these changes Dec 2, 2021


review feedback (b111cfb)

AndyAyersMS merged commit 3810633 into dotnet:main Dec 2, 2021

This was referenced Dec 2, 2021:

Enable QJFL and OSR by default for x64 #61934 (Closed)

On Stack Replacement Next Steps #33658 (Open)

ghost locked as resolved and limited conversation to collaborators Jan 2, 2022
