JIT: resolve issues with OSR and PGO #47942

AndyAyersMS · 2021-02-06T01:45:22Z

OSR and PGO don't play nicely together.

See #46882 (comment) and follow on comments.

AndyAyersMS · 2021-03-30T01:21:33Z

There's a workaround in place (use block profiling with OSR) that's good enough for now.

So, moving to Future.

AndyAyersMS · 2021-10-05T02:48:57Z

The interaction of OSR and PGO is a bit more involved than I'd been thinking. There are a few sub-problems:

The original method's PGO data will likely be incomplete. In particular there may not be any profile data "after" the patchpoint so large portions of the method may appear cold when the OSR method is being jitted.
With sparse profiles we may even lose the ability to infer counts in the parts of the original method that were profiled, unless we model the potential impact of the OSR flow.
This problem will persist if we ever rejit the method at Tier1, since any execution of the original method that would have accumulated profile data will be redirected to the OSR method.
With sparse instrumentation the OSR method's flow graph needs to match the original method, initially, so we can determine the right placement of counts.

It seems plausible that we can fix 2, 3, and 4 by modelling OSR flow in the original method, deferring OSR flow updates until after the profile model has been built (which is happening as part of #59784) and then adding instrumentation to the root part of the OSR method (this is a bit novel, but seems doable, we just use the schema binding we get from reading the PGO data in the OSR method to emit "redundant" probes in the OSR method itself). The downside here is that the OSR method has some extra overhead, but maybe that's acceptable.

But fixing 1 seems hard...unless we have the OSR method return back to the original method , once we've exited the part of the OSR method flowgraph that is post-dominated by the patchpoint. Note there could be many such points, and each one might need a custom "fixup" action to restore the IL state in the original method from the live state in the OSR method; plus we might need to do lifetime extensions to ensure that we don't dead code something in the OSR method that's still needed back in the original.

AndyAyersMS · 2021-11-09T22:01:03Z

Rough plan of attack.

Keep using "dense" profiles for Tier0 methods with patchpoints.
Don't use dynamic PGO data to optimize OSR methods (static PGO ok).
Add ability to fetch schema and live data from VM. OSR method does this, verifies schema matches, and then instruments the parts of the code it will generate. Thus both the Tier0 and OSR methods will contribute to overall profile production.

So OSR methods won't be competitive with their Tier1 counterparts; this should be OK. But they should be much faster than Tier0 and perhaps that's good enough.

AndyAyersMS · 2021-11-10T19:29:39Z

Working on the abovem ran into a complication. It's not possible to get the OSR method to produce the exact same schema as the Tier0 method.

They produce the same flow graph initially (since it's based on method IL). But they do different selective importation, and currently block and class instrumentation is gated on this:

runtime/src/coreclr/jit/fgprofile.cpp

Lines 346 to 349 in 1d352fc

    
           bool ShouldProcess(BasicBlock* block) override 
        
           { 
        
               return ((block->bbFlags & (BBF_INTERNAL | BBF_IMPORTED)) == BBF_IMPORTED); 
        
           }

runtime/src/coreclr/jit/fgprofile.cpp

Lines 1427 to 1430 in 1d352fc

    
           bool ShouldProcess(BasicBlock* block) override 
        
           { 
        
               return ((block->bbFlags & (BBF_INTERNAL | BBF_IMPORTED)) == BBF_IMPORTED); 
        
           }

We could fix this easily enough for counts by "over allocating" the schema for both Tier0 and OSR (basically pretending like we're going to import the entire method) because count probes don't depend on the IR in the block. The downside here is using more schema/data memory then necessary.

But for class probes the probe structure depends on the IR so if we import differently we can't hope to match class probe schema entries.

Some alternatives to explore:

update the runtime to recognize when one schema (the one we propose to use for the OSR method) is a subset of another (the Tier0 method), and properly update the offset info for the subset based on the superset. This should not be too hard, provided the runtime doesn't reorder the schema entries that the jit creates at Tier0.
do something similar, but on the jit side -- fetch the Tier0 schema/data and then verify each place we'd put a probe is already covered.

AndyAyersMS · 2021-12-20T17:29:59Z

One other issue:

we may lose some PGO coverage for inlinees as we're not adding probes for them to the OSR method

Enable edge based profiles for OSR, partial compilation, and optimized plus instrumented cases. For OSR this requires deferring flow graph modifications until after we have built the initial probe list, so that the initial list reflects the entirety of the method. This set of candidate edge probes is thus the same no matter how the method is compiled. A given compile may schematize a subset of these probes and materialize a subset of what gets schematized; this is tolerated by the PGO mechanism provided that the initial instrumented jitting produces a schema which is a superset of the schema produced by any subsequent instrumented rejitting. This is normally the case. Partial compilation may still need some work to ensure full schematization but it is currently off by default. Will address this subsequently. For optimized compiles we give the EfficientEdgeCountInstrumentor the same kind of probe relocation abilities that we have in the BlockCountInstrumentor. In particular we need to move probes that might appear in return blocks that follow implicit tail call blocks, since those return blocks must remain empty. The details on how we do this are a bit different but the idea is the same: we create duplicate copies of any probe that was going to appear in the return block and instead instrument each pred. If the pred reached the return via a critical edge, we split the edge and put the probe there. This analysis relies on cheap preds, so to ensure we can use them we move all the critial edge splitting so it happens before we need the cheap pred lists. The ability to do block profiling is retained but will no longer be used without special config settings. There were also a few bug fixes in the spanning tree visitor. It must visit a superset of the blocks we end up importing and was missing visits in some cases. This should improve jit time and code quality for instrumented code. Fixes dotnet#47942. Fixes dotnet#66101. Contributes to dotnet#74873.

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 6, 2021

AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Feb 6, 2021

AndyAyersMS added this to the 6.0.0 milestone Feb 6, 2021

This was referenced Feb 6, 2021

Dynamic PGO #43618

Closed

JIT: efficient profiling schemes #46882

Closed

JulieLeeMSFT assigned AndyAyersMS Mar 23, 2021

AndyAyersMS modified the milestones: 6.0.0, Future Mar 30, 2021

AndyAyersMS mentioned this issue Oct 5, 2021

On Stack Replacement Next Steps #33658

Open

72 tasks

AndyAyersMS mentioned this issue Nov 11, 2021

Update how OSR and PGO interact #61453

Merged

AndyAyersMS mentioned this issue Jun 10, 2022

Proposal: Public PGO config switch #70410

Closed

AndyAyersMS mentioned this issue Oct 14, 2022

PGO work planned for .NET 8 #74873

Closed

44 tasks

AndyAyersMS mentioned this issue Jan 11, 2023

JIT: Enable edge based profiles for all scenarios #80481

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 11, 2023

AndyAyersMS closed this as completed in #80481 Jan 13, 2023

AndyAyersMS closed this as completed in eb76787 Jan 13, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 13, 2023

ghost locked as resolved and limited conversation to collaborators Feb 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: resolve issues with OSR and PGO #47942

JIT: resolve issues with OSR and PGO #47942

AndyAyersMS commented Feb 6, 2021

AndyAyersMS commented Mar 30, 2021

AndyAyersMS commented Oct 5, 2021

AndyAyersMS commented Nov 9, 2021

AndyAyersMS commented Nov 10, 2021

AndyAyersMS commented Dec 20, 2021

JIT: resolve issues with OSR and PGO #47942

JIT: resolve issues with OSR and PGO #47942

Comments

AndyAyersMS commented Feb 6, 2021

AndyAyersMS commented Mar 30, 2021

AndyAyersMS commented Oct 5, 2021

AndyAyersMS commented Nov 9, 2021

AndyAyersMS commented Nov 10, 2021

AndyAyersMS commented Dec 20, 2021