Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: more unexpected dynamic PGO schema mismatches #85856

Closed
AndyAyersMS opened this issue May 6, 2023 · 5 comments · Fixed by #85860
Closed

JIT: more unexpected dynamic PGO schema mismatches #85856

AndyAyersMS opened this issue May 6, 2023 · 5 comments · Fixed by #85860
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

AndyAyersMS commented May 6, 2023

I had hoped that #85805 would have fixed all the cases where we have dynamic PGO data available but can't match up the schema with the current flow graph and so throw away perfectly good PGO data.

But that's not the case. The initial flow graph for a method may be slightly different, depending on whether or not the method is an inlinee. One culprit is this bit of code (there may be more I haven't spotted yet):

jmpDist = (sz == 1) ? getI1LittleEndian(codeAddr) : getI4LittleEndian(codeAddr);
if (compIsForInlining() && jmpDist == 0 && (opcode == CEE_BR || opcode == CEE_BR_S))
{
continue; /* NOP */
}

So for inlinees only, we will not break blocks because of jump to next. This causes divergence like the following:

;; compiling as root

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..007)-> BB03 ( cond )                     
BB02 [0001]  1       BB01                  1       [007..016)-> BB04 (always)                     
BB03 [0002]  1       BB01                  1       [016..017)                                     
BB04 [0003]  2       BB02,BB03             1       [017..039)-> BB05 (always)                     
BB05 [0004]  1       BB04                  1       [039..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

Using edge profiling

EfficientEdgeCountInstrumentor: preparing for instrumentation

New BlockSet epoch 1, # of blocks (including unused BB00): 6, bitset array size: 1 (short)
[0] New probe for BB05 -> BB01 [source]
[1] New probe for BB03 -> BB04 [source]
5 blocks, 2 probes (0 on critical edges)

;; as inlinee

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0102]  1                           100       [000..007)-> BB03 ( cond )                     
BB02 [0103]  1       BB01                100       [007..016)-> BB04 (always)                     
BB03 [0104]  1       BB01                100       [016..017)                                     
BB04 [0105]  2       BB02,BB03           100       [017..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

*************** Inline @[000773] Starting PHASE Profile incorporation
Have Dynamic PGO: 2 schema records (schema at 000001F14A67CC78, data at 000001F14A63F8E8)
Profile summary: 1 runs, 0 block probes, 2 edge probes, 0 class profiles, 0 method profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB03 -> BB04: weight 65208
Could not find source block for schema entry 1 (IL offset/key 00000039)
... not solving because of the mismatch
... discarding profile count data: PGO data available, but IL did not match
Computing inlinee profile scale:

We have been relying on the fact that if we build our instrumentation plan super early we always get the same flow graph.

Seems like the pragmatic thing to do is to always do this optimization if we're optimizing or instrumenting, and not just for inlinees. There is matching logic in the importer and perhaps elsewhere so this should probably be encapsulated into a helper.

Doing that will (temporarily) break the ability to ingest static PGO for (hopefully just a few) root methods that have branch to next like this. Hopefully not too many of them. And SPMI will have a number of diffs as well, both cases where we no longer read PGO data for root methods and cases where we do read PGO data for inlinees).

@ghost ghost added the untriaged New issue has not been triaged by the area owner label May 6, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 6, 2023
@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 6, 2023
@AndyAyersMS AndyAyersMS self-assigned this May 6, 2023
@ghost
Copy link

ghost commented May 6, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

I had hoped that #85805 would have fixed all the cases where we have dynamic PGO data available but can't match up the schema with the current flow graph.

But that's not the case. Another problem is that the initial flow graph for a method may be slightly different, depending on whether or not the method is an inlinee. One culprit is this bit of code (there may be more I haven't spotted yet):

jmpDist = (sz == 1) ? getI1LittleEndian(codeAddr) : getI4LittleEndian(codeAddr);
if (compIsForInlining() && jmpDist == 0 && (opcode == CEE_BR || opcode == CEE_BR_S))
{
continue; /* NOP */
}

So for inlinees only, we will not break blocks because of jump to next. This causes divergence like the following:

;; compiling as root

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..007)-> BB03 ( cond )                     
BB02 [0001]  1       BB01                  1       [007..016)-> BB04 (always)                     
BB03 [0002]  1       BB01                  1       [016..017)                                     
BB04 [0003]  2       BB02,BB03             1       [017..039)-> BB05 (always)                     
BB05 [0004]  1       BB04                  1       [039..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

Using edge profiling

EfficientEdgeCountInstrumentor: preparing for instrumentation

New BlockSet epoch 1, # of blocks (including unused BB00): 6, bitset array size: 1 (short)
[0] New probe for BB05 -> BB01 [source]
[1] New probe for BB03 -> BB04 [source]
5 blocks, 2 probes (0 on critical edges)

;; as inlinee

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0102]  1                           100       [000..007)-> BB03 ( cond )                     
BB02 [0103]  1       BB01                100       [007..016)-> BB04 (always)                     
BB03 [0104]  1       BB01                100       [016..017)                                     
BB04 [0105]  2       BB02,BB03           100       [017..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

*************** Inline @[000773] Starting PHASE Profile incorporation
Have Dynamic PGO: 2 schema records (schema at 000001F14A67CC78, data at 000001F14A63F8E8)
Profile summary: 1 runs, 0 block probes, 2 edge probes, 0 class profiles, 0 method profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB03 -> BB04: weight 65208
Could not find source block for schema entry 1 (IL offset/key 00000039)
... not solving because of the mismatch
... discarding profile count data: PGO data available, but IL did not match
Computing inlinee profile scale:

We have been relying on the fact that if we build our instrumentation plan super early we always get the same flow graph.

Seems like the pragmatic thing to do is to always do this optimization if we're optimizing or instrumenting, and not just for inlinees. There is matching logic in the importer and perhaps elsewhere so this should probably be encapsulated into a helper.

Doing that will (temporarily) break the ability to ingest static PGO for (hopefully just a few) root methods that have branch to next like this. Hopefully not too many of them. And SPMI will have a number of diffs as well, both cases where we no longer read PGO data for root methods and cases where we do read PGO data for inlinees).

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged, needs-area-label

Milestone: -

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label May 6, 2023
@AndyAyersMS AndyAyersMS added this to the 8.0.0 milestone May 6, 2023
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue May 6, 2023
…e data

Always try and merge "branch to next" blocks when building the intial flow graph
if BBINSTR or BBOPT is set.

Fixes dotnet#85856.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 6, 2023
AndyAyersMS added a commit that referenced this issue May 6, 2023
…e data (#85860)

Always try and merge "branch to next" blocks when building the intial flow graph
if BBINSTR or BBOPT is set.

Fixes #85856.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 6, 2023
@AndyAyersMS
Copy link
Member Author

Recollecting ASP.NET now that #85960 is in -- will see if I can get through that collection without hitting more cases.

@AndyAyersMS
Copy link
Member Author

Still seeing cases of this -- when BBINSTR_IF_LOOPS TIER0 is passed by the VM, the jit may build the flow graph first and then decide to instrument later, and so we end up with the same kind of mismatch as noted above. Fix is to further generalize the condition and also do the block fusion for Tier0 jitting.

@AndyAyersMS AndyAyersMS reopened this May 6, 2023
AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue May 6, 2023
The JIT will sometimes decide to instrument a Tier0 method even if `BBINSTR`
is not passed by the VM (this is enabled when the VM passes `BBINSTR_IF_LOOPS`
so that we can provide some PGO data to OSR methods).

In such cases we build the flow graph and then decide to instrument, so the
flow graph shape may differ from the case where we know up front that we are going
to instrument or optimize.

Remedy this by also running the early branch to next flow graph opt when a Tier0
JIT is passed `BBINSTR_IF_LOOPS`.

Addresses another case of dotnet#85856.
@vcsjones vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 6, 2023
AndyAyersMS added a commit that referenced this issue May 7, 2023
The JIT will sometimes decide to instrument a Tier0 method even if `BBINSTR`
is not passed by the VM (this is enabled when the VM passes `BBINSTR_IF_LOOPS`
so that we can provide some PGO data to OSR methods).

In such cases we build the flow graph and then decide to instrument, so the
flow graph shape may differ from the case where we know up front that we are going
to instrument or optimize.

Remedy this by also running the early branch to next flow graph opt when a Tier0
JIT is passed `BBINSTR_IF_LOOPS`.

Addresses another case of #85856.
@AndyAyersMS
Copy link
Member Author

I recollected ASP.NET after #85873 and don't see any more mismatches. Will wait for the normal collection (which kicks off today) to refresh the others and then see if they're all clean too. If so then I'll put up the PR to turn on assertion checking for mismatches.

@AndyAyersMS
Copy link
Member Author

Enabled the assert in #85898 so think we're in good shape now.

@ghost ghost locked as resolved and limited conversation to collaborators Jun 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants