JIT: more unexpected dynamic PGO schema mismatches #85856

AndyAyersMS · 2023-05-06T01:26:05Z

I had hoped that #85805 would have fixed all the cases where we have dynamic PGO data available but can't match up the schema with the current flow graph and so throw away perfectly good PGO data.

But that's not the case. The initial flow graph for a method may be slightly different, depending on whether or not the method is an inlinee. One culprit is this bit of code (there may be more I haven't spotted yet):

runtime/src/coreclr/jit/fgbasic.cpp

Lines 2975 to 2980 in e61e022

    
           jmpDist = (sz == 1) ? getI1LittleEndian(codeAddr) : getI4LittleEndian(codeAddr); 
        
           if (compIsForInlining() && jmpDist == 0 && (opcode == CEE_BR || opcode == CEE_BR_S)) 
        
           { 
        
               continue; /* NOP */ 
        
           }

So for inlinees only, we will not break blocks because of jump to next. This causes divergence like the following:

;; compiling as root

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..007)-> BB03 ( cond )                     
BB02 [0001]  1       BB01                  1       [007..016)-> BB04 (always)                     
BB03 [0002]  1       BB01                  1       [016..017)                                     
BB04 [0003]  2       BB02,BB03             1       [017..039)-> BB05 (always)                     
BB05 [0004]  1       BB04                  1       [039..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

Using edge profiling

EfficientEdgeCountInstrumentor: preparing for instrumentation

New BlockSet epoch 1, # of blocks (including unused BB00): 6, bitset array size: 1 (short)
[0] New probe for BB05 -> BB01 [source]
[1] New probe for BB03 -> BB04 [source]
5 blocks, 2 probes (0 on critical edges)

;; as inlinee

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0102]  1                           100       [000..007)-> BB03 ( cond )                     
BB02 [0103]  1       BB01                100       [007..016)-> BB04 (always)                     
BB03 [0104]  1       BB01                100       [016..017)                                     
BB04 [0105]  2       BB02,BB03           100       [017..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

*************** Inline @[000773] Starting PHASE Profile incorporation
Have Dynamic PGO: 2 schema records (schema at 000001F14A67CC78, data at 000001F14A63F8E8)
Profile summary: 1 runs, 0 block probes, 2 edge probes, 0 class profiles, 0 method profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB03 -> BB04: weight 65208
Could not find source block for schema entry 1 (IL offset/key 00000039)
... not solving because of the mismatch
... discarding profile count data: PGO data available, but IL did not match
Computing inlinee profile scale:

We have been relying on the fact that if we build our instrumentation plan super early we always get the same flow graph.

Seems like the pragmatic thing to do is to always do this optimization if we're optimizing or instrumenting, and not just for inlinees. There is matching logic in the importer and perhaps elsewhere so this should probably be encapsulated into a helper.

Doing that will (temporarily) break the ability to ingest static PGO for (hopefully just a few) root methods that have branch to next like this. Hopefully not too many of them. And SPMI will have a number of diffs as well, both cases where we no longer read PGO data for root methods and cases where we do read PGO data for inlinees).

The text was updated successfully, but these errors were encountered:

ghost · 2023-05-06T01:26:27Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

I had hoped that #85805 would have fixed all the cases where we have dynamic PGO data available but can't match up the schema with the current flow graph.

But that's not the case. Another problem is that the initial flow graph for a method may be slightly different, depending on whether or not the method is an inlinee. One culprit is this bit of code (there may be more I haven't spotted yet):

runtime/src/coreclr/jit/fgbasic.cpp

Lines 2975 to 2980 in e61e022

    
           jmpDist = (sz == 1) ? getI1LittleEndian(codeAddr) : getI4LittleEndian(codeAddr); 
        
           if (compIsForInlining() && jmpDist == 0 && (opcode == CEE_BR || opcode == CEE_BR_S)) 
        
           { 
        
               continue; /* NOP */ 
        
           }

So for inlinees only, we will not break blocks because of jump to next. This causes divergence like the following:

;; compiling as root

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..007)-> BB03 ( cond )                     
BB02 [0001]  1       BB01                  1       [007..016)-> BB04 (always)                     
BB03 [0002]  1       BB01                  1       [016..017)                                     
BB04 [0003]  2       BB02,BB03             1       [017..039)-> BB05 (always)                     
BB05 [0004]  1       BB04                  1       [039..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

Using edge profiling

EfficientEdgeCountInstrumentor: preparing for instrumentation

New BlockSet epoch 1, # of blocks (including unused BB00): 6, bitset array size: 1 (short)
[0] New probe for BB05 -> BB01 [source]
[1] New probe for BB03 -> BB04 [source]
5 blocks, 2 probes (0 on critical edges)

;; as inlinee

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0102]  1                           100       [000..007)-> BB03 ( cond )                     
BB02 [0103]  1       BB01                100       [007..016)-> BB04 (always)                     
BB03 [0104]  1       BB01                100       [016..017)                                     
BB04 [0105]  2       BB02,BB03           100       [017..06A)        (return)                     
-----------------------------------------------------------------------------------------------------------------------------------------

*************** Inline @[000773] Starting PHASE Profile incorporation
Have Dynamic PGO: 2 schema records (schema at 000001F14A67CC78, data at 000001F14A63F8E8)
Profile summary: 1 runs, 0 block probes, 2 edge probes, 0 class profiles, 0 method profiles, 0 other records

Reconstructing block counts from sparse edge instrumentation
... adding known edge BB03 -> BB04: weight 65208
Could not find source block for schema entry 1 (IL offset/key 00000039)
... not solving because of the mismatch
... discarding profile count data: PGO data available, but IL did not match
Computing inlinee profile scale:

We have been relying on the fact that if we build our instrumentation plan super early we always get the same flow graph.

Seems like the pragmatic thing to do is to always do this optimization if we're optimizing or instrumenting, and not just for inlinees. There is matching logic in the importer and perhaps elsewhere so this should probably be encapsulated into a helper.

Doing that will (temporarily) break the ability to ingest static PGO for (hopefully just a few) root methods that have branch to next like this. Hopefully not too many of them. And SPMI will have a number of diffs as well, both cases where we no longer read PGO data for root methods and cases where we do read PGO data for inlinees).

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`, `needs-area-label`
Milestone:	-

…e data Always try and merge "branch to next" blocks when building the intial flow graph if BBINSTR or BBOPT is set. Fixes dotnet#85856.

…e data (#85860) Always try and merge "branch to next" blocks when building the intial flow graph if BBINSTR or BBOPT is set. Fixes #85856.

AndyAyersMS · 2023-05-06T15:34:55Z

Recollecting ASP.NET now that #85960 is in -- will see if I can get through that collection without hitting more cases.

AndyAyersMS · 2023-05-06T18:26:28Z

Still seeing cases of this -- when BBINSTR_IF_LOOPS TIER0 is passed by the VM, the jit may build the flow graph first and then decide to instrument later, and so we end up with the same kind of mismatch as noted above. Fix is to further generalize the condition and also do the block fusion for Tier0 jitting.

The JIT will sometimes decide to instrument a Tier0 method even if `BBINSTR` is not passed by the VM (this is enabled when the VM passes `BBINSTR_IF_LOOPS` so that we can provide some PGO data to OSR methods). In such cases we build the flow graph and then decide to instrument, so the flow graph shape may differ from the case where we know up front that we are going to instrument or optimize. Remedy this by also running the early branch to next flow graph opt when a Tier0 JIT is passed `BBINSTR_IF_LOOPS`. Addresses another case of dotnet#85856.

The JIT will sometimes decide to instrument a Tier0 method even if `BBINSTR` is not passed by the VM (this is enabled when the VM passes `BBINSTR_IF_LOOPS` so that we can provide some PGO data to OSR methods). In such cases we build the flow graph and then decide to instrument, so the flow graph shape may differ from the case where we know up front that we are going to instrument or optimize. Remedy this by also running the early branch to next flow graph opt when a Tier0 JIT is passed `BBINSTR_IF_LOOPS`. Addresses another case of #85856.

AndyAyersMS · 2023-05-07T15:06:48Z

I recollected ASP.NET after #85873 and don't see any more mismatches. Will wait for the normal collection (which kicks off today) to refresh the others and then see if they're all clean too. If so then I'll put up the PR to turn on assertion checking for mismatches.

AndyAyersMS · 2023-05-08T15:05:25Z

Enabled the assert in #85898 so think we're in good shape now.

ghost added the untriaged New issue has not been triaged by the area owner label May 6, 2023

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 6, 2023

AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 6, 2023

AndyAyersMS self-assigned this May 6, 2023

AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label May 6, 2023

AndyAyersMS added this to the 8.0.0 milestone May 6, 2023

AndyAyersMS mentioned this issue May 6, 2023

JIT: produce consistent flow graph when producing or consuming profile data #85860

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label May 6, 2023

AndyAyersMS closed this as completed in #85860 May 6, 2023

AndyAyersMS added a commit that referenced this issue May 6, 2023

JIT: produce consistent flow graph when producing or consuming profil…

ae1be12

…e data (#85860) Always try and merge "branch to next" blocks when building the intial flow graph if BBINSTR or BBOPT is set. Fixes #85856.

ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 6, 2023

AndyAyersMS reopened this May 6, 2023

AndyAyersMS mentioned this issue May 6, 2023

JIT: fix another case of early flow graph divergence #85873

Merged

vcsjones removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 6, 2023

AndyAyersMS closed this as completed May 8, 2023

ghost locked as resolved and limited conversation to collaborators Jun 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: more unexpected dynamic PGO schema mismatches #85856

JIT: more unexpected dynamic PGO schema mismatches #85856

AndyAyersMS commented May 6, 2023 •

edited

Loading

ghost commented May 6, 2023

AndyAyersMS commented May 6, 2023

AndyAyersMS commented May 6, 2023

AndyAyersMS commented May 7, 2023

AndyAyersMS commented May 8, 2023

JIT: more unexpected dynamic PGO schema mismatches #85856

JIT: more unexpected dynamic PGO schema mismatches #85856

Comments

AndyAyersMS commented May 6, 2023 • edited Loading

ghost commented May 6, 2023

AndyAyersMS commented May 6, 2023

AndyAyersMS commented May 6, 2023

AndyAyersMS commented May 7, 2023

AndyAyersMS commented May 8, 2023

AndyAyersMS commented May 6, 2023 •

edited

Loading