Conversation

amanasifkhalid
Contributor

Part of #107749. The first profile synthesis run happens before importation, so we don't model flow in and out of finally regions with flow edges yet. As a workaround, synthesis gives each finally region the same weight as its corresponding try region. When call-finally pairs are created, the tail inherits the weight of its head under the (faulty) assumption that all flow into a call-finally will return to the same pair. Once we have flow edges, we can compute the flow out of finally regions the same way as we compute flow elsewhere. It's important that synthesis models flow through finally regions via flow edges once we have them, or else flow through a loop that executes a finally might be lost, messing up the cyclic probability computation and flattening the loop's weight.

I noticed this issue after discovering that profile synthesis can disable profile consistency checking if it messes up the profile, under the assumption that incorrect IL can have nonsensical flow. In such cases, synthesis disables profile checks until the importer has run, after which the checks are re-enabled. This quirk makes sense only for the pre-importation run of synthesis; for later runs, the importer will never run again to re-enable the checks, so they can be quietly disabled indefinitely, hiding bugs in synthesis (such as its inability to handle finally regions).

I want to disable this quirk for post-importation runs of synthesis, but there's one more issue with synthesis I have to resolve first: I'm seeing instances where synthesis computes cyclic probabilities close to the cap, but not quite exceeding it. Thus, synthesis doesn't flag the profile as approximate, but consistency checks find that the flow exiting a loop exceeds the flow entering it. I'm not sure if lowering the likelihood cap is a sustainable or desirable fix for this -- perhaps some more sophisticated detection of approximate consistency would be better.

@Copilot Copilot AI review requested due to automatic review settings March 28, 2025 16:28
Contributor

@Copilot Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.

Files not reviewed (3)
  • src/coreclr/jit/fgprofile.cpp: Language not supported
  • src/coreclr/jit/fgprofilesynthesis.cpp: Language not supported
  • src/coreclr/jit/importer.cpp: Language not supported

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 28, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member

What do we do for the likelihoods coming out of a finally? Assume each is equally likely?

It might be better to weight the likelihoods based on the weights of the associated callfinallies. E.g., if there are two callfinallies, A with weight 9 and B with weight 1, the edge from the retfinally to the continuation of A should have 0.9 likelihood.

@amanasifkhalid
Contributor Author

What do we do for the likelihoods coming out of a finally? Assume each is equally likely?

That seems to be the case:

newEdge->setLikelihood(1.0 / predCount);

Let me try your suggestion...

@amanasifkhalid
Contributor Author

/azp run runtime-coreclr libraries-pgo


Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Contributor Author

Aside from timeouts, I'm not seeing any failures in libraries-pgo.

@AndyAyersMS PTAL. Aside from libraries_tests, diffs aren't all that big; they seem to be mostly diffs in layout/LSRA. Thanks!

// If the block has other successors, distribute the removed edge's likelihood among the remaining successor edges.
if (succCount > 1)
{
const weight_t likelihoodIncrease = succEdge->getLikelihood() / (succCount - 1);
Member


Shouldn't these proportionally scale up?

Say there are 3 successors with likelihoods A 0.1, B 0.1, C 0.8. We remove A. Then we should have B = 0.1111..., C = 0.8888...

With these changes we'd get B = 0.15, C = 0.85, so the relative likelihood of B would increase.

Generally $p_{i,new} = p_{i,old} / (1 - p_{removed})$, unless $p_{removed}$ is 1.0 or close to 1.0, in which case equal distribution seems ok.

Contributor Author


Good point; fixed

}

// If the block has other successors, distribute the removed edge's likelihood among the remaining successor edges.
if (succCount > 1)
Member


Ditto here

@amanasifkhalid
Contributor Author

/azp run runtime-coreclr libraries-pgo


Azure Pipelines successfully started running 1 pipeline(s).

@amanasifkhalid
Contributor Author

@AndyAyersMS re: the profile consistency issue I mentioned yesterday, it does seem to be a floating-point precision issue when computing cyclic probabilities. Here's the smallest example I could find:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight      IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       100 [000..01E)-> BB17(0.00595),BB14(0.994)   ( cond )                     i IBC hascall newarr
BB14 [0016]  2       BB01,BB34           192.52  19252 [01E..025)-> BB34(0.046),BB15(0.954) ( cond )                     i IBC bwd
BB15 [0017]  1       BB14                183.67  18367 [025..???)-> BB19(0.00334),BB27(0.997)   ( cond )                     IBC internal
BB27 [0029]  1       BB15                183.06  18306 [???..???)-> BB19(0.00334),BB28(0.997)   ( cond )                     IBC internal
BB28 [0030]  1       BB27                182.45  18245 [???..???)-> BB19(0.00334),BB29(0.997)   ( cond )                     IBC internal
BB29 [0031]  1       BB28                181.99  18199 [???..???)-> BB19(0.00251),BB31(0.997)   ( cond )                     IBC internal idxlen
BB31 [0033]  1       BB29                181.53  18153 [???..???)-> BB19(0.00251),BB32(0.997)   ( cond )                     IBC internal idxlen
BB32 [0034]  1       BB31                181.08  18108 [???..???)-> BB19(0.00251),BB18(0.997)   ( cond )                     IBC internal idxlen
BB18 [0020]  2       BB08,BB32            3931. 393088 [025..042)-> BB04(0.242),BB05(0.25),BB06(0.309),BB07(0.189),BB08(0.01)[def] (switch)                     i IBC idxlen bwd
BB04 [0004]  1       BB18                951.34  95134 [044..061)-> BB08(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB05 [0005]  1       BB18                983.50  98350 [061..07E)-> BB08(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB06 [0006]  1       BB18                 1213. 121339 [07E..09A)-> BB08(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB07 [0007]  1       BB18                743.35  74335 [09A..0B4)-> BB08(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB08 [0008]  5       BB04,BB05,BB06,BB07,BB18  3931. 393088 [0B4..0BF)-> BB18(0.954),BB34(0.046) ( cond )                     i IBC bwd
BB19 [0021]  7       BB15,BB25,BB27,BB28,BB29,BB31,BB32  69.71   6971 [025..042)-> BB24(0.242),BB23(0.25),BB22(0.309),BB21(0.189),BB25(0.01)[def] (switch)                     i IBC idxlen bwd
BB21 [0023]  1       BB19                 13.18   1318 [09A..0B4)-> BB25(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB22 [0024]  1       BB19                 21.52   2152 [07E..09A)-> BB25(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB23 [0025]  1       BB19                 17.44   1744 [061..07E)-> BB25(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB24 [0026]  1       BB19                 16.87   1687 [044..061)-> BB25(1)                 (always)                     i IBC hascall gcsafe idxlen bwd
BB25 [0027]  5       BB19,BB21,BB22,BB23,BB24  69.71   6971 [0B4..0BF)-> BB19(0.954),BB34(0.046) ( cond )                     i IBC bwd
BB34 [0036]  3       BB08,BB14,BB25      192.67  19267 [0BF..0CC)-> BB14(0.994),BB17(0.00595)   ( cond )                     i IBC bwd
BB17 [0019]  2       BB01,BB34             1.15    115 [0CC..0D3)                           (return)                     i IBC
BB35 [0037]  0                             0         0 [???..???)                           (throw )                     i IBC rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

At some point, we gained some extra weight in the loop that BB34 exits, because the method exit weight (115) doesn't match the entry weight (100). Here's the cyclic probability computation for the outer loop:

ccp: BB14 :: 1.0 (header)
ccp: BB15 :: 0.95405
ccp: BB27 :: 0.9508592
ccp: BB28 :: 0.947679
ccp: BB29 :: 0.9453009
ccp: BB31 :: 0.9429287
ccp: BB32 :: 0.9405625
ccp: BB19 :: 0.3621144 (nested header)
ccp: BB21 :: 0.0684773
ccp: BB22 :: 0.1117778
ccp: BB23 :: 0.09060025
ccp: BB24 :: 0.08763792
ccp: BB25 :: 0.3621144
ccp: BB18 :: 20.41789 (nested header)
ccp: BB07 :: 3.861107
ccp: BB06 :: 6.302613
ccp: BB05 :: 5.108514
ccp: BB04 :: 4.941482
ccp: BB08 :: 20.41789
ccp: BB34 :: 1.000791

All three loops have one test block each, and there isn't any EH flow to contend with, so I expect the exit block BB34's weight to match the weight of the header BB14, but instead the former slightly exceeds the latter. This issue hits for only a few contexts in benchmarks.run_pgo, and each case involves loop cloning introducing conditions with edge likelihoods with a lot of significant figures.

I tweaked the entry/exit residual check to detect failed convergence for synthesis runs after importation (enabling this check beforehand causes diffs, as we'd blend likelihoods and resynthesize more often). This feels like a quirk. Another option is to mimic the method entry/exit consistency check for each loop body: if the loop doesn't have EH flow, we'd expect its entry flow to match its exit flow. But there's nothing stopping this imprecision from cropping up for loop bodies with EH, so neither option seems robust.

@AndyAyersMS
Member

Attach the full log (at least the synthesis related part) if you get a chance.

I wonder if we have some block whose likelihoods don't sum to 1.0, and this leads to the problem (though perhaps not -- you would think this would lead us to underestimate the flow out of a loop)?

In fgDebugCheckOutgoingProfileData we allow some tolerance here; I wonder what happens if we insist the outgoing likelihoods exactly sum to 1.0.

@amanasifkhalid
Contributor Author

Here's the JitDump:
dump.txt

In fgDebugCheckOutgoingProfileData we allow some tolerance here; I wonder what happens if we insist the outgoing likelihood exactly sum to 1.0.

Tightening this invariant didn't reveal anything.

@AndyAyersMS
Member

Right, we should see BB34 be exactly 1.0 here, not 1.000791, since there is a single loop exit. But the backedge drops below 1.0 so we don't notice.

I can't tell where things go wrong from the dump, perhaps because of rounding issues in the displayed values. BB18 is only reached via a chain of jumps, and BB19 has many preds, so likely one or both of those weights are slightly too high.

Member

@AndyAyersMS AndyAyersMS left a comment


Looks good for the most part.


// Recompute the likelihoods of the block's other successor edges.
const weight_t removedLikelihood = succEdge->getLikelihood();
for (unsigned i = 0; (removedLikelihood != 1.0) && (i < (succCount - 1)); i++)
Member


We still need to handle the case where removed likelihood is 1.0, don't we (probably rare)?

In that case we should just spread the likelihood equally.

Contributor Author


Right, fixed


// Recompute the likelihoods of the block's other successor edges.
const weight_t removedLikelihood = succEdge->getLikelihood();
for (unsigned i = 0; (removedLikelihood != 1.0) && (i < (succCount - 1)); i++)
Member


Ditto here.

// If we removed all of the flow out of 'block', distribute flow among the remaining edges evenly.
const weight_t currLikelihood = succTab[i]->getLikelihood();
const weight_t newLikelihood = currLikelihood / (1.0 - removedLikelihood);
const weight_t newLikelihood =
Member


If we 're removing all the likelihood then currLikelihood for each survivor will be 0.

This needs to be 1 / succCount if removedLikelihood == 1.0.

Contributor Author


Sorry I blanked on this -- fixed

Member

@AndyAyersMS AndyAyersMS left a comment


Still think this needs a bit more revision.

@amanasifkhalid
Contributor Author

/ba-g blocked by timeouts

@amanasifkhalid amanasifkhalid merged commit a79768e into dotnet:main Apr 3, 2025
106 of 110 checks passed
@amanasifkhalid amanasifkhalid deleted the fix-profile-synthesis branch April 3, 2025 14:38
@github-actions github-actions bot locked and limited conversation to collaborators May 4, 2025