-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: use synthesis to repair some reconstruction issues #84312
Conversation
In particular, run synthesis in repair mode for cases where there are profile counts within the method but zero counts in `fgFirstBB`. Recall that sparse profiling effectively probes return blocks to determine the method entry count. So the zero-entry but not zero-everywhere case can happen if we have a method with a very long running loop plus sparse profiling plus OSR -- we will only get profile counts from the instrumented Tier0 method, and it will never return (instead it will always escape to an OSR version which will eventually return, but that version won't be instrumented). I originally was a bit more ambitious and ran repair for a broader set of reconstruction issues, but lead to a large number of diffs, in part because repair doesn't cope well with irreducible loops. Leaving the entry count zero can have fairly disastrous impact on the quality of optimizations done in the method. Addresses quite a few of the worst-performing benchmarks in dotnet#84264.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak Issue DetailsIn particular, run synthesis in repair mode for cases where there are profile counts within the method but zero counts in Recall that sparse profiling effectively probes return blocks to determine the method entry count. So the zero-entry but not zero-everywhere case can happen if we have a method with a very long running loop plus sparse profiling plus OSR -- we will only get profile counts from the instrumented Tier0 method, and it will never return (instead it will always escape to an OSR version which will eventually return, but that version won't be instrumented). I originally was a bit more ambitious and ran repair for a broader set of reconstruction issues, but lead to a large number of diffs, in part because repair doesn't cope well with irreducible loops. Leaving the entry count zero can have fairly disastrous impact on the quality of optimizations done in the method. Addresses quite a few of the worst-performing benchmarks in #84264.
|
Handful of diffs expected. If we had a PGO benchmarks collection we'd see quite a few more. @EgorBo PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- Is it possible today to detect stale static PGO data and should we still use it or discard completely?
- Should we enable Dynamic PGO for coreclr_tests.run collection?
If the flowgraph for the method is quite different (that is, if we fail to find an edge in the flowgraph based on IL offsets from schema entries) we will hit the This can mean that some trivial/harmless edits to methods will lead to us tossing usable data. It is possible (though not easy) to build approximate matching algorithms that try and recognize when the graphs have the same shape but not the same identifying marks and use that to propagate the stale data. And I suppose do something simialr for class profiles if their IL offsets shift a bit. But not sure it's worth the trouble.
Yes, we need to enable more PGO driven collections. Benchmarks would be good to have too. |
In particular, run synthesis in repair mode for cases where there are profile counts within the method but zero counts in
fgFirstBB
.Recall that sparse profiling effectively probes return blocks to determine the method entry count.
So the zero-entry but not zero-everywhere case can happen if we have a method with a very long running loop plus sparse profiling plus OSR -- we will only get profile counts from the instrumented Tier0 method, and it will never return (instead it will always escape to an OSR version which will eventually return, but that version won't be instrumented).
I originally was a bit more ambitious and ran repair for a broader set of reconstruction issues, but lead to a large number of diffs, in part because repair doesn't cope well with irreducible loops.
Leaving the entry count zero can have fairly disastrous impact on the quality of optimizations done in the method.
Addresses quite a few of the worst-performing benchmarks in #84264.