Profile Synthesis Work Items #82964
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
@VSadov who is the best person to ask these days about async method IL generation in Roslyn? I am wondering if the irreducible flow I am seeing (like in the example just above) is something that could easily be fixed with a bit of code duplication. As is, the jit will not recognize the loops in these methods as loops and so a lot of the classic loop optimizations (say hoisting) won't fire. It may not matter in practice as these methods are tough to optimize anyways and maybe not that CPU intensive. But perhaps worth a closer look. One crude way to view this is that if there were a source gen option for the async state machines, and that code spit had to resort to
Add the ability to verify that the profile counts produced by synthesis are self-consistent, and optionally to assert if they're not. Disable checking if we know profile flow will be inconsistent (because of irreducible loops and/or capped cyclic probabilities). Consistently ignore the likely flow out of handlers. Generally we model handlers as isolated subgraphs that do not contribute flow to the main flow graph. This is generally acceptable. The one caveat is for flow into finallies. The plan here is to fix the weights for finallies up in a subsequent pass via simple scaling once callfinallies are introduced. Flow weights out of finallies will be ignored as the callfinally block will be modeled as always flowing to its pair tail. Also add the ability to propagate the synthesized counts into the live profile data. This is mainly to facilitate using the MIBC comparison tools we have to assess how closely the synthesized data comes to actual PGO data. Finally, enable the new synthesized plus checked modes in a few of our PGO pipelines. Contributes to dotnet#82964.
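As a rough illustration of what "self-consistent" means here, the following minimal Python sketch checks that each block's count matches the flow arriving on its incoming edges. It is not the JIT's implementation: the function name, the dictionary-based graph, and the tolerances are all assumptions made for the example, and it ignores handlers entirely.

```python
import math

def check_profile_consistency(counts, edges, entry, rel_tol=1e-6):
    """Return the blocks whose count differs from their incoming edge flow.

    counts: {block: weight}; edges: {(src, dst): likelihood}.
    Hypothetical helper for illustration only. The entry block is skipped
    because its count comes from outside the method.
    """
    inflow = {block: 0.0 for block in counts}
    for (src, dst), likelihood in edges.items():
        # Flow along an edge is the source block's count times the edge likelihood.
        inflow[dst] += counts[src] * likelihood
    return [
        block
        for block in counts
        if block != entry
        and not math.isclose(counts[block], inflow[block], rel_tol=rel_tol, abs_tol=1e-9)
    ]

# A simple diamond: B1 branches to B2/B3, which rejoin at B4.
edges = {("B1", "B2"): 0.3, ("B1", "B3"): 0.7, ("B2", "B4"): 1.0, ("B3", "B4"): 1.0}
good = {"B1": 100.0, "B2": 30.0, "B3": 70.0, "B4": 100.0}
bad = dict(good, B4=90.0)  # B4 no longer matches the flow from B2 and B3

print(check_profile_consistency(good, edges, "B1"))
print(check_profile_consistency(bad, edges, "B1"))
```

A check like this would be expected to report false positives in exactly the cases the description calls out (irreducible loops, capped cyclic probabilities), which is presumably why checking is disabled there.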
@AndyAyersMS Is the reference to #82753 incorrect?
Thanks -- fixed that above: the right issue is #82755.
…ropagation Once synthesis arrives on the scene, we're not going to want phases in the JIT arbitrarily modifying block weights. There is already a guard of this sort for regular profile data, so it makes sense to extend that to synthesized data as well. When synthesizing counts, propagate counts to finallies from the associated trys. This needs to be done carefully as we have to make sure not to visit the finally until the count in the try is set. We rely on some of the properties of DFS pre and post number bracketing to do this efficiently, without needing to track extra state. Contributes to dotnet#82964.
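The description does not spell out how the JIT uses the DFS numbers, so the sketch below only illustrates the bracketing property itself: in a DFS spanning tree, a node `a` is an ancestor of `b` exactly when `a`'s preorder number is no larger and its postorder number is no smaller. The function names and the dictionary-based graph are assumptions for the example, not JIT code.

```python
def dfs_numbers(succs, entry):
    """Assign DFS preorder and postorder numbers to blocks reachable from entry."""
    pre, post = {}, {}

    def visit(node):
        pre[node] = len(pre)          # numbered when first reached
        for nxt in succs.get(node, ()):
            if nxt not in pre:
                visit(nxt)
        post[node] = len(post)        # numbered when all descendants are done

    visit(entry)
    return pre, post

def is_dfs_ancestor(pre, post, a, b):
    """True if a is an ancestor of b (or a == b) in the DFS spanning tree."""
    return pre[a] <= pre[b] and post[b] <= post[a]

succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pre, post = dfs_numbers(succs, "A")
print(is_dfs_ancestor(pre, post, "A", "D"))  # A brackets D
print(is_dfs_ancestor(pre, post, "C", "D"))  # D was reached via B, not C
```

Because the test is just two integer comparisons on numbers computed once, ordering questions of this kind can be answered without carrying extra per-block state.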
Implement blend and repair modes for synthesis. Blend merges a bit of synthesized data into an existing PGO data set; repair tries to fix any local inconsistencies (via heuristics). Both run count construction afterwards. Trust blended data like we trust dynamic data. Probably will want more nuance here (e.g., trust dynamic blend, but not static blend) but this is sufficient for now. Also implement random and reverse modes; these will ultimately be used for stress testing (not called anywhere yet). Parameterize some of the magic constants that have cropped up. Add blend mode as a new weekend pgo stress mode; fix the other synthesis mode I added recently to pgo stress to set the config properly. Contributes to dotnet#82964.
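One plausible reading of "blend" is a weighted average of the existing edge likelihoods and the synthesized ones. The Python sketch below assumes exactly that; the function name and the `blend_factor` parameter (and its default of 0.1) are hypothetical and are not taken from the JIT.

```python
import math

def blend_likelihoods(profiled, synthesized, blend_factor=0.1):
    """Mix a small amount of synthesized likelihood into profiled likelihoods.

    Both inputs map (src, dst) edges to likelihoods. If each block's outgoing
    likelihoods sum to 1 in both inputs, they still sum to 1 after blending.
    blend_factor is an assumed tuning knob, not a documented JIT setting.
    """
    return {
        edge: (1.0 - blend_factor) * profiled[edge] + blend_factor * synthesized[edge]
        for edge in profiled
    }

# The profile saw only one side of the branch; synthesis guesses an even split.
profiled = {("B1", "B2"): 1.0, ("B1", "B3"): 0.0}
synthesized = {("B1", "B2"): 0.5, ("B1", "B3"): 0.5}
blended = blend_likelihoods(profiled, synthesized)
print(blended)
```

After blending, the never-taken edge gets a small nonzero likelihood, and block counts would then be recomputed from the new likelihoods, matching the note that count construction runs afterwards.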
Start numbering inlinee blocks from 1 instead of 1 + the root compiler's max BB num. Update inlinee block bbNums when they are inserted into the root compiler's graph. Adjust computations in various places that knew about the old approach and looked from inlinee compiler to root compiler for bitset, epochs and the like. Enable synthesis for inlinees, now that regular bitsets on inlinee compiler instances behave sensibly. There is still some messiness around inlinees inheriting root compiler EH info which requires special checks. I will clean this up separately. Fixes dotnet#82755. Contributes to dotnet#82964.
Roslyn issue was closed as a dup of dotnet/roslyn#39814
At this point we've wrapped up all the planned work on Profile Synthesis for .NET 8. So closing this; we'll revisit some of the open items here in the next release's planning.
This issue captures work items and other notes related to profile synthesis in RyuJit.
Overview
Profile synthesis is an algorithm to estimate plausible block and edge profile data. It can be used to guide optimizations in the absence of true profile data, fill in gaps in profile data, or repair damaged (inconsistent) data.
See the section on synthesis in the main Dynamic PGO document for more context.
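To make the idea concrete, here is a deliberately simplified Python sketch of synthesis on an acyclic flow graph: guess edge likelihoods with a heuristic, then push counts from the entry block along the edges. The even-split heuristic, the function name, and the default entry count are assumptions for illustration; the real algorithm uses richer heuristics and also has to handle loops (via cyclic probabilities), which this sketch does not.

```python
def synthesize_counts(succs, entry, entry_count=100.0):
    """Estimate edge likelihoods and block counts for an acyclic flow graph.

    succs maps each block to its list of successors. Assumed heuristic:
    every successor of a block is equally likely.
    """
    likelihoods = {
        (src, dst): 1.0 / len(dsts) for src, dsts in succs.items() for dst in dsts
    }

    # Reverse postorder of a DFS is a topological order for an acyclic graph,
    # so every block is processed after all of its predecessors.
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for nxt in succs.get(node, ()):
            if nxt not in seen:
                visit(nxt)
        order.append(node)

    visit(entry)
    order.reverse()

    counts = {block: 0.0 for block in order}
    counts[entry] = entry_count
    for block in order:
        for nxt in succs.get(block, ()):
            counts[nxt] += counts[block] * likelihoods[(block, nxt)]
    return likelihoods, counts

succs = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}
likelihoods, counts = synthesize_counts(succs, "B1")
print(counts)
```

By construction the resulting counts are consistent with the likelihoods, which is the property the later consistency-checking work verifies.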
Completed for .NET 8
Opportunistic work for .NET 8
fgGetPredForBlock -- we often want to go from a block to the successor edges
Some initial data in JIT: profile synthesis consistency checking and more #83068.
fgFirstBB being special
Note: a number of the above were handled in .NET 9, see #93020.