-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pgo phase3 #47558
Pgo phase3 #47558
Conversation
Loop implemented but still chasing bugs Handle volatility issues End to end type handle processing working R2R dump support for pgo instrumentation data
- Fix type histogram processing to actually properly handling unknown type handles in the histogram
…edded in the R2R file
src/coreclr/vm/readytoruninfo.cpp
Outdated
@@ -701,6 +701,22 @@ ReadyToRunInfo::ReadyToRunInfo(Module * pModule, PEImageLayout * pLayout, READYT | |||
m_availableTypesHashtable = NativeHashtable(parser); | |||
} | |||
|
|||
// For format version 4.1 and later, there is an optional table of instrumentation data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// For format version 4.1 and later, there is an optional table of instrumentation data | |
// For format version 5.2 and later, there is an optional table of instrumentation data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked over the jit interface and pgo manager changes.
This should address #13672, right?
definedSymbols: new ISymbolDefinitionNode[] { this }); | ||
} | ||
|
||
public override int ClassCode => 1887299452; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some process by which these values are chosen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a random number, it simply needs to not match one of the other ClassCode's, and having a bunch of entropy in the number makes a few things faster. This is (poorly) documented where the base ClassCode property is defined.
#endif // DACCESS_COMPILE | ||
|
||
HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, SArray<ICorJitInfo::PgoInstrumentationSchema>* pSchema, BYTE**pInstrumentationData) | ||
HRESULT PgoManager::getPgoInstrumentationResultsInstance(MethodDesc* pMD, BYTE** pAllocatedData, ICorJitInfo::PgoInstrumentationSchema** ppSchema, UINT32 *pCountSchemaItems, BYTE**pInstrumentationData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this prefer "live" data over R2R data?
For instance, will R2R assemblies loaded under ZapDisable lookup embedded profile data for their methods even though they may have gone through Tier0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, sure. I feel that ideally we would merge the data, but I don't really want to write that code right now, especially as it would be a duplication of the managed merger algorithm. Now that I think about it a bit, if we don't prefer in memory data we'll be losing all the potential for tiered pgo improving over static pgo, so... yeah, I'll swap it around.
src/coreclr/vm/pgo.cpp
Outdated
maxCount = h.m_histogram[m].m_count; | ||
} | ||
} | ||
|
||
if (maxCount > 0) | ||
UINT32 maxKnownLikelihood = (100 * maxKnownCount) / h.m_totalCount; | ||
if ((maxKnownCount > 0) && ((maxKnownCount == maxCount) || (maxKnownLikelihood > 33))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should filter results here. The jit can decide if likelihood warrants testing things (currently we have different thresholds for virtual and interface calls, for instance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Will do.
@AndyAyersMS This won't quite address #13672 as we haven't yet actually enabled this logic in our build systems, but yes, this is much of the technical work to get us there. Currently the system puts a bit too much of the pgo data into the file, causing too much size bloat to be something we can enable by default, and the dotnet-optimization repo isn't yet generating the mibc data, but we're getting there. |
@AndyAyersMS I just wanted to make sure you were aware of the change to enable the BBOPT mode unconditionally when optimizing. It seemed like a good enough idea to me, but it is a tad scary. |
Can you say more here? #46882 (which is a few days out) plus not instrumenting single-block methods should reduce the number of count records by roughly a factor of two. The latter may be problematic if we plan to use PGO for partial prejitting or global importance, but otherwise single-block method counts are uninteresting.
I think it's fine. The gating factor is the availability of PGO data, so for now this shouldn't have any impact (unless you are also making legacy IBC data available at runtime, but I didn't see any code that would do that). We will probably want to introduce a jit or runtime mode to suppress BBOPT once PGO data becomes more commonplace. |
@AndyAyersMS well, the current implementation puts 100% of the data into the file. While the data is actually pretty well compressed, between the data and the bookkeeping to find the appropriate data there is measurable increase in size of the final file that is beyond the current desired level. (I don't have good numbers except for my really small test app where the increase in size is comparable to the size of the generated code. If that holds for larger binaries, we're talking a 30% increase in file size which is utterly unacceptable.). My goal is to get numbers out of our optimization tests, and see how it affects corelib, etc, and see what is to be done. I have a number of ideas on how to reduce the size impact once it becomes usefully measurable, but there are likely quite a few more tweaks to come. Some ideas include:
There isn't an official goal yet for exactly the size we're looking for, but my personal goal is that we shouldn't increase the size of our generated assemblies by more than 1% with this work. |
@dotnet/crossgen-contrib |
FYI @dotnet/jit-contrib for the "always set BBOPT when optimizing" bit of this change |
If you mean "perf numbers" be aware there are still a lot of rough edges in the jit.
That seems ambitious. We should split out the impact of the extra profile payload from the impact of the codegen changes it will engender. Ideally the jit would be smart about producing compact code for things that are unlikely hot paths, and so "buy room" for the profile payload and/or more ambitious codegen on hot paths, but I don't think we're there yet, save for a few special cases. |
@AnyAyersMS For numbers in this context, I'm more concerned about size numbers than wallclock style perf. And my opinion on size is that I know that we don't actually execute all that much of most of our binaries in the vast majority of applications, so most methods won't have any data at all. That will apply a substantial factor to reduce the size cost of carrying instrumentation data. As I said, I don't want to put down any strong statements about what we'll do until we have some form of numbers, but in the static pgo story (at least for our product shipped images, we have a much smaller size budget to work with than I'd like.) |
@@ -236,6 +238,232 @@ private int GetSize() | |||
} | |||
} | |||
|
|||
public class PgoInfoKey : IEquatable<PgoInfoKey> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move these classes to their own file?
_pgoInfo = _readyToRunReader.GetPgoInfoByKey(PgoInfoKey.FromReadyToRunMethod(this)); | ||
if (_pgoInfo == null) | ||
{ | ||
_pgoInfo = new object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we init to new object()
if we have no pgoInfo for a key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm... my thought was to avoid a bunch of manual null checks, but I'll have to take a look and see if I was thinking straight at the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks David!
Future PGO work will include
With this checkin, the feature is functional from a crossgen2.exe point of view, but its not polished, and it cannot be used from the Crossgen2 sdk integration (as the sdk does not have the ability to pass an extra pair of arguments to the compiler. That said, the following script can be used to demonstrate the feature.
The example expects that the following environment variables are set
And that the application is called pgotest.csproj and is in the current directory. Assuming all of that is setup, then the following script will compile an application, run it under instrumentation, feed the results back into crossgen2, and then run the application again, this time with instrumentation enabled.