-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Phase 1 of refactoring pgo data pipeline #46638
Conversation
Improve stable hash to match the other R2R hashes Use method IL body hash to replace token in pgo data Reworked data storage for pgo to allow for more efficient lookup Checkin before rebase Jit no longer uses block count apis
- Handle large integers correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks really good.
Left a few notes.
src/coreclr/inc/corjit.h
Outdated
// 4. Each data entry shall be laid out without extra padding. | ||
// | ||
// The intention here is that it becomes possible to describe a C data structure with the alignment for ease of use with | ||
// inst5rumentation helper functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// inst5rumentation helper functions | |
// instrumentation helper functions |
return HRESULT.E_NOTIMPL; | ||
} | ||
|
||
// Only allocation of PGO data for the current method is supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will want to support this for inlinees, eventually (#44372).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That limitation will remain only be for IBC style instrumentation in R2R dlls. The general purpose instrumentation support will only be present in the runtime. If we drop support for IBC style R2R instrumentation as we will likely do, this code will be removed entirely from crossgen2.
src/coreclr/vm/pgo.h
Outdated
#include "shash.h" | ||
|
||
|
||
enum class PgoInstrumentationKind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some way this can be shared with the version in corjit.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original thought was that this one would only have the minimal number of enum values describing marshalling so that it was clear that the JIT defined the meanings of stuff, but honestly, that's not a major benefit. I'll merge the two.
src/coreclr/jit/flowgraph.cpp
Outdated
} | ||
|
||
return Compiler::WALK_CONTINUE; | ||
// Restore the stub address \on call, whether instrumenting or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Restore the stub address \on call, whether instrumenting or not. | |
// Restore the stub address on the call, whether instrumenting or not. |
@@ -270,7 +307,7 @@ void PgoManager::ReadPgoData() | |||
return; | |||
} | |||
|
|||
char buffer[256]; | |||
char buffer[16384]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any symbolic constant we can use here?
HeaderList *currentHeaderList = m_pgoHeaders; | ||
if (currentHeaderList != NULL) | ||
{ | ||
if (!ComparePgoSchemaEquals(currentHeaderList->header.GetData(), currentHeaderList->header.countsOffset, pSchema, countSchemaItems)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to tie into the IL versioning scheme? I don't think the jit knows which IL version is active for a jit request, but presumably the jit interface does (or could know).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect so. The pgo instrumentation data structures will be the same if the schema's match up, and the JIT already must tolerate data which isn't completely accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, the reading side may be more interesting, but again, that gets into the question of how bad is it to present incorrectly structured data, if that's really bad we could easily add the IL code hash into the schema to allow for the JIT to check it for correctness or something.
*pBlockCounts = &s_PgoData[index + 2]; | ||
*pCount = header->recordCount - 2; | ||
*pNumRuns = 1; | ||
if (!ComparePgoSchemaEquals(existingData->header.GetData(), existingData->header.countsOffset, pSchema, countSchemaItems)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here, do we need to make sure we're looking at the same IL version the jit is compiling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect so. The pgo instrumentation data structures will be the same if the schema's match up, and the JIT already must tolerate data which isn't completely accurate.
index += header->recordCount; | ||
methodsChecked++; | ||
AllocMemTracker loaderHeapAllocation; | ||
pHeaderList = (HeaderList*)loaderHeapAllocation.Track(pMD->GetLoaderAllocator()->GetHighFrequencyHeap()->AllocMem(allocationSize)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the schema could go in the low frequency heap but maybe that's more trouble than it's worth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the jit interface api allows for doing things like that (which is why the Offset field is a full size_t), but for now, it would cause problems for the superpmi implementation I've come up with, and its not super clear how valuable the high vs low frequency heap actually is.
- This will be better prepared for the port to managed - No need to include these complex templates and such in all vm files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates.
The code in zapinfo still looks wrong to me, it won't build a proper schema.
Would be good to run the jit-experimental CI leg, and maybe also a jitstress leg.
@AndyAyersMS I've fixed the ZapInfo issues, and verified that traditional IBC continues to work. I've also scheduled the jit-experimental and jitstress runs as requested. |
In case you haven't yet looked at the experimental failures -- they seem to be related. |
@davidwrighton You didn't update the JIT-EE GUID with this. Can you please follow-up with a GUID change ASAP? |
Phase 1 of replacing existing infrastructure around handling of pgo data with more flexible schema based approach.
The schema based approach allows the JIT to define the form of data needed for instrumentation.
Changes part of this phase
Future Changes for static Pgo