-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update how OSR and PGO interact #61453
Conversation
When both OSR and PGO are enabled: * Enable instrumenting OSR methods, so that the combined profile data from Tier0 plus any OSR variants provide a full picture for subsequent Tier1 optimization. * Use block profiles for both Tier0 methods that are likely to have patchpoints and OSR methods. The updates on the runtime side are to pass BBINSTR to OSR methods, and to handle the (typical) case where the OSR method instrumentation schema is a subset of the Tier0 method schema. We are still allowing OSR methods to read the profile data. So they are both profile instrumented and profile optimized. Not clear if this is going to work well as the Tier0 data will be incomplete and optimization quality may be poor. Something to revisit down the road.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsWhen both OSR and PGO are enabled:
The updates on the runtime side are to pass BBINSTR to OSR methods, and to We are still allowing OSR methods to read the profile data. So they are both
|
cc @davidwrighton @dotnet/jit-contrib Smoke tested on a few simple examples. No SPMI diffs (which includes a fair number of Tier0 instrumentation instances). I am enabling some new experimental legs which I'll run here. Example of an OSR method with instrumentation probes: public static int Looper(B b)
{
int[] a = new int[1000];
a[555] = 1;
int result = 0;
for (int i = 0; i < 1000; i++)
{
for (int j = i; j < 1000; j++)
{
result += a[j];
}
}
// b.F() will only GDV if OSR method has the right class profile
return result - b.F();
} ; Assembly listing for method Example:Looper(B):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; OSR variant for entry point 0x1d
; optimized code
; instrumented for collecting profile data
; rsp based frame
; fully interruptible
; with PGO: edge weights are valid, and fgCalledCount is 11
; Final local variable assignments
;
; V00 arg0 [V00,T03] ( 4, 2 ) ref -> rsi class-hnd single-def
; V01 loc0 [V01,T02] ( 2,1816 ) ref -> rdx class-hnd
; V02 loc1 [V02,T01] ( 3,1816 ) int -> rdi
; V03 loc2 [V03,T04] ( 4, 4 ) int -> rax
; V04 loc3 [V04,T00] ( 6,4541 ) int -> rcx
; V05 OutArgs [V05 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V06 tmp1 [V06 ] ( 0, 0 ) ref -> zero-ref single-def "class profile tmp"
; V07 tmp2 [V07,T05] ( 3, 0 ) ref -> rcx single-def "argument with side effect"
;
; Lcl frame size = 40
G_M20617_IG01: ;; offset=0000H
57 push rdi
56 push rsi
4883EC28 sub rsp, 40
488BB424A0000000 mov rsi, gword ptr [rsp+A0H]
488B942488000000 mov rdx, gword ptr [rsp+88H]
8BBC2484000000 mov edi, dword ptr [rsp+84H]
8B842480000000 mov eax, dword ptr [rsp+80H]
8B4C247C mov ecx, dword ptr [rsp+7CH]
;; bbWeight=1 PerfScore 12.25
G_M20617_IG02: ;; offset=0028H
EB0E jmp SHORT G_M20617_IG04
;; bbWeight=1 PerfScore 2.00
G_M20617_IG03: ;; offset=002AH
FF05DC483C00 inc dword ptr [(reloc 0x7ffa0c561dcc)]
8BC8 mov ecx, eax
FF05DC483C00 inc dword ptr [(reloc 0x7ffa0c561dd4)]
;; bbWeight=1 PerfScore 6.25
G_M20617_IG04: ;; offset=0038H
FF05D2483C00 inc dword ptr [(reloc 0x7ffa0c561dd0)]
3B4A08 cmp ecx, dword ptr [rdx+8]
7364 jae SHORT G_M20617_IG08
448BC1 mov r8d, ecx
42037C8210 add edi, dword ptr [rdx+4*r8+16]
FFC1 inc ecx
FF05C1483C00 inc dword ptr [(reloc 0x7ffa0c561dd4)]
81F9E8030000 cmp ecx, 0x3E8
7CDD jl SHORT G_M20617_IG04
;; bbWeight=908 PerfScore 13393.00
G_M20617_IG05: ;; offset=005BH
FF05B7483C00 inc dword ptr [(reloc 0x7ffa0c561dd8)]
FFC0 inc eax
FF05B3483C00 inc dword ptr [(reloc 0x7ffa0c561ddc)]
3DE8030000 cmp eax, 0x3E8
7CBA jl SHORT G_M20617_IG03
;; bbWeight=1 PerfScore 7.50
G_M20617_IG06: ;; offset=0070H
FF05AA483C00 inc dword ptr [(reloc 0x7ffa0c561de0)]
488BCE mov rcx, rsi
48BAE81D560CFA7F0000 mov rdx, 0x7FFA0C561DE8
E898A1685F call CORINFO_HELP_CLASSPROFILE32
488BCE mov rcx, rsi
488B01 mov rax, qword ptr [rcx]
488B4048 mov rax, qword ptr [rax+72]
FF5020 call [rax+32]B:F():int:this
8BD7 mov edx, edi
2BD0 sub edx, eax
8BC2 mov eax, edx
;; bbWeight=0 PerfScore 0.00
G_M20617_IG07: ;; offset=009BH
4883C428 add rsp, 40
5E pop rsi
5F pop rdi
4883C458 add rsp, 88
5D pop rbp
C3 ret
;; bbWeight=0 PerfScore 0.00
G_M20617_IG08: ;; offset=00A7H
E8F438685F call CORINFO_HELP_RNGCHKFAIL
CC int3
;; bbWeight=0 PerfScore 0.00
; Total bytes of code 173, prolog size 40, PerfScore 13438.30, instruction count 45, allocated bytes for code 173 (MethodHash=9b03af76) for method Example:Looper(B):int |
/azp run runtime-jit-experimental |
Azure Pipelines successfully started running 1 pipeline(s). |
Addresses issues from #47942. |
// Return true if schemaTable entries are a subset of the schema described by pByte, with matching entries in the same order. | ||
// Also updates offset of the matching entries in schemaTable to those of the pByte schema. | ||
// | ||
inline bool ComparePgoSchemaCompatible(const uint8_t *pByte, size_t cbDataMax, ICorJitInfo::PgoInstrumentationSchema* schemaTable, size_t cSchemas) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little surprised that this mutates the schema. Maybe rename it to something like SetOffsetsForSchemaSubsequence
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about CheckIfPgoSchemaIsCompatibleAndSetOffsets
?
|
||
auto handler = [schemaTable, &nSchema, &nMatched, &nUnmatched](const ICorJitInfo::PgoInstrumentationSchema& schema) | ||
{ | ||
const size_t iSchemaAdj = nSchema - nUnmatched; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same as nMatched
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And thinking a little bit more about it, that means this might index off the end of schemaTable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Handful of OSR+PGO tests failing with
Will investigate. |
Actually all the novel failures are with OSR+PGO+PartialCompilation. Turns out to be related to the mid-try entry OSR step block transformation introduced in #59784 -- when we have that one cannot simply walk from |
|
||
auto handler = [schemaTable, &nSchema, &nMatched, &nUnmatched](const ICorJitInfo::PgoInstrumentationSchema& schema) | ||
{ | ||
if ((schema.InstrumentationKind != schemaTable[nMatched].InstrumentationKind) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we still need to guard on nMatched < cSchemas
before this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Reworked this some more.
/azp run runtime-jit-experimental |
Azure Pipelines successfully started running 1 pipeline(s). |
we think we might instrument and the number of things we end up instrumenting can differ. Also improve the DumpJittedMethod output for OSR, and allow selective dumping of a particular OSR variant by specifying its IL offset.
Still ironing out PGO+PartialCompilation issues. Hopefully that's the last round of updates needed. |
/azp run runtime-jit-experimental |
Azure Pipelines successfully started running 1 pipeline(s). |
jit-experimental failures are now down to just the "known" subset from osr_stress (happens because crossgen2 runs using the .NET installed for the build, which is 6.0RC2, and missing a key OSR fix). So the newly added CI legs for PGO+OSR, etc are passing pri0 tests. |
Seeing random-looking failures in runtime tests:
going to rerun... |
@jakobbotsch I made some more changes, so can you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Probably improved dotnet/perf-autofiling-issues#2376 on windows-x64 |
Possibly from changing the vm side schema to support a subset rather than an exact match? |
We started doing devir in unoptimized builds. I assume that was an unintended consequence of dotnet#61453.
We started doing devir in unoptimized builds. I assume that was an unintended consequence of #61453.
We started doing devir in unoptimized builds. I assume that was an unintended consequence of dotnet#61453.
When both OSR and PGO are enabled:
Tier0 plus any OSR variants provide a full picture for subsequent Tier1
optimization.
and OSR methods.
The updates on the runtime side are to pass BBINSTR to OSR methods, and to
handle the (typical) case where the OSR method instrumentation schema is a subset
of the Tier0 method schema.
We are still allowing OSR methods to read the profile data. So they are both
profile instrumented and profile optimized. Not clear if this is going to work
well as the Tier0 data will be incomplete and optimization quality may be poor.
Something to revisit down the road.