Initial support for partially jitted methods #60791
Introduce a new form of patchpoint for rarely-executed blocks in a method (typically, blocks with throws). Instead of jitting the code in those blocks, add code to call a new patchpoint helper (similar to the existing OSR patchpoint helper) which finds or creates the jitted code for the method from this point onward, and then transitions control to that code. Suppressing codegen for exception throws in this way gives benefits similar to those of the throw helper transformation (moving throws into separate helper methods), without requiring source changes. This currently works only in Tier0, but in principle could be extended to handle optimized code.
Tagging subscribers to this area: @JulieLeeMSFT
cc @dotnet/jit-contrib

I've had this bit of tech sitting around for quite a while, but it was held up by the OSR mid-try entry issue. With that resolved, I figure we might as well get this into the mainline (off by default). It passes the pri-0 tests locally, not that that means a whole lot. I will be adding a stress mode subsequently, since we can trigger these patchpoints at more or less any point where the IL stack is empty.

Sample diff:

    using System;

    class E : Exception
    {
        public E(double v)
        {
            m_v = v;
        }

        double m_v;
    }

    class X
    {
        public static void Main(string[] args)
        {
            if (args.Length == 1)
            {
                throw new E(11.3);
            }

            Console.WriteLine("hello, world\n");
        }
    }

;;; before
; Assembly listing for method X:Main(System.String[])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-0 compilation
; MinOpts code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 1, 1 ) ref -> [rbp+10H] do-not-enreg[] class-hnd
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] do-not-enreg[] "OutgoingArgSpace"
; V02 tmp1 [V02 ] ( 1, 1 ) ref -> [rbp-08H] do-not-enreg[] must-init class-hnd exact "NewObj constructor temp"
;
; Lcl frame size = 48
G_M3786_IG01: ;; offset=0000H
55 push rbp
4883EC30 sub rsp, 48
C5F877 vzeroupper
488D6C2430 lea rbp, [rsp+30H]
33C0 xor eax, eax
488945F8 mov qword ptr [rbp-08H], rax
48894D10 mov gword ptr [rbp+10H], rcx
;; bbWeight=1 PerfScore 5.00
G_M3786_IG02: ;; offset=0017H
488B4D10 mov rcx, gword ptr [rbp+10H]
83790801 cmp dword ptr [rcx+8], 1
752B jne SHORT G_M3786_IG04
;; bbWeight=1 PerfScore 5.00
G_M3786_IG03: ;; offset=0021H
488D0D406D3C00 lea rcx, [(reloc 0x7ff9dedf3008)]
E86359A05F call CORINFO_HELP_NEWSFAST
488945F8 mov gword ptr [rbp-08H], rax
C5FB100D2F000000 vmovsd xmm1, qword ptr [reloc @RWD00]
488B4DF8 mov rcx, gword ptr [rbp-08H]
E866FFFFFF call E:.ctor(double):this
488B4DF8 mov rcx, gword ptr [rbp-08H]
E815A1695F call CORINFO_HELP_THROW
;; bbWeight=0 PerfScore 0.00
G_M3786_IG04: ;; offset=004BH
48B9E0310098AD010000 mov rcx, 0x1AD980031E0 ; "hello, world "
488B09 mov rcx, gword ptr [rcx]
E8D3FCFFFF call System.Console:WriteLine(System.String)
90 nop
;; bbWeight=1 PerfScore 3.50
G_M3786_IG05: ;; offset=005EH
4883C430 add rsp, 48
5D pop rbp
C3 ret
;; bbWeight=1 PerfScore 1.75
RWD00 dq 402699999999999Ah ; 11.3
; Total bytes of code 100, prolog size 19, PerfScore 25.35, instruction count 25, allocated bytes for code 101 (MethodHash=69d3f135) for method X:Main(System.String[])
;;; after (Tier0)
; Assembly listing for method X:Main(System.String[])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-0 compilation
; MinOpts code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 1, 1 ) ref -> [rbp+10H] do-not-enreg[] class-hnd
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] do-not-enreg[] "OutgoingArgSpace"
; V02 tmp1 [V02 ] ( 1, 1 ) ref -> [rbp-08H] do-not-enreg[] must-init class-hnd exact "NewObj constructor temp"
;
; Lcl frame size = 48
G_M3786_IG01: ;; offset=0000H
55 push rbp
4883EC30 sub rsp, 48
488D6C2430 lea rbp, [rsp+30H]
33C0 xor eax, eax
488945F8 mov qword ptr [rbp-08H], rax
48894D10 mov gword ptr [rbp+10H], rcx
;; bbWeight=1 PerfScore 4.00
G_M3786_IG02: ;; offset=0014H
488B4D10 mov rcx, gword ptr [rbp+10H]
83790801 cmp dword ptr [rcx+8], 1
750A jne SHORT G_M3786_IG04
;; bbWeight=1 PerfScore 5.00
G_M3786_IG03: ;; offset=001EH
B906000000 mov ecx, 6
E888096B5F call CORINFO_HELP_PARTIAL_COMPILATION_PATCHPOINT
;; bbWeight=0 PerfScore 0.00
G_M3786_IG04: ;; offset=0028H
48B9E031009812020000 mov rcx, 0x212980031E0 ; "hello, world "
488B09 mov rcx, gword ptr [rcx]
E8F6FCFFFF call System.Console:WriteLine(System.String)
90 nop
;; bbWeight=1 PerfScore 3.50
G_M3786_IG05: ;; offset=003BH
4883C430 add rsp, 48
5D pop rbp
C3 ret
;; bbWeight=1 PerfScore 1.75
; Total bytes of code 65, prolog size 16, PerfScore 20.75, instruction count 18, allocated bytes for code 65 (MethodHash=69d3f135) for method X:Main(System.String[])
;;; after (OSR continuation)
; Assembly listing for method X:Main(System.String[])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; OSR variant for entry point 0x6
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 arg0 [V00 ] ( 0, 0 ) ref -> zero-ref class-hnd single-def
; V01 OutArgs [V01 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
; V02 tmp1 [V02,T00] ( 4, 0 ) ref -> rsi class-hnd exact single-def "NewObj constructor temp"
;
; Lcl frame size = 32
G_M3786_IG01: ;; offset=0000H
56 push rsi
4883EC20 sub rsp, 32
;; bbWeight=1 PerfScore 1.25
G_M3786_IG02: ;; offset=0005H
488D0DFC6C3C00 lea rcx, [(reloc 0x7ff9dede3008)]
E8BF60A15F call CORINFO_HELP_NEWSFAST
488BF0 mov rsi, rax
488BCE mov rcx, rsi
E83CCFFEFF call System.Exception:.ctor():this
48B99A99999999992640 mov rcx, 0x402699999999999A
48894E78 mov qword ptr [rsi+120], rcx
488BCE mov rcx, rsi
E8CEA06A5F call CORINFO_HELP_THROW
CC int3
;; bbWeight=0 PerfScore 0.00
; Total bytes of code 51, prolog size 5, PerfScore 6.35, instruction count 12, allocated bytes for code 51 (MethodHash=69d3f135) for method X:Main(System.String[])

    void TransformPartialCompilation(BasicBlock* block)
    {
        // Capture the IL offset
        IL_OFFSET ilOffset = block->bbCodeOffs;
I suppose for the most part we don't end up inlining functions with EH, even in optimized codegen, but do you have any thoughts on whether we eventually want to support that here once this works for optimized code, and how we would do so?
While I haven't looked at it closely, inlining methods with EH doesn't seem to pose any difficult problems, it's just a lot of bookkeeping to get the EH tables properly updated.
Supporting patchpoints in optimized code poses a number of difficult problems. There is a section in the OSR doc with some thoughts on how we could enable this.
Not sure if the above addresses what you were asking...
Mainly I was thinking about the fact that we are using IL offsets, so for optimized code, if the throw comes from an inlinee, more information will be needed to reconstruct the path of inlinees the OSR variant must follow to reach the inlinee's throw. I suppose this is a more general problem to solve than just these particular throw helper patchpoints.
Right, the "patchpoint ID" will need to be something more like a slice of the inline context tree: say, the IL offset in the root method plus the method and IL offset for each inlinee.
Looks awesome.
/azp run runtime-jit-experimental
Azure Pipelines successfully started running 1 pipeline(s).
Verifying the newly added jit-experimental pipeline stage. Note that there are some known failures in the OSR portions of the pipeline (because OSR gets enabled for the crossgen2 host, which uses an older .NET 6 version without a key fix), so some failures here are expected.
There was one failure with partial compilation and profiler hooks; investigating that now. I can't repro it locally and don't see any OSR methods generated during the test, so I'm going to ignore it.
I wonder if this is going to improve startup time in FullPGO mode; I'll keep an eye on the TE benchmarks.
It is not enabled by default, so you'd have to do a custom run. We might also entertain the idea of deferring jitting for cold parts of methods (based on static PGO), but if those parts are ever executed we would lose PGO data for them (similar to the issues we have with OSR and PGO).