Initial support for partially jitted methods #60791

Merged (1 commit) on Oct 25, 2021

AndyAyersMS
Member

Introduce a new form of patchpoint for rarely-executed blocks in a method
(typically, blocks with throws). Instead of jitting the code in those blocks,
add code to call a new patchpoint helper (similar to the existing one) which
finds or creates the jitted code for the method beyond this point, and then
transitions control to that code.

Suppressing codegen for exception throws in this way gives similar benefits
to the throw helper transformation, without needing to modify source. This
currently works only in Tier0, but in principle could be extended to handle
optimized code.
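
The "finds or creates the jitted code for the method beyond this point" step can be pictured as a lookup keyed by method and IL offset. The following is only a minimal sketch under that assumption: `ContinuationCache`, `MethodId`, `ILOffset`, and the `compile` callback are hypothetical names made up for illustration and are not the runtime's helper or data structures. In the real system control transfers to the continuation and does not return to the Tier0 code; here that is modeled by simply invoking the returned callable.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-ins for illustration; none of these names come from the runtime.
using MethodId     = int;
using ILOffset     = std::uint32_t;
using Continuation = std::function<void()>;
using CompileFn    = std::function<Continuation(MethodId, ILOffset)>;

class ContinuationCache
{
public:
    // 'compile' plays the role of the JIT: given (method, IL offset) it
    // produces code for the rest of the method starting at that offset.
    explicit ContinuationCache(CompileFn compile) : m_compile(std::move(compile))
    {
        assert(m_compile && "a compile callback is required");
    }

    // Find the continuation for this patchpoint, creating it on first use.
    const Continuation& FindOrCreate(MethodId method, ILOffset offset)
    {
        const auto key = std::make_pair(method, offset);
        auto       it  = m_cache.find(key);
        if (it == m_cache.end())
        {
            ++m_compileCount;
            it = m_cache.emplace(key, m_compile(method, offset)).first;
        }
        return it->second;
    }

    // Number of times the compile callback has been invoked.
    int CompileCount() const
    {
        return m_compileCount;
    }

private:
    CompileFn                                             m_compile;
    std::map<std::pair<MethodId, ILOffset>, Continuation> m_cache;
    int                                                   m_compileCount = 0;
};
```

With this shape, the cold block's body is compiled only the first time the patchpoint is reached, and later hits on the same (method, IL offset) pair reuse the cached continuation.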

dotnet-issue-labeler (bot) added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Oct 23, 2021

ghost commented Oct 23, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author: AndyAyersMS
Assignees: -
Labels: area-CodeGen-coreclr
Milestone: -

@AndyAyersMS
Member Author

cc @dotnet/jit-contrib

I've had this bit of tech sitting around for quite a while but it was held up by the OSR mid-try entry issue. With that resolved I figure we might as well get this into the mainline (off by default).

Passes the pri-0 tests locally, not that that means a whole lot. I'll add a stress mode in a follow-up, since we can trigger these patchpoints at more or less any stack-empty point.

Sample diff

using System;

class E : Exception
{
    public E(double v)
    {
        m_v = v;
    }

    double m_v;
}

class X
{
    public static void Main(string[] args)
    {
        if (args.Length == 1)
        {
            throw new E(11.3);
        }
        Console.WriteLine("hello, world\n");
    }
}

;;; before
; Assembly listing for method X:Main(System.String[])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-0 compilation
; MinOpts code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  1,  1   )     ref  ->  [rbp+10H]   do-not-enreg[] class-hnd
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   do-not-enreg[] "OutgoingArgSpace"
;  V02 tmp1         [V02    ] (  1,  1   )     ref  ->  [rbp-08H]   do-not-enreg[] must-init class-hnd exact "NewObj constructor temp"
;
; Lcl frame size = 48

G_M3786_IG01:              ;; offset=0000H
       55                   push     rbp
       4883EC30             sub      rsp, 48
       C5F877               vzeroupper
       488D6C2430           lea      rbp, [rsp+30H]
       33C0                 xor      eax, eax
       488945F8             mov      qword ptr [rbp-08H], rax
       48894D10             mov      gword ptr [rbp+10H], rcx
                                                ;; bbWeight=1    PerfScore 5.00
G_M3786_IG02:              ;; offset=0017H
       488B4D10             mov      rcx, gword ptr [rbp+10H]
       83790801             cmp      dword ptr [rcx+8], 1
       752B                 jne      SHORT G_M3786_IG04
                                                ;; bbWeight=1    PerfScore 5.00
G_M3786_IG03:              ;; offset=0021H
       488D0D406D3C00       lea      rcx, [(reloc 0x7ff9dedf3008)]
       E86359A05F           call     CORINFO_HELP_NEWSFAST
       488945F8             mov      gword ptr [rbp-08H], rax
       C5FB100D2F000000     vmovsd   xmm1, qword ptr [reloc @RWD00]
       488B4DF8             mov      rcx, gword ptr [rbp-08H]
       E866FFFFFF           call     E:.ctor(double):this
       488B4DF8             mov      rcx, gword ptr [rbp-08H]
       E815A1695F           call     CORINFO_HELP_THROW
                                                ;; bbWeight=0    PerfScore 0.00
G_M3786_IG04:              ;; offset=004BH
       48B9E0310098AD010000 mov      rcx, 0x1AD980031E0      ; "hello, world "
       488B09               mov      rcx, gword ptr [rcx]
       E8D3FCFFFF           call     System.Console:WriteLine(System.String)
       90                   nop
                                                ;; bbWeight=1    PerfScore 3.50
G_M3786_IG05:              ;; offset=005EH
       4883C430             add      rsp, 48
       5D                   pop      rbp
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.75
RWD00   dq      402699999999999Ah       ;         11.3


; Total bytes of code 100, prolog size 19, PerfScore 25.35, instruction count 25, allocated bytes for code 101 (MethodHash=69d3f135) for method X:Main(System.String[])

;;; after (Tier0)

; Assembly listing for method X:Main(System.String[])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-0 compilation
; MinOpts code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  1,  1   )     ref  ->  [rbp+10H]   do-not-enreg[] class-hnd
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   do-not-enreg[] "OutgoingArgSpace"
;  V02 tmp1         [V02    ] (  1,  1   )     ref  ->  [rbp-08H]   do-not-enreg[] must-init class-hnd exact "NewObj constructor temp"
;
; Lcl frame size = 48

G_M3786_IG01:              ;; offset=0000H
       55                   push     rbp
       4883EC30             sub      rsp, 48
       488D6C2430           lea      rbp, [rsp+30H]
       33C0                 xor      eax, eax
       488945F8             mov      qword ptr [rbp-08H], rax
       48894D10             mov      gword ptr [rbp+10H], rcx
                                                ;; bbWeight=1    PerfScore 4.00
G_M3786_IG02:              ;; offset=0014H
       488B4D10             mov      rcx, gword ptr [rbp+10H]
       83790801             cmp      dword ptr [rcx+8], 1
       750A                 jne      SHORT G_M3786_IG04
                                                ;; bbWeight=1    PerfScore 5.00
G_M3786_IG03:              ;; offset=001EH
       B906000000           mov      ecx, 6
       E888096B5F           call     CORINFO_HELP_PARTIAL_COMPILATION_PATCHPOINT
                                                ;; bbWeight=0    PerfScore 0.00
G_M3786_IG04:              ;; offset=0028H
       48B9E031009812020000 mov      rcx, 0x212980031E0      ; "hello, world "
       488B09               mov      rcx, gword ptr [rcx]
       E8F6FCFFFF           call     System.Console:WriteLine(System.String)
       90                   nop
                                                ;; bbWeight=1    PerfScore 3.50
G_M3786_IG05:              ;; offset=003BH
       4883C430             add      rsp, 48
       5D                   pop      rbp
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.75

; Total bytes of code 65, prolog size 16, PerfScore 20.75, instruction count 18, allocated bytes for code 65 (MethodHash=69d3f135) for method X:Main(System.String[])

;;; after (OSR continuation)

; Assembly listing for method X:Main(System.String[])
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; OSR variant for entry point 0x6
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 arg0         [V00    ] (  0,  0   )     ref  ->  zero-ref    class-hnd single-def
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V02 tmp1         [V02,T00] (  4,  0   )     ref  ->  rsi         class-hnd exact single-def "NewObj constructor temp"
;
; Lcl frame size = 32

G_M3786_IG01:              ;; offset=0000H
       56                   push     rsi
       4883EC20             sub      rsp, 32
                                                ;; bbWeight=1    PerfScore 1.25
G_M3786_IG02:              ;; offset=0005H
       488D0DFC6C3C00       lea      rcx, [(reloc 0x7ff9dede3008)]
       E8BF60A15F           call     CORINFO_HELP_NEWSFAST
       488BF0               mov      rsi, rax
       488BCE               mov      rcx, rsi
       E83CCFFEFF           call     System.Exception:.ctor():this
       48B99A99999999992640 mov      rcx, 0x402699999999999A
       48894E78             mov      qword ptr [rsi+120], rcx
       488BCE               mov      rcx, rsi
       E8CEA06A5F           call     CORINFO_HELP_THROW
       CC                   int3
                                                ;; bbWeight=0    PerfScore 0.00

; Total bytes of code 51, prolog size 5, PerfScore 6.35, instruction count 12, allocated bytes for code 51 (MethodHash=69d3f135) for method X:Main(System.String[])
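
The three "Total bytes of code" figures above can be combined into a quick size comparison. This is just arithmetic on the numbers in the listings (100, 65, and 51 bytes); the constant and function names are made up for the example.

```cpp
#include <cassert>

// Byte counts copied from the "Total bytes of code" lines in the listings above.
constexpr int kBeforeBytes       = 100; // Tier0, throw block jitted inline
constexpr int kAfterTier0Bytes   = 65;  // Tier0, throw block replaced by a patchpoint call
constexpr int kContinuationBytes = 51;  // OSR continuation, jitted only if the throw is reached

// Bytes of Tier0 code saved when the throw is never reached.
constexpr int SavedIfColdPathUnused()
{
    return kBeforeBytes - kAfterTier0Bytes;
}

// Total bytes jitted if the throw is reached (Tier0 code plus continuation).
constexpr int TotalIfColdPathTaken()
{
    return kAfterTier0Bytes + kContinuationBytes;
}

static_assert(SavedIfColdPathUnused() == 35, "Tier0 code shrinks by 35 bytes");
static_assert(TotalIfColdPathTaken() == 116, "taking the cold path costs 116 bytes in total");
```

So for this sample the common path gets 35 bytes smaller, and the extra cost (16 bytes more than the original 100, plus a second jit request) is only paid if the throw actually executes.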

void TransformPartialCompilation(BasicBlock* block)
{
    // Capture the IL offset
    IL_OFFSET ilOffset = block->bbCodeOffs;

Member

I suppose for the most part we don't end up inlining functions with EH even in optimized codegen, but do you have any thoughts on whether we want to eventually support it here when this is supported for optimized code and how to do so?

Member Author

While I haven't looked at it closely, inlining methods with EH doesn't seem to pose any difficult problems; it's just a lot of bookkeeping to get the EH tables properly updated.

Supporting patchpoints in optimized code poses a number of difficult problems. There is a section in the OSR doc with some thoughts on how we could enable this.

Not sure if the above addresses what you were asking...

Member

Mainly I was thinking about the fact that we are using IL offsets, so for optimized code, if the throw comes from an inlinee, then I suppose more information will be needed to figure out the path of inlinees for the OSR variant to get to the inlinee's throw. I suppose this is a more general problem to solve than for just these particular throw helper patchpoints.

Member Author

Right, the "patchpoint ID" will need to be something more like a slice of the inline context tree -- say the IL offset for the root plus method and IL offset for the inlinees.
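
One way to picture such an ID is a root IL offset followed by a list of (method, IL offset) pairs, one per level of inlining. The sketch below is purely illustrative: `InlineFrame`, `PatchpointId`, and `Extend` are hypothetical names, and a string stands in for whatever method handle the JIT would actually use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the JIT's actual data structures.

// One step down the inline tree: the inlined callee and an IL offset inside it.
struct InlineFrame
{
    std::string   method;   // stand-in for a method handle
    std::uint32_t ilOffset; // IL offset within that inlinee

    bool operator==(const InlineFrame& o) const
    {
        return std::tie(method, ilOffset) == std::tie(o.method, o.ilOffset);
    }
    bool operator<(const InlineFrame& o) const
    {
        return std::tie(method, ilOffset) < std::tie(o.method, o.ilOffset);
    }
};

// A patchpoint ID as a path through the inline tree: the IL offset in the
// root method, followed by (method, IL offset) for each inlinee on the way
// to the patchpoint. An empty path is the Tier0 case: a bare IL offset.
struct PatchpointId
{
    std::uint32_t            rootILOffset;
    std::vector<InlineFrame> inlinees; // outermost inlinee first

    bool operator==(const PatchpointId& o) const
    {
        return std::tie(rootILOffset, inlinees) == std::tie(o.rootILOffset, o.inlinees);
    }
    bool operator<(const PatchpointId& o) const
    {
        return std::tie(rootILOffset, inlinees) < std::tie(o.rootILOffset, o.inlinees);
    }
};

// Returns a copy of 'parent' extended by one more level of inlining.
inline PatchpointId Extend(PatchpointId parent, std::string callee, std::uint32_t ilOffset)
{
    assert(!callee.empty() && "an inlinee needs a method identity");
    parent.inlinees.push_back(InlineFrame{std::move(callee), ilOffset});
    return parent;
}
```

Because the type is ordered and equality-comparable, it could key a lookup table the same way a plain IL offset does today. Two inlined copies of the same callee reached from different call sites in the root get distinct IDs, which is the property the comment above is after.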

Member

jakobbotsch left a comment

Looks awesome.

@AndyAyersMS
Member Author

/azp run runtime-jit-experimental

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Member Author

Verifying the newly added jit-experimental pipeline stage.

Note there are some known failures in the OSR portions of the pipeline (because OSR gets enabled for the crossgen2 host, which uses an older .NET 6 version without a key fix). So some failures here are expected.

@AndyAyersMS
Member Author

AndyAyersMS commented Oct 25, 2021

There was one failure with partial compilation and profiler hooks -- investigating that now.

I can't repro it locally and don't see any OSR methods generated during the test, so I'm going to ignore it.

AndyAyersMS merged commit 1d98464 into dotnet:main on Oct 25, 2021
AndyAyersMS deleted the PartialCompilation2 branch on October 25, 2021 18:51
@EgorBo
Member

EgorBo commented Oct 25, 2021

I wonder if this will improve startup time in FullPGO mode; I'll keep an eye on the TE benchmarks.

@AndyAyersMS
Member Author

It is not enabled by default, so you'd have to do a custom run.

We might also entertain the idea of deferring jitting for cold parts of methods (based on static PGO), but if those parts are ever executed we would lose PGO data for them (similar to the issues we have with OSR and PGO).
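
As a rough illustration of what "deferring jitting for cold parts" could mean, the sketch below picks candidate blocks from a per-block summary. Everything here is an assumption made for the example: `BlockInfo`, its fields, the threshold, and the rule that the entry block is never deferred are hypothetical, and the only constraint taken from this thread is that patchpoints are placed at stack-empty points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-block summary; the field names are illustrative only.
struct BlockInfo
{
    std::uint32_t ilOffset;   // IL offset where the block starts
    double        weight;     // profile-derived weight (0 means never observed)
    bool          stackEmpty; // IL evaluation stack is empty on entry
};

// Returns the IL offsets of blocks whose codegen could be deferred: the block
// is cold (weight at or below 'coldThreshold') and starts at a stack-empty
// point. The first block is skipped, on the assumption that deferring the
// method entry would defer the whole method.
inline std::vector<std::uint32_t> SelectDeferredBlocks(const std::vector<BlockInfo>& blocks,
                                                       double                        coldThreshold = 0.0)
{
    assert(coldThreshold >= 0.0 && "weights are non-negative");

    std::vector<std::uint32_t> deferred;
    for (std::size_t i = 1; i < blocks.size(); ++i)
    {
        const BlockInfo& block = blocks[i];
        if (block.stackEmpty && block.weight <= coldThreshold)
        {
            deferred.push_back(block.ilOffset);
        }
    }
    return deferred;
}
```

A selection like this says nothing about the PGO concern raised above: any block it defers would not be instrumented in Tier0, so its profile data would be missing if it later turned out to be hot.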

ghost locked this conversation as resolved and limited it to collaborators on Nov 25, 2021