
JIT: Expand runtime lookups in a late phase #81635

Merged
merged 71 commits into from
Mar 14, 2023


EgorBo
Member

@EgorBo EgorBo commented Feb 4, 2023

Closes #35551 and is a preparation for #81432

The main advantage of the late expansion is that it allows CSE and loop hoisting of these helpers, e.g.:

static void Test<T>()
{
    for (int i = 0; i < 100; i++)
        Callee<T>();
}

Current codegen:

; Assembly listing for method Prog:Test[System.__Canon]()
       57                   push     rdi
       56                   push     rsi
       53                   push     rbx
       4883EC30             sub      rsp, 48
       48894C2428           mov      qword ptr [rsp+28H], rcx
       488BF1               mov      rsi, rcx
       33FF                 xor      edi, edi
       488B5E10             mov      rbx, qword ptr [rsi+10H]

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; loop start
G_M000_IG03:                
       488B4B10             mov      rcx, qword ptr [rbx+10H]
       4885C9               test     rcx, rcx
       7402                 je       SHORT G_M000_IG05
       EB15                 jmp      SHORT G_M000_IG06
G_M000_IG05:               
       488BCE               mov      rcx, rsi
       48BAB80C584EFC7F0000 mov      rdx, 0x7FFC4E580CB8
       E87E36A95F           call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
       488BC8               mov      rcx, rax
G_M000_IG06:                
       FF159D932800         call     [Prog:Callee[System.__Canon]():bool]
       FFC7                 inc      edi
       83FF64               cmp      edi, 100
       7CD3                 jl       SHORT G_M000_IG03
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; loop end

       4883C430             add      rsp, 48
       5B                   pop      rbx
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret      
; Total bytes of code 74

Previously we were not able to hoist the runtime lookup out of the loop; this PR fixes that.
New codegen:

; Assembly listing for method Prog:Test[System.__Canon]()
G_M2205_IG01:              
       57                   push     rdi
       56                   push     rsi
       53                   push     rbx
       4883EC30             sub      rsp, 48
       48894C2428           mov      qword ptr [rsp+28H], rcx
G_M2205_IG02:              
       33F6                 xor      esi, esi
       488B5138             mov      rdx, qword ptr [rcx+38H]
       488B7A10             mov      rdi, qword ptr [rdx+10H]
       4885FF               test     rdi, rdi
       7405                 je       SHORT G_M2205_IG04
G_M2205_IG03:              
       488BDF               mov      rbx, rdi
       EB12                 jmp      SHORT G_M2205_IG05
G_M2205_IG04:              
       48BA20D3D3B5FC7F0000 mov      rdx, 0x7FFCB5D3D320      ; global ptr
       E8A1D10F5F           call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
       488BD8               mov      rbx, rax
       
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; loop start
G_M2205_IG05:              
       488BCB               mov      rcx, rbx
       FF1595216000         call     [Prog:Callee[System.__Canon]()]
       FFC6                 inc      esi
       83FE64               cmp      esi, 100
       7CF0                 jl       SHORT G_M2205_IG05
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; loop end

G_M2205_IG06:              
       4883C430             add      rsp, 48
       5B                   pop      rbx
       5E                   pop      rsi
       5F                   pop      rdi
       C3                   ret      
; Total bytes of code 74

The runtime lookup, including the CORINFO_HELP_RUNTIMEHANDLE_METHOD call, is now correctly hoisted out of the loop.
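The effect of the hoisting can be modeled with a small C++ sketch. It is a toy, not the JIT's actual expansion: `Dictionary`, `runtimeLookup`, `lazyHelper`, `testBefore`, and `testAfter` are hypothetical names, the generic dictionary is modeled as a fixed array of slots, and `lazyHelper` merely stands in for CORINFO_HELP_RUNTIMEHANDLE_METHOD. Both loop shapes call the helper only once (the slot is cached after the first miss), but the old shape re-tests the slot on every iteration while the hoisted shape tests it once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a generic dictionary whose slots are filled lazily.
struct Dictionary {
    std::array<const void*, 4> slots{};  // nullptr means "not yet populated"
    int slotChecks = 0;                  // times the inline fast-path test ran
    int helperCalls = 0;                 // times the slow-path helper ran
};

static const int kResolvedHandle = 42;  // stand-in for the real method handle

// Stand-in for CORINFO_HELP_RUNTIMEHANDLE_METHOD: resolves and caches the handle.
const void* lazyHelper(Dictionary& dict, int slot) {
    ++dict.helperCalls;
    dict.slots[slot] = &kResolvedHandle;
    return dict.slots[slot];
}

// Expanded runtime lookup: test the slot, call the helper only if it is empty.
const void* runtimeLookup(Dictionary& dict, int slot) {
    assert(slot >= 0 && static_cast<std::size_t>(slot) < dict.slots.size());
    ++dict.slotChecks;
    const void* handle = dict.slots[slot];
    return handle != nullptr ? handle : lazyHelper(dict, slot);
}

void callee(const void* /*genericContext*/) {}

// Old shape: the lookup is re-evaluated on every iteration.
void testBefore(Dictionary& dict) {
    for (int i = 0; i < 100; i++)
        callee(runtimeLookup(dict, 2));
}

// New shape: the loop-invariant lookup is hoisted out of the loop.
void testAfter(Dictionary& dict) {
    const void* ctx = runtimeLookup(dict, 2);
    for (int i = 0; i < 100; i++)
        callee(ctx);
}
```

On a fresh dictionary, `testBefore` performs 100 slot checks and one helper call, while `testAfter` performs one of each.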

This is also required for a proper fix of @jkotas's suggestion in #81432 (comment).

Quite nice diffs

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 4, 2023
@ghost ghost assigned EgorBo Feb 4, 2023

@AndyAyersMS
Member

Presumably, the dynamically expandable ones can't be marked as "pure"

Right, they are not pure.

@jakobbotsch
Member

They might not be "pure" in the strictest sense, but, as the comment says, can't the JIT still optimize them as if they were? I.e., they are idempotent.

@AndyAyersMS
Copy link
Member

Well, there are (were?) issues like #40298. I vaguely recall some other problem but haven't pinned it down yet.

@AndyAyersMS
Member


Hmm. Maybe the other thing was just making sure we don't mark some of those indirs in the early expansions as invariant.

@jkotas
Member

jkotas commented Feb 6, 2023

The dictionary lookup as a whole is pure. The helper call that lazily populates the dictionary slot is not pure (i.e., it has observable side effects).
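A minimal sketch of this distinction, assuming a toy `GenericDictionary` keyed by slot index (all names here are hypothetical, not the runtime's types): `populateSlot` writes to the dictionary, so it has an observable side effect, while `lookup` as a whole returns the same value for the same slot every time. That idempotence is what makes CSE and hoisting of the full lookup safe.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical model: generic dictionary slots keyed by slot index.
struct GenericDictionary {
    std::unordered_map<int, const void*> slots;  // a missing key means "empty slot"
};

static const int kResolvedHandle = 7;  // stand-in for a resolved runtime handle

// Slow path: NOT pure. Its observable side effect is filling in the slot.
const void* populateSlot(GenericDictionary& dict, int slot) {
    assert(dict.slots.count(slot) == 0);      // only reached when the slot is empty
    const void* resolved = &kResolvedHandle;  // real resolution would ask the runtime
    dict.slots[slot] = resolved;
    return resolved;
}

// The lookup as a whole: a side-effect-free fast path plus the lazy slow path.
// Every call for a given slot returns the same value, so from the JIT's point
// of view repeated lookups can be CSE'd and loop-invariant ones hoisted.
const void* lookup(GenericDictionary& dict, int slot) {
    auto it = dict.slots.find(slot);
    if (it != dict.slots.end())
        return it->second;
    return populateSlot(dict, slot);
}
```

Two consecutive `lookup(d, 3)` calls return the same pointer, and only the first one mutates `d`.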

@EgorBo
Copy link
Member Author

EgorBo commented Feb 25, 2023

Fun fact about this PR: non-inlined runtime lookups led to test timeouts. Initially I suspected hangs or crashes, but it turned out the code was just slow 😐

E.g., System.ObjectModel.Tests normally takes around a second to finish, but about 1 minute without expanded runtime lookups.

EgorBo and others added 3 commits March 14, 2023 15:27
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@EgorBo
Member Author

EgorBo commented Mar 14, 2023

Anything else? New diffs: TP (throughput) is even better than before for Tier0, thanks @jakobbotsch

Member

@jakobbotsch jakobbotsch left a comment


LGTM!

Member

@AndyAyersMS AndyAyersMS left a comment


LGTM, just a few questions...

BasicBlockFlags originalFlags = block->bbFlags;
BasicBlock* prevBb = block;

if (stmt == block->firstStmt())
Member


This is checking for the case where gtSplitTree didn't do anything?

Perhaps deserves a comment to that effect.

Member Author


If you don't mind, I'll add a comment in a follow-up to avoid re-running CI for it.

Member


Is this really about gtSplitTree? Isn't it here because we don't have a fgSplitBlockBefore?

gtSplitTree can make changes even without introducing new statements -- the return value needs to be used for that kind of check.

Member Author


Yes, it's here only because what I basically need is fgSplitBlockBefore, but we only have the fgSplitBlockAfter API.

Member Author


So when it's not the first statement in the current block, I do fgSplitBlockAfter(stmt->PrevStmt()).
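The workaround can be sketched on a toy IR in which a block is just a vector of statement strings. `Block`, `splitAfter`, and `splitBefore` are hypothetical stand-ins for BasicBlock, fgSplitBlockAfter, and the missing fgSplitBlockBefore. The handling of the first-statement case here (moving everything into a new successor block) is an assumption for illustration, not necessarily what the PR does.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: a block is an ordered list of statements.
struct Block {
    std::vector<std::string> stmts;
};

using BlockList = std::list<Block>;

// Analogue of fgSplitBlockAfter: statements after index `i` move into a new
// block inserted right after `it`; the new block is returned.
BlockList::iterator splitAfter(BlockList& blocks, BlockList::iterator it, std::size_t i) {
    std::vector<std::string>& stmts = it->stmts;
    assert(i < stmts.size());
    auto cut = stmts.begin() + static_cast<std::ptrdiff_t>(i + 1);
    Block tail;
    tail.stmts.assign(cut, stmts.end());
    stmts.erase(cut, stmts.end());
    return blocks.insert(std::next(it), std::move(tail));
}

// Emulated "split before": afterwards stmts[i] is the first statement of the
// returned block, and everything before it stays in the preceding block.
BlockList::iterator splitBefore(BlockList& blocks, BlockList::iterator it, std::size_t i) {
    assert(i < it->stmts.size());
    if (i == 0) {
        // No previous statement to split after, so move every statement into a
        // new successor and leave the original block empty (assumed handling).
        Block all;
        all.stmts = std::move(it->stmts);
        it->stmts.clear();
        return blocks.insert(std::next(it), std::move(all));
    }
    // Otherwise split after the previous statement, like
    // fgSplitBlockAfter(stmt->PrevStmt()).
    return splitAfter(blocks, it, i - 1);
}
```

The `i == 0` branch is the toy counterpart of the `stmt == block->firstStmt()` check discussed above: there is no previous statement to split after.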

Member


I suppose I should look at what gtSplitTree does, but I'm still confused exactly what this method is supposed to be doing. Maybe an example would help?

Member Author

@EgorBo EgorBo Mar 15, 2023



The general idea is that we want to split a block (e.g. BB0) into two (say, BBa and BBb) at a specific node (e.g. callX), making sure that all side effects evaluated before callX are moved to BBa and callX itself ends up in BBb.

We have a phase that inserts GC safe points after specific call nodes. That one didn't have to care about execution ordering, since we only wanted to make sure the GC is polled (it doesn't even matter whether the poll is emitted before or after the calls). The runtime lookup case is a lot more complicated, since we need to insert a verbose tree in front of the call and reuse its arguments, respect all kinds of complex COMMAs,
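A toy version of that idea, assuming a statement has already been flattened into execution order and that nodes not marked `sideEffects` neither have side effects nor read mutable state (a simplification real IR does not allow). `Node`, `SplitResult`, and `splitAtNode` are hypothetical names. This is not gtSplitTree's implementation, only an illustration of the ordering constraint: side effects evaluated before callX go to BBa in their original order, and BBb reads them through temps.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy: one node of a statement, listed in execution order.
struct Node {
    std::string text;
    bool sideEffects;  // calls, stores, ...: their relative order must be kept
};

struct SplitResult {
    std::vector<std::string> bbA;  // spill statements, executed first
    std::vector<Node> bbB;         // temp uses and pure operands, then the call and the rest
};

// Split `stmt` at position `callIdx`: every side-effecting node evaluated
// before the call is spilled into a temp in BBa (keeping its order), and BBb
// reads the temps instead of re-evaluating the original nodes.
SplitResult splitAtNode(const std::vector<Node>& stmt, std::size_t callIdx) {
    assert(callIdx < stmt.size());
    SplitResult r;
    int tempNum = 0;
    for (std::size_t i = 0; i < callIdx; i++) {
        const Node& n = stmt[i];
        if (n.sideEffects) {
            std::string tmp = "tmp" + std::to_string(tempNum++);
            r.bbA.push_back(tmp + " = " + n.text);
            r.bbB.push_back({tmp, false});  // reading a temp has no side effects
        } else {
            r.bbB.push_back(n);  // assumed pure: safe to leave in place
        }
    }
    r.bbB.insert(r.bbB.end(), stmt.begin() + static_cast<std::ptrdiff_t>(callIdx), stmt.end());
    return r;
}
```

For a statement `[5, sideCall(), callX(...), store y]` split at `callX`, BBa receives `tmp0 = sideCall()` and BBb becomes `[5, tmp0, callX(...), store y]`.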


assert((lastIndOfTree != nullptr) && (pRuntimeLookup->indirections > 0));
impSpillSideEffects(true, CHECK_SPILL_ALL DEBUGARG("bubbling QMark0"));
Member


Is there any benefit to deferring expansion of the rest of this in a manner similar to the case above? Seems like this is also creating a hoistable/cseable complex.

Member Author


I didn't touch this path because it wasn't clear to me when it's hit. From what I see, it's not used by R2R or NAOT. Perhaps some dynamic context?

In a follow-up I'll see whether it's worth supporting.

Member


testForFixup was used by fragile NGen. It is not used anymore. I think it is fine to delete it - both on the JIT/EE interface and the supporting code in the JIT.

Member Author



Thanks, will delete in #83430

@ghost ghost locked as resolved and limited conversation to collaborators Apr 20, 2023
Development

Successfully merging this pull request may close these issues.

Jit: move runtime lookup expansion to lower
6 participants