Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use *_NOCTOR jit helpers version once class is initialized #75785

Merged
merged 2 commits into from
Sep 17, 2022

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Sep 16, 2022

Today, we have *_NOCTOR helpers and default helpers for static class initialization. They are used to find the base address of various statics and thread statics. The *_NOCTOR versions are usually hoistable friendly because we know there is no static ctor to execute at any point and the static access can be safely moved around. However, for classes having static ctors, we use the default helpers (the one that doesn't end with _NOCTOR) and they do not get hoisted out of the loop.

A real world example of such code is https://github.com/microsoft/FASTER/blob/a3aafc8cae8298267ded2e884aea4377d69d0e77/cs/src/core/Epochs/LightEpoch.cs#L526-L536

Here, ReserveEntry() has bunch of access to thread statics inside a loop and is part of a class that has static ctor. We end up calling helper CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE (which is expensive) inside the loop since we do not hoist it outside the loop because of the possibility of point where static ctor should be executed.

Here is the simple repro:

class Foo {

  static Foo() {
     ...
  }

   [ThreadStatic]
   static ushort startOffset1;

    static int Test(int n) {
        int result = 0;
            for (int i = 0; i < n; i++) {
                result += startOffset1;
        }
        return result;
    }
G_M40168_IG03:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 9
       call     CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE
       movzx    rax, word  ptr [rax+28H]
       add      edi, eax
       inc      ebx
       cmp      ebx, esi
       jl       SHORT G_M40168_IG03

However, if we know that the class is already initialized, we can use the NOCTOR version of the helper in subsequent JITs and optimize the helper call out of the loop.

G_M40168_IG02:
       xor      edi, edi
       xor      ebx, ebx
       test     esi, esi
       jle      SHORT G_M40168_IG04
       mov      rcx, 0xD1FFAB1E
       mov      edx, 9
       call     CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE_NOCTOR
       movzx    rax, word  ptr [rax+28H]
                                                ;; size=32 bbWeight=1    PerfScore 5.25
G_M40168_IG03:
       add      edi, eax
       inc      ebx
       cmp      ebx, esi
       jl       SHORT G_M40168_IG03

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 16, 2022
@ghost ghost assigned kunalspathak Sep 16, 2022
@ghost
Copy link

ghost commented Sep 16, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Today, we have *_NOCTOR helpers and default helpers for static class initialization. They are used to find the base address of various statics and thread statics. The *_NOCTOR versions are usually hoistable friendly because we know there is no static ctor to execute at any point and the static access can be safely moved around. However, for classes having static ctors, we use the default helpers (the one that doesn't end with _NOCTOR) and they do not get hoisted out of the loop.

A real world example of such code is https://github.com/microsoft/FASTER/blob/a3aafc8cae8298267ded2e884aea4377d69d0e77/cs/src/core/Epochs/LightEpoch.cs#L526-L536

Here, ReserveEntry() has bunch of access to thread statics inside a loop and is part of a class that has static ctor. We end up calling helper CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE (which is expensive) inside the loop since we do not hoist it outside the loop because of the possibility of point where static ctor should be executed.

Here is the simple repro:

   [ThreadStatic]
   static ushort startOffset1;

    static int Test(int n) {
        int result = 0;
            for (int i = 0; i < n; i++) {
                result += startOffset1;
        }
        return result;
    }
G_M40168_IG03:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 9
       call     CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE
       movzx    rax, word  ptr [rax+28H]
       add      edi, eax
       inc      ebx
       cmp      ebx, esi
       jl       SHORT G_M40168_IG03

However, if we know that the class is already initialized, we can use the NOCTOR version of the helper in subsequent JITs and optimize the helper call out of the loop.

G_M40168_IG02:
       xor      edi, edi
       xor      ebx, ebx
       test     esi, esi
       jle      SHORT G_M40168_IG04
       mov      rcx, 0xD1FFAB1E
       mov      edx, 9
       call     CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE_NOCTOR
       movzx    rax, word  ptr [rax+28H]
                                                ;; size=32 bbWeight=1    PerfScore 5.25
G_M40168_IG03:
       add      edi, eax
       inc      ebx
       cmp      ebx, esi
       jl       SHORT G_M40168_IG03
Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak kunalspathak marked this pull request as ready for review September 17, 2022 00:57
@MichalPetryka
Copy link
Contributor

Could the CORINFO_HELP_GETSHARED_GCSTATIC_BASE check be relaxed to only affect managed structs too?

@jkotas
Copy link
Member

jkotas commented Sep 17, 2022

Could the CORINFO_HELP_GETSHARED_GCSTATIC_BASE check be relaxed to only affect managed structs too?

Could you please share an example of what you have in mind?

@MichalPetryka
Copy link
Contributor

Could the CORINFO_HELP_GETSHARED_GCSTATIC_BASE check be relaxed to only affect managed structs too?

Could you please share an example of what you have in mind?

I'm talking about this check, couldn't it check whether the type is managed, not just a struct?

pField->GetFieldType() == ELEMENT_TYPE_VALUETYPE)

@jkotas
Copy link
Member

jkotas commented Sep 17, 2022

I'm talking about this check, couldn't it check whether the type is managed, not just a struct?

It would mean redoing the whole static field layout algorithm in the type loader. It can be done, but it would be pretty non-trivial change and comes with trade-offs (like slower assembly and type loading).

@MichalPetryka
Copy link
Contributor

Ah, I wasn't sure whether the info is available at this point, if not then changing it here doesn't make sense.

@kunalspathak kunalspathak merged commit 66fd34e into dotnet:main Sep 17, 2022
@kunalspathak kunalspathak deleted the noctor branch September 17, 2022 16:28
@ghost ghost locked as resolved and limited conversation to collaborators Oct 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants