Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the uses of ThreadStatic in class having static ctor #746

Merged
merged 5 commits into from
Sep 20, 2022

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Sep 16, 2022

In a class that has static ctor, accessing static fields include ThreadStatic is challenging. We have a helper call to find the base address of the variable to access and generally this helper call is not optimizable. Meaning, if we are accessing such variable in a loop, the corresponding helper call cannot be move around to optimize code. The reason being, we need to be precise at what time the ctor will be called and there are rules that govern that. Because of lack of ability to move the helper call around, if we access a static variable inside a loop, we will be calling the helper call in every single iteration. Find the base address is time consuming as well because it has to go through at least 3 looks up tables (module, thread and the variable).

However, in absence of static ctor, we can safely move around the helper call and also move it outside the loop in some cases. I noticed that ReserveEntry() has such code and the code generated is sub-optimal. We are fixing it in .NET in dotnet/runtime#75785, but it will be in .NET 8.

Till then, I was thinking of fixing it in this code base to not see those issues. For that, I have created a private class and moved most of the ThreadStatic variables inside it. Here is the code gen difference:

    public class X {

        [ThreadStatic]
        public static int startOffset1 = 1;
    }

    public unsafe class TLS {
     [MethodImpl(MethodImplOptions.NoInlining)]
     static TLS() {
    }

    [ThreadStatic]
    public static int startOffset1 = 1;

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Test1(int n) {
        int result = 0;
            for (int i = 0; i < n; i++) {
                result += startOffset1;
        }
        return result;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Test2(int n) {
        int result = 0;
        for (int i = 0; i < n; i++) {
            result += X.startOffset1;
        }
        return result;
    }
}

Test1 code:

G_M64761_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 32
       mov      esi, ecx
                                                ;; size=9 bbWeight=1    PerfScore 3.50
G_M64761_IG02:
       xor      edi, edi
       xor      ebx, ebx
       test     esi, esi
       jle      SHORT G_M64761_IG04
                                                ;; size=8 bbWeight=1    PerfScore 1.75
G_M64761_IG03:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 9
       call     CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE
       movzx    rax, word  ptr [rax+28H]
       add      edi, eax
       inc      ebx
       cmp      ebx, esi
       jl       SHORT G_M64761_IG03
                                                ;; size=32 bbWeight=4    PerfScore 21.00
G_M64761_IG04:
       mov      eax, edi
                                                ;; size=2 bbWeight=1    PerfScore 0.25
G_M64761_IG05:
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       ret

Test2 code:

G_M35802_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 32
       mov      esi, ecx
                                                ;; size=9 bbWeight=1    PerfScore 3.50
G_M35802_IG02:
       xor      edi, edi
       xor      ebx, ebx
       test     esi, esi
       jle      SHORT G_M35802_IG04
       mov      rcx, 0xD1FFAB1E
       mov      edx, 8
       call     CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE
       mov      eax, dword ptr [rax+24H]
                                                ;; size=31 bbWeight=1    PerfScore 5.25
G_M35802_IG03:
       add      edi, eax
       inc      ebx
       cmp      ebx, esi
       jl       SHORT G_M35802_IG03
                                                ;; size=8 bbWeight=4    PerfScore 7.00
G_M35802_IG04:
       mov      eax, edi
                                                ;; size=2 bbWeight=1    PerfScore 0.25
G_M35802_IG05:
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       ret

If you see here, in Test2, the helper call CORINFO_HELP_GETSHARED_NONGCTHREADSTATIC_BASE has been moved out of IG03 loop because it is being accessed using no ctor class X, but in Test1 we will call the helper inside loop.

@badrishc
Copy link
Contributor

Thanks, this is potentially a big cost saving. We are doing some perf benchmarks and will incorporate this PR when that is done.

@badrishc
Copy link
Contributor

Any chance you could resolve conflicts with main in the PR? We just merged other changes to LightEpoch (making CurrentEpoch a long, for example).

@kunalspathak
Copy link
Member Author

Any chance you could resolve conflicts with main in the PR? We just merged other changes to LightEpoch (making CurrentEpoch a long, for example).

Done.

We are doing some perf benchmarks and will incorporate this PR when that is done

Sure, will be curious to see if you see any improvements.

@badrishc
Copy link
Contributor

Update: As of now, we are only seeing a 2% gain in e2e perf with this fix. As it comes with a fair bit of code complexity, we are debating whether it's just best to wait until .NET 8 has a native fix for this.

@badrishc
Copy link
Contributor

A microbenchmark of just "Resume - Suspend" pairs indicates a 20% jump in performance of epoch protection. So we will merge this after the CI tests pass. Thanks for the contribution, and look forward to this being incorporated into .NET 8 soon!

@badrishc badrishc merged commit ea54b90 into microsoft:main Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants