Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Move RequiresAlign8 flag from RareFlags into ExtendedFlags #106010

Merged
merged 5 commits into from
Aug 7, 2024

Conversation

filipnavara
Copy link
Member

@filipnavara filipnavara commented Aug 6, 2024

Implementation of idea from https://github.com/dotnet/runtime/pull/105931/files#r1704319780.

The benchmark code and results are below. I tested it on Raspberry Pi. The tests were run in succession, CPU temperature was around 48 degrees, so no thermal throttling should be involved.

Predictably, the performance improves for `RequiresAlign8` cases where we don't need to decode the optional fields block. There's about 5% perf hit for non-align8 array allocations and 2% perf hit for non-align8 value type allocations. Some perf hit for arrays was expected, the other code path needs more investigation though as no hit is expected there.

Updated benchmark results here: #106010 (comment)

This would make it easier to set the RequiresAlign8 flag for truncated method tables used in GCStaticEETypeNode.

Benchmark code:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System;
using System.Runtime;
using System.Runtime.CompilerServices;

public unsafe class Program
{
    [RuntimeImport("RhNewObject")]
    [MethodImpl(MethodImplOptions.InternalCall)]
    private static extern object RhNewObject(void* pMT);
    [RuntimeImport("RhNewArray")]
    [MethodImpl(MethodImplOptions.InternalCall)]
    private static extern object RhNewArray(void* pMT, int length);

    public static void Main()
    {
        var stopWatch = new System.Diagnostics.Stopwatch();

        for (int j = 0; j < 10; j++)
        {
            stopWatch.Restart();
            for (int i = 0; i < 1000000; i++)
                _ = RhNewArray((void*)typeof(int[]).TypeHandle.Value, 2);
            stopWatch.Stop();
            Console.WriteLine("NewIntArray: " + stopWatch.Elapsed);
        }

        for (int j = 0; j < 10; j++)
        {
            stopWatch.Restart();
            for (int i = 0; i < 1000000; i++)
                _ = RhNewArray((void*)typeof(double[]).TypeHandle.Value, 2);
            stopWatch.Stop();
            Console.WriteLine("NewDoubleArray: " + stopWatch.Elapsed);
        }

        for (int j = 0; j < 10; j++)
        {
            stopWatch.Restart();
            for (int i = 0; i < 1000000; i++)
                _ = RhNewObject((void*)typeof(ValueTuple<int, int>).TypeHandle.Value);
            stopWatch.Stop();
            Console.WriteLine("NewIntTuple: " + stopWatch.Elapsed);
        }

        for (int j = 0; j < 10; j++)
        {
            stopWatch.Restart();
            for (int i = 0; i < 1000000; i++)
                _ = RhNewObject((void*)typeof(ValueTuple<double, double>).TypeHandle.Value);
            stopWatch.Stop();
            Console.WriteLine("NewDoubleTuple: " + stopWatch.Elapsed);
        }
    }
}

namespace System.Runtime
{
    public class RuntimeImportAttribute : Attribute
    {
        public string DllName { get; }
        public string EntryPoint { get; }
        public RuntimeImportAttribute(string entry) => EntryPoint = entry;
    }
}

Benchmark results (on Raspberry Pi):

Main PR
NewIntArray: 00:00:00.0255011
NewIntArray: 00:00:00.0254985
NewIntArray: 00:00:00.0199294
NewIntArray: 00:00:00.0157785
NewIntArray: 00:00:00.0158397
NewIntArray: 00:00:00.0157611
NewIntArray: 00:00:00.0158823
NewIntArray: 00:00:00.0157390
NewIntArray: 00:00:00.0157586
NewIntArray: 00:00:00.0158148
NewDoubleArray: 00:00:00.0634320
NewDoubleArray: 00:00:00.0634249
NewDoubleArray: 00:00:00.0634018
NewDoubleArray: 00:00:00.0634123
NewDoubleArray: 00:00:00.0633429
NewDoubleArray: 00:00:00.0633701
NewDoubleArray: 00:00:00.0633932
NewDoubleArray: 00:00:00.0633275
NewDoubleArray: 00:00:00.0633318
NewDoubleArray: 00:00:00.0633482
NewIntTuple: 00:00:00.0142524
NewIntTuple: 00:00:00.0142566
NewIntTuple: 00:00:00.0142554
NewIntTuple: 00:00:00.0142424
NewIntTuple: 00:00:00.0142489
NewIntTuple: 00:00:00.0142521
NewIntTuple: 00:00:00.0142627
NewIntTuple: 00:00:00.0142432
NewIntTuple: 00:00:00.0142473
NewIntTuple: 00:00:00.0142491
NewDoubleTuple: 00:00:00.0269727
NewDoubleTuple: 00:00:00.0270101
NewDoubleTuple: 00:00:00.0269876
NewDoubleTuple: 00:00:00.0269796
NewDoubleTuple: 00:00:00.0269837
NewDoubleTuple: 00:00:00.0269709
NewDoubleTuple: 00:00:00.0270441
NewDoubleTuple: 00:00:00.0269772
NewDoubleTuple: 00:00:00.0269737
NewDoubleTuple: 00:00:00.0269802
NewIntArray: 00:00:00.0266514
NewIntArray: 00:00:00.0267529
NewIntArray: 00:00:00.0264482
NewIntArray: 00:00:00.0173164
NewIntArray: 00:00:00.0164856
NewIntArray: 00:00:00.0164668
NewIntArray: 00:00:00.0165303
NewIntArray: 00:00:00.0164860
NewIntArray: 00:00:00.0164870
NewIntArray: 00:00:00.0164756
NewDoubleArray: 00:00:00.0513089
NewDoubleArray: 00:00:00.0513532
NewDoubleArray: 00:00:00.0513423
NewDoubleArray: 00:00:00.0513160
NewDoubleArray: 00:00:00.0513266
NewDoubleArray: 00:00:00.0512910
NewDoubleArray: 00:00:00.0512942
NewDoubleArray: 00:00:00.0513139
NewDoubleArray: 00:00:00.0512965
NewDoubleArray: 00:00:00.0512951
NewIntTuple: 00:00:00.0144171
NewIntTuple: 00:00:00.0144348
NewIntTuple: 00:00:00.0144299
NewIntTuple: 00:00:00.0143846
NewIntTuple: 00:00:00.0144250
NewIntTuple: 00:00:00.0143741
NewIntTuple: 00:00:00.0143745
NewIntTuple: 00:00:00.0144070
NewIntTuple: 00:00:00.0144361
NewIntTuple: 00:00:00.0144255
NewDoubleTuple: 00:00:00.0162535
NewDoubleTuple: 00:00:00.0162460
NewDoubleTuple: 00:00:00.0162538
NewDoubleTuple: 00:00:00.0162349
NewDoubleTuple: 00:00:00.0162483
NewDoubleTuple: 00:00:00.0162368
NewDoubleTuple: 00:00:00.0162206
NewDoubleTuple: 00:00:00.0162730
NewDoubleTuple: 00:00:00.0163367
NewDoubleTuple: 00:00:00.0162350

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 6, 2024
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@filipnavara filipnavara closed this Aug 6, 2024
@filipnavara
Copy link
Member Author

Closing for now, the benchmark is flawed.

@filipnavara
Copy link
Member Author

filipnavara commented Aug 6, 2024

Updated the OP with a fixed benchmark and more meaningful numbers. I'll will tinker a bit more with it.

@filipnavara
Copy link
Member Author

I rewrote the RequiresAlign8 getter by exploiting a bit trick used in HasComponentSize. Now it beats the old version in the benchmarks in all cases.

Benchmark results (on Raspberry Pi):

Main PR
NewIntArray: 00:00:00.0253658
NewIntArray: 00:00:00.0255230
NewIntArray: 00:00:00.0251697
NewIntArray: 00:00:00.0221124
NewIntArray: 00:00:00.0156506
NewIntArray: 00:00:00.0157653
NewIntArray: 00:00:00.0157124
NewIntArray: 00:00:00.0157753
NewIntArray: 00:00:00.0157293
NewIntArray: 00:00:00.0156424
NewDoubleArray: 00:00:00.0631696
NewDoubleArray: 00:00:00.0631607
NewDoubleArray: 00:00:00.0631175
NewDoubleArray: 00:00:00.0631376
NewDoubleArray: 00:00:00.0632945
NewDoubleArray: 00:00:00.0631229
NewDoubleArray: 00:00:00.0631233
NewDoubleArray: 00:00:00.0630915
NewDoubleArray: 00:00:00.0631253
NewDoubleArray: 00:00:00.0631230
NewIntTuple: 00:00:00.0142051
NewIntTuple: 00:00:00.0142171
NewIntTuple: 00:00:00.0142024
NewIntTuple: 00:00:00.0141806
NewIntTuple: 00:00:00.0142125
NewIntTuple: 00:00:00.0142099
NewIntTuple: 00:00:00.0141944
NewIntTuple: 00:00:00.0141997
NewIntTuple: 00:00:00.0142020
NewIntTuple: 00:00:00.0141909
NewDoubleTuple: 00:00:00.0269583
NewDoubleTuple: 00:00:00.0269372
NewDoubleTuple: 00:00:00.0269473
NewDoubleTuple: 00:00:00.0269310
NewDoubleTuple: 00:00:00.0269361
NewDoubleTuple: 00:00:00.0269333
NewDoubleTuple: 00:00:00.0269077
NewDoubleTuple: 00:00:00.0269356
NewDoubleTuple: 00:00:00.0269354
NewDoubleTuple: 00:00:00.0269354
NewIntArray: 00:00:00.0230733
NewIntArray: 00:00:00.0237831
NewIntArray: 00:00:00.0234446
NewIntArray: 00:00:00.0232283
NewIntArray: 00:00:00.0143300
NewIntArray: 00:00:00.0143772
NewIntArray: 00:00:00.0143603
NewIntArray: 00:00:00.0143911
NewIntArray: 00:00:00.0143334
NewIntArray: 00:00:00.0144492
NewDoubleArray: 00:00:00.0508229
NewDoubleArray: 00:00:00.0509037
NewDoubleArray: 00:00:00.0510827
NewDoubleArray: 00:00:00.0505703
NewDoubleArray: 00:00:00.0506560
NewDoubleArray: 00:00:00.0505697
NewDoubleArray: 00:00:00.0507030
NewDoubleArray: 00:00:00.0506166
NewDoubleArray: 00:00:00.0505482
NewDoubleArray: 00:00:00.0505669
NewIntTuple: 00:00:00.0138182
NewIntTuple: 00:00:00.0138272
NewIntTuple: 00:00:00.0137903
NewIntTuple: 00:00:00.0138285
NewIntTuple: 00:00:00.0138366
NewIntTuple: 00:00:00.0138224
NewIntTuple: 00:00:00.0137791
NewIntTuple: 00:00:00.0138458
NewIntTuple: 00:00:00.0138181
NewIntTuple: 00:00:00.0138305
NewDoubleTuple: 00:00:00.0154690
NewDoubleTuple: 00:00:00.0155391
NewDoubleTuple: 00:00:00.0154707
NewDoubleTuple: 00:00:00.0155383
NewDoubleTuple: 00:00:00.0154718
NewDoubleTuple: 00:00:00.0155317
NewDoubleTuple: 00:00:00.0154563
NewDoubleTuple: 00:00:00.0155298
NewDoubleTuple: 00:00:00.0154692
NewDoubleTuple: 00:00:00.0155386

@jkotas
Copy link
Member

jkotas commented Aug 6, 2024

How do we store RequiresAlign8 for arrays of value types with this change? If I am reading the code correctly, RequiresAlign8 is always going to be false for arrays with this change.

@filipnavara
Copy link
Member Author

How do we store RequiresAlign8 for arrays of value types with this change?

We don’t. We just check the element type of the array for IsValueType && RequiresAlign8. The array allocation path is separate so the cost of the check is moved there. I meant to add Debug.Assert(!IsArray) into the getter but forgot about it.

@jkotas
Copy link
Member

jkotas commented Aug 6, 2024

If the getter is not supposed to be called on arrays, we can omit the HasComponentSizeFlag tricks from it and just test the single bit.

@filipnavara
Copy link
Member Author

The HasComponentSizeFlag check is still needed for string.

@jkotas
Copy link
Member

jkotas commented Aug 6, 2024

I would expect that string allocations to be always special cased. Where is the RequiresAlign8 checked for string allocations?

@filipnavara
Copy link
Member Author

filipnavara commented Aug 6, 2024

I would expect that string allocations to be always special cased. Where is the RequiresAlign8 checked for string allocations?

Presumably you are right. I checked all the code paths and couldn’t find one that would be hit with string. Moreover, there’s already an assert in RhNewObject to disallow strings. I’ll simplify the check.

@filipnavara
Copy link
Member Author

This should definitely run through the NativeAOT outerloop tests if the PR looks reasonable in other aspects.

@jkotas
Copy link
Member

jkotas commented Aug 6, 2024

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jkotas
Copy link
Member

jkotas commented Aug 7, 2024

A lot of tests on linux-arm crashing with stackoverflow:

ld_linux_armhf_so!_tls_get_addr+0x5
baseservices!S_P_CoreLib_System_Threading_Lock::EnterAndGetCurrentThreadId+0x16 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Lock.cs @ 83] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner_Cctor::GetCctor+0x2e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 279] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::EnsureClassConstructorRun+0x2a [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 66] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::CheckStaticClassConstructionReturnGCStaticBase+0x10 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 36] 
baseservices!S_P_CoreLib_Internal_Runtime_ThreadStatics::AllocateThreadStaticStorageForType+0x7e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/ThreadStatics.cs @ 140] 
baseservices!S_P_CoreLib_Internal_Runtime_ThreadStatics::GetUninlinedThreadStaticBaseForTypeSlow+0x146 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/ThreadStatics.cs @ 110] 
baseservices!S_P_CoreLib_System_Threading_Lock::EnterAndGetCurrentThreadId+0x16 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Lock.cs @ 83] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner_Cctor::GetCctor+0x2e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 279] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::EnsureClassConstructorRun+0x2a [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 66] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::CheckStaticClassConstructionReturnGCStaticBase+0x10 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 36] 
baseservices!S_P_CoreLib_Internal_Runtime_ThreadStatics::AllocateThreadStaticStorageForType+0x7e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/ThreadStatics.cs @ 140] 
baseservices!S_P_CoreLib_Internal_Runtime_ThreadStatics::GetUninlinedThreadStaticBaseForTypeSlow+0x146 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/ThreadStatics.cs @ 110] 
baseservices!S_P_CoreLib_System_Threading_Lock::EnterAndGetCurrentThreadId+0x16 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Lock.cs @ 83] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner_Cctor::GetCctor+0x2e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 279] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::EnsureClassConstructorRun+0x2a [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 66] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::CheckStaticClassConstructionReturnGCStaticBase+0x10 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 36] 
baseservices!S_P_CoreLib_Internal_Runtime_ThreadStatics::AllocateThreadStaticStorageForType+0x7e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/ThreadStatics.cs @ 140] 
baseservices!S_P_CoreLib_Internal_Runtime_ThreadStatics::GetUninlinedThreadStaticBaseForTypeSlow+0x146 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/ThreadStatics.cs @ 110] 
baseservices!S_P_CoreLib_System_Threading_Lock::EnterAndGetCurrentThreadId+0x16 [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Lock.cs @ 83] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner_Cctor::GetCctor+0x2e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 279] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::EnsureClassConstructorRun+0x2a [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 66] 
baseservices!S_P_CoreLib_System_Runtime_CompilerServices_ClassConstructorRunner::CheckStaticClassConstructionReturnGCStaticBase+0x10 [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/ClassConstructorRunner.cs @ 36] 
baseservices!S_P_CoreLib_Internal_Runtime_ThreadStatics::AllocateThreadStaticStorageForType+0x7e [/_/src/coreclr/nativeaot/System.Private.CoreLib/src/Internal/Runtime/ThreadStatics.cs @ 140] 
...

@filipnavara
Copy link
Member Author

A lot of tests on linux-arm crashing with stackoverflow:

It was a very stupid mistake in the last change.

@MichalStrehovsky
Copy link
Member

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants