Move unboxing helpers to managed code #109135

Merged Nov 21, 2024 (19 commits)

davidwrighton (Member) commented Oct 23, 2024

Move the unboxing helpers to managed code.

Behavior is basically identical except for the Unbox_Nullable paths, which required some investigation to find the fastest implementation. Notably, the interruptibility of managed code makes copying and zeroing values more difficult, but the opportunity (and requirement) to specialize the code brought a few nice micro-optimizations. Overall I don't expect anyone to notice the performance changes here, but since my earlier code was about 2x slower than the native implementation, I felt the need to optimize until everything looked good.

Performance results:

| TestName | With PR (ms) | Without PR (ms) | % Speedup |
|---|---|---|---|
| TestJustAPrimitive | 1217.0047 | 1221.203 | 0.34% |
| TestJustObject | 1212.6415 | 1211.6437 | -0.08% |
| TestJust5Primitive | 1496.5304 | 1522.4968 | 1.74% |
| TestJust5Object | 1461.0507 | 1488.3328 | 1.87% |
| TestJust10Primitive | 1473.2814 | 1493.5238 | 1.37% |
| TestJust10Object | 3215.6339 | 2854.6186 | -11.23% |
| TestJustAPrimitiveNullableWithValue | 2727.9085 | 5182.2611 | 89.97% |
| TestJustObjectNullableWithValue | 3148.9484 | 5672.2985 | 80.13% |
| TestJust5PrimitiveNullableWithValue | 5443.9232 | 7795.6109 | 43.20% |
| TestJust5ObjectNullableWithValue | 6492.9071 | 8095.1508 | 24.68% |
| TestJust10PrimitiveNullableWithValue | 6022.6274 | 8723.572 | 44.85% |
| TestJust10ObjectNullableWithValue | 7728.3239 | 9671.1382 | 25.14% |
| TestJustAPrimitiveNullNullable | 1786.1337 | 2230.0932 | 24.86% |
| TestJustObjectNullNullable | 1675.0683 | 2326.0395 | 38.86% |
| TestJust5PrimitiveNullNullable | 2921.9497 | 3298.4642 | 12.89% |
| TestJust5ObjectNullNullable | 3389.4043 | 3615.3131 | 6.67% |
| TestJust10PrimitiveNullNullable | 3050.809 | 4054.9683 | 32.91% |
| TestJust10ObjectNullNullable | 4658.8316 | 5335.0686 | 14.52% |

Results are very positive, or within the margin of error in this test suite. These results were generated using a small benchmark that mostly targets the Unbox_Nullable helper, as it has the most complex and potentially slow code. Generally, the type system portion of that helper is faster, and the code that actually copies the contents of a valuetype is a little better. This isn't quite a fair test of managed vs. native performance though, as I took the opportunity to restructure some of the memory on MethodTable so that it could more easily be read in managed code, and that happened to make a fair bit of complex code simpler and thus faster.
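For readers checking the numbers: the % Speedup column is consistent with dividing the time without the PR by the time with the PR and subtracting one. The helper below is only a sketch of that arithmetic; the class and method names are made up for illustration and are not part of the PR.

```csharp
using System;

public static class SpeedupMath
{
    // Percent speedup of the PR build relative to the baseline build:
    // (baseline time / PR time - 1) * 100. Positive values mean the PR build is faster.
    public static double PercentSpeedup(double withPrMs, double withoutPrMs)
        => (withoutPrMs / withPrMs - 1.0) * 100.0;
}
```

For example, `SpeedupMath.PercentSpeedup(2727.9085, 5182.2611)` reproduces the TestJustAPrimitiveNullableWithValue row to two decimal places.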

Benchmark code (standalone console app)
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading;

namespace UnboxingPerfTests
{
    internal class Program
    {
        public interface IStaticMethod<T>
        {
            public static abstract void Method(ref T param);
        }
        // All tests use GenericStruct<T, object> to force us into canonical code gen to always use the helpers
        struct GenericStruct<T, V> : IStaticMethod<GenericStruct<T, V>> where T: IStaticMethod<T>
        {
            public T Value;

            [MethodImpl(MethodImplOptions.NoInlining)]
            public static void Method(ref GenericStruct<T, V> value)
            {
                T.Method(ref value.Value);
            }
        }

        struct JustAPrimitive : IStaticMethod<JustAPrimitive>
        {
            public int Value;
            public static void Method(ref JustAPrimitive param)
            {
            }
        }

        struct JustObject : IStaticMethod<JustObject>
        {
            public object Value;
            public static void Method(ref JustObject param)
            {
            }
        }

        struct Just10Primitive : IStaticMethod<Just10Primitive>
        {
            public int Value;
            public int Value2;
            public int Value3;
            public int Value4;
            public int Value5;
            public int Value6;
            public int Value7;
            public int Value8;
            public int Value9;
            public int Value10;
            public static void Method(ref Just10Primitive param)
            {
            }
        }

        struct Just10Object : IStaticMethod<Just10Object>
        {
            public object Value;
            public object Value2;
            public object Value3;
            public object Value4;
            public object Value5;
            public object Value6;
            public object Value7;
            public object Value8;
            public object Value9;
            public object Value10;

            public static void Method(ref Just10Object param)
            {
            }
        }

        struct Just5Primitive : IStaticMethod<Just5Primitive>
        {
            public int Value;
            public int Value2;
            public int Value3;
            public int Value4;
            public int Value5;
            public static void Method(ref Just5Primitive param)
            {
            }
        }

        struct Just5Object : IStaticMethod<Just5Object>
        {
            public object Value;
            public object Value2;
            public object Value3;
            public object Value4;
            public object Value5;

            public static void Method(ref Just5Object param)
            {
            }
        }


        public static void Unbox_any_Test<T>(object[] o) where T:IStaticMethod<T>
        {
            for (int i = 0; i < 1_000_000; i++)
            {
                T local = (T)o[i%3];
                T.Method(ref local);
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int IsInst_Test<T>(object?[] o)
        {
            int retVal = 0;
            for (int i = 0; i < 1_000_000; i++)
            {
                if (o[i % 3] is T)
                    retVal++;
            }
            return retVal;
        }

        public static void Unbox_any_TestNullable<T>(object?[] o) where T : struct, IStaticMethod<T>
        {
            for (int i = 0; i < 1_000_000; i++)
            {
                T? local = (T?)o[i % 3];
                if (local.HasValue)
                {
                    T localCopy = local.Value;
                    T.Method(ref localCopy);
                }
            }
        }


        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        public static void Test(int outerIterationCount, Action a, string testName, Action<string> outputFunc)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < outerIterationCount; i++)
            {
                a();
            }
            stopwatch.Stop();
            outputFunc($"| {testName,38}| {stopwatch.Elapsed.TotalMilliseconds.ToString()}|");
        }

        static void TestJustAPrimitive()
        {
            object[] arr = { new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>() };
            Unbox_any_Test<GenericStruct<JustAPrimitive, object>>(arr);
        }

        static void TestJustObject()
        {
            object[] arr = { new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>() };
            Unbox_any_Test<GenericStruct<JustObject, object>>(arr);
        }

        static void TestJust10Primitive()
        {
            object[] arr = { new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>() };
            Unbox_any_Test<GenericStruct<Just10Primitive, object>>(arr);
        }

        static void TestJust10Object()
        {
            object[] arr = { new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>() };
            Unbox_any_Test<GenericStruct<Just10Object, object>>(arr);
        }

        static void TestJust5Primitive()
        {
            object[] arr = { new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>() };
            Unbox_any_Test<GenericStruct<Just5Primitive, object>>(arr);
        }

        static void TestJust5Object()
        {
            object[] arr = { new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>() };
            Unbox_any_Test<GenericStruct<Just5Object, object>>(arr);
        }


        static void TestJustAPrimitiveNullableWithValue()
        {
            object[] arr = { new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>() };
            Unbox_any_TestNullable<GenericStruct<JustAPrimitive, object>>(arr);
        }

        static void TestJustObjectNullableWithValue()
        {
            object[] arr = { new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>() };
            Unbox_any_TestNullable<GenericStruct<JustObject, object>>(arr);
        }

        static void TestJust10PrimitiveNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>() };
            Unbox_any_TestNullable<GenericStruct<Just10Primitive, object>>(arr);
        }

        static void TestJust10ObjectNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>() };
            Unbox_any_TestNullable<GenericStruct<Just10Object, object>>(arr);
        }

        static void TestJust5PrimitiveNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>() };
            Unbox_any_TestNullable<GenericStruct<Just5Primitive, object>>(arr);
        }

        static void TestJust5ObjectNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>() };
            Unbox_any_TestNullable<GenericStruct<Just5Object, object>>(arr);
        }


        static void TestJustAPrimitiveNullNullable()
        {
            object?[] arr = { null, null, null };
            Unbox_any_TestNullable<GenericStruct<JustAPrimitive, object>>(arr);
        }

        static void TestJustObjectNullNullable()
        {
            object?[] arr = { null, null, null };
            Unbox_any_TestNullable<GenericStruct<JustObject, object>>(arr);
        }

        static void TestJust10PrimitiveNullNullable()
        {
            object?[] arr = { null, null, null };
            Unbox_any_TestNullable<GenericStruct<Just10Primitive, object>>(arr);
        }

        static void TestJust10ObjectNullNullable()
        {
            object?[] arr = { null, null, null };
            Unbox_any_TestNullable<GenericStruct<Just10Object, object>>(arr);
        }

        static void TestJust5PrimitiveNullNullable()
        {
            object?[] arr = { null, null, null };
            Unbox_any_TestNullable<GenericStruct<Just5Primitive, object>>(arr);
        }

        static void TestJust5ObjectNullNullable()
        {
            object?[] arr = { null, null, null };
            Unbox_any_TestNullable<GenericStruct<Just5Object, object>>(arr);
        }

        static void IsInstTestJustAPrimitive()
        {
            object[] arr = { new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>() };
            IsInst_Test<GenericStruct<JustAPrimitive, object>>(arr);
        }

        static void IsInstTestJustObject()
        {
            object[] arr = { new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>() };
            IsInst_Test<GenericStruct<JustObject, object>>(arr);
        }

        static void IsInstTestJust10Primitive()
        {
            object[] arr = { new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>() };
            IsInst_Test<GenericStruct<Just10Primitive, object>>(arr);
        }

        static void IsInstTestJust10Object()
        {
            object[] arr = { new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>() };
            IsInst_Test<GenericStruct<Just10Object, object>>(arr);
        }

        static void IsInstTestJust5Primitive()
        {
            object[] arr = { new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>() };
            IsInst_Test<GenericStruct<Just5Primitive, object>>(arr);
        }

        static void IsInstTestJust5Object()
        {
            object[] arr = { new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>() };
            IsInst_Test<GenericStruct<Just5Object, object>>(arr);
        }


        static void IsInstTestJustAPrimitiveNullableWithValue()
        {
            object[] arr = { new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>(), new GenericStruct<JustAPrimitive, object>() };
            IsInst_Test<GenericStruct<JustAPrimitive, object>?>(arr);
        }

        static void IsInstTestJustObjectNullableWithValue()
        {
            object[] arr = { new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>(), new GenericStruct<JustObject, object>() };
            IsInst_Test<GenericStruct<JustObject, object>?>(arr);
        }

        static void IsInstTestJust10PrimitiveNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>(), new GenericStruct<Just10Primitive, object>() };
            IsInst_Test<GenericStruct<Just10Primitive, object>?>(arr);
        }

        static void IsInstTestJust10ObjectNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>(), new GenericStruct<Just10Object, object>() };
            IsInst_Test<GenericStruct<Just10Object, object>?>(arr);
        }

        static void IsInstTestJust5PrimitiveNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>(), new GenericStruct<Just5Primitive, object>() };
            IsInst_Test<GenericStruct<Just5Primitive, object>?>(arr);
        }

        static void IsInstTestJust5ObjectNullableWithValue()
        {
            object[] arr = { new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>(), new GenericStruct<Just5Object, object>() };
            IsInst_Test<GenericStruct<Just5Object, object>?>(arr);
        }


        static void IsInstTestJustAPrimitiveNullNullable()
        {
            object?[] arr = { null, null, null };
            IsInst_Test<GenericStruct<JustAPrimitive, object>?>(arr);
        }

        static void IsInstTestJustObjectNullNullable()
        {
            object?[] arr = { null, null, null };
            IsInst_Test<GenericStruct<JustObject, object>?>(arr);
        }

        static void IsInstTestJust10PrimitiveNullNullable()
        {
            object?[] arr = { null, null, null };
            IsInst_Test<GenericStruct<Just10Primitive, object>?>(arr);
        }

        static void IsInstTestJust10ObjectNullNullable()
        {
            object?[] arr = { null, null, null };
            IsInst_Test<GenericStruct<Just10Object, object>?>(arr);
        }

        static void IsInstTestJust5PrimitiveNullNullable()
        {
            object?[] arr = { null, null, null };
            IsInst_Test<GenericStruct<Just5Primitive, object>?>(arr);
        }

        static void IsInstTestJust5ObjectNullNullable()
        {
            object?[] arr = { null, null, null };
            IsInst_Test<GenericStruct<Just5Object, object>?>(arr);
        }

        enum Foo : int
        {
            Value = 3
        }

        enum Bar : int
        {

        }

        enum Other : int
        {
            Value = 4,
        }

        public static object[] TheArrayForEnumTests = { (int)4, Foo.Value, Other.Value };
        static void IsInstTestEnumVariance()
        {
            int count = 0;
            object[] arr = TheArrayForEnumTests;

            for (int i = 0; i < 1_000_000; i++)
            {
                Bar local = (Bar)arr[i % 3];
                count += (int)local;
            }

            s_x = count;
        }
        public static int s_x;
        static void IsInstTestEnumVarianceNullable()
        {
            int count = 0;
            object?[] arr = TheArrayForEnumTests;

            for (int i = 0; i < 1_000_000; i++)
            {
                Bar? local = (Bar?)arr[i % 3];
                if (local.HasValue)
                {
                    count += (int)local.Value;
                }
            }
            s_x = count;
        }

        static void AllTests(int iterationCount, Action<string> outputFunc)
        {
/*            Test(iterationCount, IsInstTestJustAPrimitive, nameof(IsInstTestJustAPrimitive), outputFunc);
            Test(iterationCount, IsInstTestJustObject, nameof(IsInstTestJustObject), outputFunc);
            Test(iterationCount, IsInstTestJust5Primitive, nameof(IsInstTestJust5Primitive), outputFunc);
            Test(iterationCount, IsInstTestJust5Object, nameof(IsInstTestJust5Object), outputFunc);
            Test(iterationCount, IsInstTestJust10Primitive, nameof(IsInstTestJust10Primitive), outputFunc);
            Test(iterationCount, IsInstTestJust10Object, nameof(IsInstTestJust10Object), outputFunc);

            Test(iterationCount, IsInstTestJustAPrimitiveNullableWithValue, nameof(IsInstTestJustAPrimitiveNullableWithValue), outputFunc);
            Test(iterationCount, IsInstTestJustObjectNullableWithValue, nameof(IsInstTestJustObjectNullableWithValue), outputFunc);*/
/*            Test(iterationCount, IsInstTestJust5PrimitiveNullableWithValue, nameof(IsInstTestJust5PrimitiveNullableWithValue), outputFunc);
            Test(iterationCount, IsInstTestJust5ObjectNullableWithValue, nameof(IsInstTestJust5ObjectNullableWithValue), outputFunc);
            Test(iterationCount, IsInstTestJust10PrimitiveNullableWithValue, nameof(IsInstTestJust10PrimitiveNullableWithValue), outputFunc);
            Test(iterationCount, IsInstTestJust10ObjectNullableWithValue, nameof(IsInstTestJust10ObjectNullableWithValue), outputFunc);

            Test(iterationCount, IsInstTestJustAPrimitiveNullNullable, nameof(IsInstTestJustAPrimitiveNullNullable), outputFunc);
            Test(iterationCount, IsInstTestJustObjectNullNullable, nameof(IsInstTestJustObjectNullNullable), outputFunc);
            Test(iterationCount, IsInstTestJust5PrimitiveNullNullable, nameof(IsInstTestJust5PrimitiveNullNullable), outputFunc);
            Test(iterationCount, IsInstTestJust5ObjectNullNullable, nameof(IsInstTestJust5ObjectNullNullable), outputFunc);
            Test(iterationCount, IsInstTestJust10PrimitiveNullNullable, nameof(IsInstTestJust10PrimitiveNullNullable), outputFunc);
            Test(iterationCount, IsInstTestJust10ObjectNullNullable, nameof(IsInstTestJust10ObjectNullNullable), outputFunc);

            Test(iterationCount, IsInstTestEnumVariance, nameof(IsInstTestEnumVariance), outputFunc);
//            Test(iterationCount, IsInstTestEnumVarianceNullable, nameof(IsInstTestEnumVarianceNullable), outputFunc);
*/

            Test(iterationCount, TestJustAPrimitive, nameof(TestJustAPrimitive), outputFunc);
            Test(iterationCount, TestJustObject, nameof(TestJustObject), outputFunc);
            Test(iterationCount, TestJust5Primitive, nameof(TestJust5Primitive), outputFunc);
            Test(iterationCount, TestJust5Object, nameof(TestJust5Object), outputFunc);
            Test(iterationCount, TestJust10Primitive, nameof(TestJust10Primitive), outputFunc);
            Test(iterationCount, TestJust10Object, nameof(TestJust10Object), outputFunc);

            Test(iterationCount, TestJustAPrimitiveNullableWithValue, nameof(TestJustAPrimitiveNullableWithValue), outputFunc);
            Test(iterationCount, TestJustObjectNullableWithValue, nameof(TestJustObjectNullableWithValue), outputFunc);
            Test(iterationCount, TestJust5PrimitiveNullableWithValue, nameof(TestJust5PrimitiveNullableWithValue), outputFunc);
            Test(iterationCount, TestJust5ObjectNullableWithValue, nameof(TestJust5ObjectNullableWithValue), outputFunc);
            Test(iterationCount, TestJust10PrimitiveNullableWithValue, nameof(TestJust10PrimitiveNullableWithValue), outputFunc);
            Test(iterationCount, TestJust10ObjectNullableWithValue, nameof(TestJust10ObjectNullableWithValue), outputFunc);

            Test(iterationCount, TestJustAPrimitiveNullNullable, nameof(TestJustAPrimitiveNullNullable), outputFunc);
            Test(iterationCount, TestJustObjectNullNullable, nameof(TestJustObjectNullNullable), outputFunc);
            Test(iterationCount, TestJust5PrimitiveNullNullable, nameof(TestJust5PrimitiveNullNullable), outputFunc);
            Test(iterationCount, TestJust5ObjectNullNullable, nameof(TestJust5ObjectNullNullable), outputFunc);
            Test(iterationCount, TestJust10PrimitiveNullNullable, nameof(TestJust10PrimitiveNullNullable), outputFunc);
            Test(iterationCount, TestJust10ObjectNullNullable, nameof(TestJust10ObjectNullNullable), outputFunc);
        }

        static void PrintNothing(string str) { }
        static void PrintLine(string str) { Console.Write(str); Console.Write("\n"); }

        static void Main(string[] args)
        {
            for (int i = 0; i < 100; i++)
            {
                AllTests(1, PrintNothing);
                Console.Write("");
            }

            Thread.Sleep(200);

            AllTests(500, PrintLine);
        }
    }
}

Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

}

// Set the hasValue field on the Nullable type. It MUST always be placed at the start of the object.
*(bool*)destPtr = true;
Member

This doesn't look correct. The cast is casting the value of the reference into pointer. Unsafe.As would be correct.

Member Author

Yeah, that seems wrong. I kept waffling between refs and pointers here. Thanks.
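A minimal sketch of the distinction raised in this thread, assuming the destination is held as a `ref byte`: `Unsafe.As<byte, bool>` reinterprets the location the ref points at, so the store lands in the destination itself. The class name and the byte array standing in for a Nullable<T> destination are illustrative only.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class HasValueDemo
{
    // Writes 'true' into the byte that 'dest' refers to, treating it as the hasValue flag.
    // Unsafe.As reinterprets the ref (the location), not the value stored there.
    public static void SetHasValue(ref byte dest)
    {
        Unsafe.As<byte, bool>(ref dest) = true;
    }

    public static byte[] Demo()
    {
        byte[] storage = new byte[8]; // stand-in for a Nullable<T> destination (assumption)
        SetHasValue(ref storage[0]);
        return storage;
    }
}
```

After `Demo()` runs, the first byte of the returned array is 1 and the remaining bytes are untouched.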

{
[StackTraceHidden]
[DebuggerStepThrough]
internal static unsafe partial class BoxingHelpers
Member

Most of these types are going to be loaded on the startup path. It feels a bit too fine grained to have a separate type for the few unboxing helpers.

Would it be better for the unboxing helpers to live in CastHelpers? They are coupled anyway.

Member Author

Sure, seems reasonable to me. I had thought I might be moving the allocation based helpers over too so this class wouldn't be so small, but honestly, just copying the assembly routines from NativeAOT seems like a better path for most of those.

// Set the hasValue field on the Nullable type. It MUST always be placed at the start of the object.
Unsafe.As<byte, bool>(ref destPtr) = true;
ref byte destValuePtr = ref typeMT->GetNullableValueFieldReferenceAndSize(ref destPtr, out uint size);
Unsafe.CopyBlockUnaligned(ref destValuePtr, ref RuntimeHelpers.GetRawData(obj), size);
Member

Why does InitValueClass above need to be concerned with ref vs. non-ref differences, but this place does not need to be?

Member Author

Because I haven't yet gotten around to fixing this problem... and am in the process of verifying if the alignment handling is necessary or not.

Comment on lines 590 to 599
MethodTable* pMT2 = RuntimeHelpers.GetMethodTable(obj);
if ((pMT1->IsPrimitive && pMT2->IsPrimitive &&
pMT1->GetPrimitiveCorElementType() == pMT2->GetPrimitiveCorElementType()) ||
AreTypesEquivalent(pMT1, pMT2))
{
return ref RuntimeHelpers.GetRawData(obj);
}

CastHelpers.ThrowInvalidCastException(obj, pMT1);
return ref Unsafe.AsRef<byte>(null);
Contributor

Invert the if here to remove the unreachable return?

Member

To do this, I expect that ThrowInvalidCastException would have to throw in C# to make the JIT realize that it is cold code and produce the desired code layout.

Contributor

Ah, only now I've noticed that it does a QCall instead of direct throw. What's the point of the runtime call here?

Member Author

Yes. And there are resource file changes that I would need to do to get the right string. I'll do it if you want @jkotas, but honestly this isn't so bad as it is. I couldn't find any impact on performance from the generated assembly code.
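The shape being discussed in this thread, an inverted check that calls a helper which throws directly from C#, might look roughly like the sketch below. `ThrowInvalidCast` and `UnboxInt` are hypothetical names, the message text is made up, and whether the JIT lays the throwing path out as cold code is the expectation voiced in the review rather than something this sketch demonstrates. `[DoesNotReturn]` here only informs the C# compiler's flow analysis.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

public static class ThrowHelperDemo
{
    // Hypothetical helper that throws from managed code instead of calling into the runtime.
    [DoesNotReturn]
    private static void ThrowInvalidCast(object obj, Type target)
        => throw new InvalidCastException(
            $"Unable to cast object of type '{obj.GetType()}' to type '{target}'.");

    public static int UnboxInt(object obj)
    {
        // Inverted check: the failure case branches to the helper,
        // and the success case is the fall-through path with the only return.
        if (obj is not int)
        {
            ThrowInvalidCast(obj, typeof(int));
        }

        return (int)obj;
    }
}
```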

Comment on lines 506 to 507
uint numInstanceFieldBytes = pMT->GetNumInstanceFieldBytes();
if ((((uint)Unsafe.AsPointer(ref destBytes) | numInstanceFieldBytes) & ((uint)sizeof(void*) - 1)) != 0)
Member

Is it more or less efficient to check pMT->ContainsGCPointers?

Member Author

I'm not sure. I'm not quite ready to benchmark this, and need to validate that we actually need to do aligned work here at all. I see 3 basic implementations.

  1. This bifurcated approach, with some sort of if check on the safety of using the general purpose clearing routine. That sort of thing is safe whether or not the destination is on the heap. Using ContainsGCPointers wouldn't have been safe for the C++ implementation of all of this, but since this implementation of InitValueClass is actually only used for Nullable<T> instances, we CAN use that flag. I need to do some performance investigation to determine the actual fastest approach here.
  2. An approach where we p/invoke to 'C' memset as an FCall. That is only safe if the destination is guaranteed to be on the current stack. This needs to be an FCall since otherwise the specification of memset permits non-pointer atomic stores, and an inopportune GC suspension could see an invalid pointer.
  3. A custom set routine which does a warmup series of sets to individual bytes until the region is aligned, then sets pointer sized chunks using atomic stores (the CPU notion of atomic, not the C++ notion), and then has a warmdown phase. This is safe for all cases as well, and is what the old C++ implementation used to do.

A similar set of choices exists for the actual copying routine below.
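To make the alignment test and the warmup/aligned/warmdown split in option 3 concrete, here is a small sketch of just the arithmetic. It works on plain integers rather than real memory, and the class, method, and parameter names are assumptions for illustration; the pointer size is passed in explicitly so the numbers do not depend on the machine.

```csharp
using System;

public static class AlignedClearPlan
{
    // True when both the start address and the byte count are multiples of the pointer size,
    // mirroring the (address | size) & (sizeof(void*) - 1) test in the snippet under review.
    public static bool IsPointerAligned(ulong address, ulong size, int pointerSize)
    {
        ulong mask = (ulong)pointerSize - 1UL;
        return ((address | size) & mask) == 0UL;
    }

    // Splits a region into byte counts for the three phases of option 3:
    // leading single-byte stores, pointer-sized stores, trailing single-byte stores.
    public static (ulong Head, ulong Body, ulong Tail) Split(ulong address, ulong length, int pointerSize)
    {
        ulong ptrSize = (ulong)pointerSize;
        ulong misalign = address & (ptrSize - 1UL);
        ulong head = misalign == 0UL ? 0UL : ptrSize - misalign;
        if (head > length)
        {
            head = length; // region ends before the first aligned address
        }

        ulong remaining = length - head;
        ulong tail = remaining & (ptrSize - 1UL);
        return (head, remaining - tail, tail);
    }
}
```

With an 8-byte pointer size, a 20-byte region starting at address 0x1003 splits into 5 leading bytes, 8 bytes of pointer-sized stores, and 7 trailing bytes.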

Member Author

In any case, my analysis of the usage of Unbox_Nullable is that usage for the JIT scenario always works with a stack based destination, but there is usage from Array.CoreCLR.cs which can unbox to the heap. I'm building a small suite of perf tests to see what the performance of various options in this space looks like.

- Fix assert that was always firing
- Tweak code per code review to make the code generator aware that ThrowInvalidCastException is going to throw and should always be in a cold path
- Improve the implementation of MethodTable.IsPrimitive. I noticed that this could be a single and+compare instead of the pair it was before. This improves the performance of Unbox_Helper by about 5%.
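As an illustration of how a pair of compares can collapse into a single and+compare, consider two category values that differ only in one bit. The flag values below are hypothetical, chosen for the example, and are not claimed to match the real MethodTable flag layout.

```csharp
public static class CategoryFlagsDemo
{
    // Hypothetical flag layout: the two primitive categories differ only in bit 16.
    private const uint CategoryMask = 0x000F0000;
    private const uint CategoryPrimitiveValueType = 0x00060000;
    private const uint CategoryTruePrimitive = 0x00070000;
    private const uint CategoryMaskIgnoringLowBit = 0x000E0000;

    // Two compares: mask once, then test each category separately.
    public static bool IsPrimitiveTwoCompares(uint flags)
    {
        uint category = flags & CategoryMask;
        return category == CategoryPrimitiveValueType || category == CategoryTruePrimitive;
    }

    // One and+compare: mask out the bit that distinguishes the two categories.
    public static bool IsPrimitiveOneCompare(uint flags)
        => (flags & CategoryMaskIgnoringLowBit) == CategoryPrimitiveValueType;
}
```

Both methods agree for every possible category value under this layout; the trick only works when the accepted categories are arranged to differ in exactly the masked-out bit.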
davidwrighton marked this pull request as ready for review October 31, 2024 21:18

[DebuggerHidden]
[MethodImpl(MethodImplOptions.NoInlining)]
internal static void Unbox_TypeTest_Helper(MethodTable *pMT1, MethodTable *pMT2)
Member

Suggested change
- internal static void Unbox_TypeTest_Helper(MethodTable *pMT1, MethodTable *pMT2)
+ private static void Unbox_TypeTest_Helper(MethodTable *pMT1, MethodTable *pMT2)


[DebuggerHidden]
[MethodImpl(MethodImplOptions.NoInlining)]
internal static ref byte Unbox_Helper(MethodTable* pMT1, object obj)
Member

Suggested change
- internal static ref byte Unbox_Helper(MethodTable* pMT1, object obj)
+ private static ref byte Unbox_Helper(MethodTable* pMT1, object obj)

else
{
// If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass
nuint numInstanceFieldBytes = typeMT->BaseSize - (nuint)(2 * sizeof(IntPtr));
Member

The numInstanceFieldBytes local var is unused. Is this unfinished refactoring?

It may be nice to move this to a property or method on MethodTable. There are more places where this micro-optimization can be used. The property impl can assert that it is only used when ContainsGCPointers is true and that it returns the same value as the full GetNumInstanceFieldBytes.
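A rough sketch of what such an asserting accessor could look like, using a hypothetical stand-in struct rather than the real MethodTable. It assumes the full computation is BaseSize minus a BaseSizePadding value, which is inferred from the code comment quoted above; the type and member names are made up for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the few MethodTable fields this sketch needs.
public readonly struct MethodTableModel
{
    public readonly uint BaseSize;
    public readonly uint BaseSizePadding;   // assumed input to the full computation
    public readonly bool ContainsGCPointers;

    public MethodTableModel(uint baseSize, uint baseSizePadding, bool containsGCPointers)
    {
        BaseSize = baseSize;
        BaseSizePadding = baseSizePadding;
        ContainsGCPointers = containsGCPointers;
    }

    // Full computation (assumption): needs the padding value.
    public uint GetNumInstanceFieldBytes() => BaseSize - BaseSizePadding;

    // Fast path from the snippet under review: subtract two pointer sizes from BaseSize.
    // The asserts follow the review suggestion: only valid when ContainsGCPointers is true,
    // and the result must match the full computation.
    public uint GetNumInstanceFieldBytesIfContainsGCPointers()
    {
        Debug.Assert(ContainsGCPointers);
        uint size = BaseSize - (uint)(2 * IntPtr.Size);
        Debug.Assert(size == GetNumInstanceFieldBytes());
        return size;
    }
}
```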

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Unbox_Nullable_NotIsNullableForType(ref byte destPtr, MethodTable* typeMT, object obj)
{
// For safety's sake, also allow true nullables to be unboxed normally.
Member

(I know that this is a pre-existing comment.) This is not for safety's sake. We would not give up any safety if we threw for a boxed Nullable<T> that should not exist here. It is just to hide bugs. Have you tried deleting this to see whether anything fails?

Member Author

There are some cases in func eval which claim to use this pathway. In addition, it IS a pathway that has been used in reflection in the past, so I don't want to remove this path.

}
else
{
UnboxNullableValue(ref destPtr, typeMT, obj);
Member

What's the reason why we are not just calling BulkMoveWithWriteBarrier/Memmove here to copy the value? I would expect it to be faster.

Member Author

It turns out that there is a fair amount of type system structure that would need to be ported to managed code to call BulkMoveWithWriteBarrier. I made the decision to pull this out into its own FCALL here. Especially since I expect that most paths will need to do a BulkMoveWithWriteBarrier which isn't that different in perf. I can explore pulling enough type system structure for this if you'd like me to.

Member Author

Of course, now that I look at this, I realized that I can actually encode the needed bits for this particular load into the space used by m_pInterfaceMap which would allow all of this to avoid a couple of memory loads, branches etc. All of that should allow me to optimize the rest of all of this logic.

Member

Just to make sure that I understand - is the main missing piece Nullable::ValueAddr?

Member Author

It's ValueAddr + the computation of the size of the instance field bytes of the Value field. That requires access to the EEClass + access to a fielddesc, etc. It boils down to not that many instructions, but it IS a fair number of concepts and type system structures. That is a task probably worth doing at some point, but I'd like to avoid doing it now. The new encoding of data, which is redundant with existing data, allows for extremely cheap access, and is simple to encode into the managed code without adding more concepts than MethodTable.
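
To make the two missing pieces concrete (the address of the Value field and the number of bytes to copy into it), here is a hedged C++ sketch of unboxing into a `Nullable<T>`-shaped destination. `NullableLayout` and `UnboxNullableSketch` are hypothetical names. The sketch assumes the has-value flag is the first byte of the destination, skips the type check, and uses a plain `memcpy`, which would only be appropriate for values without GC references (otherwise something like the `BulkMoveWithWriteBarrier` mentioned above is needed).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical per-instantiation data: what the helper must know about Nullable<T>.
struct NullableLayout {
    std::size_t valueOffset; // where the Value field starts inside Nullable<T>
    std::size_t valueSize;   // number of instance field bytes of T
    std::size_t totalSize;   // size of the whole Nullable<T>
};

// boxedPayload points at the boxed T's field bytes, or is nullptr for a null reference.
inline void UnboxNullableSketch(uint8_t* dest, const NullableLayout& layout,
                                const uint8_t* boxedPayload) {
    assert(layout.valueOffset + layout.valueSize <= layout.totalSize);
    if (boxedPayload == nullptr) {
        // Unboxing null yields an empty Nullable<T>: zero the whole destination.
        std::memset(dest, 0, layout.totalSize);
        return;
    }
    dest[0] = 1; // assumed position of the has-value flag
    std::memcpy(dest + layout.valueOffset, boxedPayload, layout.valueSize);
}
```

With `valueOffset` and `valueSize` available directly, the helper needs no further type system lookups, which is the effect the redundant encoding described above is after.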

}
else
{
// If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass
Member

Suggested change
- // If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public uint GetNumInstanceFieldBytesIfContainsGCPointers()
{
Debug.Assert(ContainsGCPointers);
Member

Suggested change
- Debug.Assert(ContainsGCPointers);
+ // If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass
+ Debug.Assert(ContainsGCPointers);

@@ -706,12 +703,41 @@ internal unsafe struct MethodTable
[FieldOffset(ElementTypeOffset)]
public void* ElementType;

/// <summary>
/// The PerInstInfo is used to describe the generic arguments and dictionary of this type.
/// It points as a PerInstInfo, which is an array of pointers to generic dictionaries, which then point
Member

It points as a PerInstInfo

This does not parse for me.

Member Author

I've updated the wording. Hopefully it is a bit clearer, although the actual design of this structure is admittedly very confusing.

if (pMT->IsValueType())
{
DWORD baseSizePadding = pMT->GetClass()->GetBaseSizePadding();
_ASSERTE(baseSizePadding == (sizeof(TADDR) * 2)); // This is dependended on by the System.Runtime.CompilerServices.CastHelpers.IsNullableForType code
Member

I do not see the dependency in IsNullableForType. Obsolete comment?

Member Author

I've updated the comment to reference GetNumInstanceFieldBytesIfContainsGCPointers

@davidwrighton davidwrighton merged commit eb456e6 into dotnet:main Nov 21, 2024
90 checks passed
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this pull request Dec 10, 2024
Move the unboxing helpers to managed code.

Behavior is basically identical except for the Unbox_Nullable paths, which required some investigation to find the fastest implementation. Notably, the interruptibility of managed code makes the copying/zeroing of values more difficult, but the opportunity/requirement to specialize the code also brought a few micro-optimizations that are somewhat nice. Overall I don't expect anyone to notice the performance changes here, but since my earlier code was about 2X slower than the native implementation, I did feel the need to optimize until everything looked good.

Performance results:

| TestName | With PR | Without PR | % Speedup |
| --- | --- | --- | --- |
| TestJustAPrimitive | 1217.0047 | 1221.203 | 0.34% |
| TestJustObject | 1212.6415 | 1211.6437 | -0.08% |
| TestJust5Primitive | 1496.5304 | 1522.4968 | 1.74% |
| TestJust5Object | 1461.0507 | 1488.3328 | 1.87% |
| TestJust10Primitive | 1473.2814 | 1493.5238 | 1.37% |
| TestJust10Object | 3215.6339 | 2854.6186 | -11.23% |
| TestJustAPrimitiveNullableWithValue | 2727.9085 | 5182.2611 | 89.97% |
| TestJustObjectNullableWithValue | 3148.9484 | 5672.2985 | 80.13% |
| TestJust5PrimitiveNullableWithValue | 5443.9232 | 7795.6109 | 43.20% |
| TestJust5ObjectNullableWithValue | 6492.9071 | 8095.1508 | 24.68% |
| TestJust10PrimitiveNullableWithValue | 6022.6274 | 8723.572 | 44.85% |
| TestJust10ObjectNullableWithValue | 7728.3239 | 9671.1382 | 25.14% |
| TestJustAPrimitiveNullNullable | 1786.1337 | 2230.0932 | 24.86% |
| TestJustObjectNullNullable | 1675.0683 | 2326.0395 | 38.86% |
| TestJust5PrimitiveNullNullable | 2921.9497 | 3298.4642 | 12.89% |
| TestJust5ObjectNullNullable | 3389.4043 | 3615.3131 | 6.67% |
| TestJust10PrimitiveNullNullable | 3050.809 | 4054.9683 | 32.91% |
| TestJust10ObjectNullNullable | 4658.8316 | 5335.0686 | 14.52% |

Results are very positive, or within the margin of error in this test suite. These results were generated using a small benchmark which mostly targeted measuring the performance of the Unbox_Nullable helper, as it has the most complex and potentially slow code. Generally the impact on that helper is that the performance of the type system portion of the helper is faster, and the performance of code which actually copies the contents of a valuetype is a little better. This isn't quite a fair test of managed vs native performance though, as I took the opportunity to restructure some of the memory on `MethodTable` so that it could more easily be read in managed code, and that happened to make a fair bit of complex code become simpler and thus faster.
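
For readers checking the table: the "% Speedup" column appears to be the ratio of the "Without PR" measurement to the "With PR" measurement, minus one, expressed as a percentage. This is inferred from the numbers rather than stated in the PR, and it treats lower measurements as better. `SpeedupPercent` below is a hypothetical helper written in C++ to show the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Inferred formula for the "% Speedup" column; lower measurements are assumed to be better.
inline double SpeedupPercent(double withPr, double withoutPr) {
    return (withoutPr / withPr - 1.0) * 100.0;
}
```

For example, the `TestJustAPrimitiveNullableWithValue` row gives `SpeedupPercent(2727.9085, 5182.2611)`, which rounds to 89.97, and the `TestJust10Object` row gives a negative value, i.e. a slowdown under this reading.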
4 participants