-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move unboxing helpers to managed code #109135
Conversation
Tagging subscribers to this area: @mangod9 |
} | ||
|
||
// Set the hasValue field on the Nullable type. It MUST always be placed at the start of the object. | ||
*(bool*)destPtr = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look correct. The cast is casting the value of the reference into pointer. Unsafe.As
would be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that seems wrong. I kept waffling between refs and pointers here. Thanks.
{ | ||
[StackTraceHidden] | ||
[DebuggerStepThrough] | ||
internal static unsafe partial class BoxingHelpers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these types are going to be loaded on startup path. It feels a bit too fine grained to have separate type for the few unboxing helpers.
Would it be better for the unboxing helpers to live in CastHelpers? They are coupled anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, seems reasonable to me. I had thought I might be moving the allocation based helpers over too so this class wouldn't be so small, but honestly, just copying the assembly routines from NativeAOT seems like a better path for most of those.
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Outdated
Show resolved
Hide resolved
// Set the hasValue field on the Nullable type. It MUST always be placed at the start of the object. | ||
Unsafe.As<byte, bool>(ref destPtr) = true; | ||
ref byte destValuePtr = ref typeMT->GetNullableValueFieldReferenceAndSize(ref destPtr, out uint size); | ||
Unsafe.CopyBlockUnaligned(ref destValuePtr, ref RuntimeHelpers.GetRawData(obj), size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who does InitValueClass
above need to be concerned with ref vs. non-ref differences, but this place does not need to be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I haven't yet gotten around to fixing this problem... and am in the process of verifying if the alignment handling is necessary or not.
MethodTable* pMT2 = RuntimeHelpers.GetMethodTable(obj); | ||
if ((pMT1->IsPrimitive && pMT2->IsPrimitive && | ||
pMT1->GetPrimitiveCorElementType() == pMT2->GetPrimitiveCorElementType()) || | ||
AreTypesEquivalent(pMT1, pMT2)) | ||
{ | ||
return ref RuntimeHelpers.GetRawData(obj); | ||
} | ||
|
||
CastHelpers.ThrowInvalidCastException(obj, pMT1); | ||
return ref Unsafe.AsRef<byte>(null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invert the if here to remove the unreachable return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do this, I expect that ThrowInvalidCastException
would have to throw
in C# to make the JIT realize that it is cold code and produce the desired code layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, only now I've noticed that it does a QCall instead of direct throw. What's the point of the runtime call here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. And there are resource file changes that I would need to do to get the right string. I'll do it if you want @jkotas, but honestly this isn't so bad as it is. I couldn't find any impact on performance from the generated assembly code.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
uint numInstanceFieldBytes = pMT->GetNumInstanceFieldBytes(); | ||
if ((((uint)Unsafe.AsPointer(ref destBytes) | numInstanceFieldBytes) & ((uint)sizeof(void*) - 1)) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is more or less efficient to check pMT->ContainsGCPointers
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. I'm not quite ready to benchmark this, and need to validate that we actually need to do aligned work here at all. I see 3 basic implementations.
- This bifurcated approach, with some sort of if check on safety of using the general purpose clearing routine. That sort of thing is safe for cases where the destination may be on the heap or not. Using
ContainsGCPointers
wouldn't have been safe for the C++ implementation of all of this, but since this implementation of InitValueClass is actually only used forNullable<T>
instances, we CAN use that flag. I need to some performance investigation to determine the actual fastest approach here. - An approach where we p/invoke to 'C' memset as an FCall. That is only safe if the destination is guaranteed to be on the current stack. This needs to be an FCall since otherwise the specification of memset permits non-pointer atomic stores, and an inopportune GC suspension could see an invalid pointer.
- A custom set routine which does a warmup series of sets to individual bytes until the region is aligned, then sets pointer sized chunks using atomic stores. (The CPU notion of atomic, not the C++ notion), and then has a warmdown phase. This is safe for all cases as well, and is what the old C++ implementation used to do.
A similar set of choices exists for the actual copying routine below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, my analysis of the usage of Unbox_Nullable is that usage for the JIT scenario always works with a stack based destination, but there is usage from Array.CoreCLR.cs which can unbox to the heap. I'm building a small suite of perf tests to see what the performance of various options in this space looks like.
…Services/CastHelpers.cs Co-authored-by: Jan Kotas <jkotas@microsoft.com>
…nbox type testing
…ntime into UnboxingHelpers
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/CastHelpers.cs
Outdated
Show resolved
Hide resolved
MethodTable* pMT2 = RuntimeHelpers.GetMethodTable(obj); | ||
if ((pMT1->IsPrimitive && pMT2->IsPrimitive && | ||
pMT1->GetPrimitiveCorElementType() == pMT2->GetPrimitiveCorElementType()) || | ||
AreTypesEquivalent(pMT1, pMT2)) | ||
{ | ||
return ref RuntimeHelpers.GetRawData(obj); | ||
} | ||
|
||
CastHelpers.ThrowInvalidCastException(obj, pMT1); | ||
return ref Unsafe.AsRef<byte>(null); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
- Fix assert that was always firing - Tweak code per code review to make the code generator aware that ThrowInvalidCastException is going to throw and should always be in a cold path - Improve the implementation of MethodTable.IsPrimitive. I noticed that this could be a single and+compare instead of the pair it was before. This improves the performance of Unbox_Helper slightly by about 5%.
|
||
[DebuggerHidden] | ||
[MethodImpl(MethodImplOptions.NoInlining)] | ||
internal static void Unbox_TypeTest_Helper(MethodTable *pMT1, MethodTable *pMT2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal static void Unbox_TypeTest_Helper(MethodTable *pMT1, MethodTable *pMT2) | |
private static void Unbox_TypeTest_Helper(MethodTable *pMT1, MethodTable *pMT2) |
|
||
[DebuggerHidden] | ||
[MethodImpl(MethodImplOptions.NoInlining)] | ||
internal static ref byte Unbox_Helper(MethodTable* pMT1, object obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internal static ref byte Unbox_Helper(MethodTable* pMT1, object obj) | |
private static ref byte Unbox_Helper(MethodTable* pMT1, object obj) |
else | ||
{ | ||
// If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass | ||
nuint numInstanceFieldBytes = typeMT->BaseSize - (nuint)(2 * sizeof(IntPtr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numInstanceFieldBytes
local var is unusued. Is this unfinished refactoring?
It may be nice to move this to a property or method on MethodTable. There are more places where this micro-optimization can be used. The property impl can assert that this is only used on when ContainsGCPointers
is true and that it returns the same value as full GetNumInstanceFieldBytes
.
[MethodImpl(MethodImplOptions.NoInlining)] | ||
private static void Unbox_Nullable_NotIsNullableForType(ref byte destPtr, MethodTable* typeMT, object obj) | ||
{ | ||
// For safety's sake, also allow true nullables to be unboxed normally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I know that this is pre-existing comment.) This is not for safety's sake. We would not give up any safety if we threw for boxed Nullable<T>
that should not exist here. It is just to hide bugs. Have you tried deleting this to see whether anything fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some cases in func eval which claim to use this pathway. In addition, it IS a pathway that has been used in reflection in the past, so I don't want to remove this path.
} | ||
else | ||
{ | ||
UnboxNullableValue(ref destPtr, typeMT, obj); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason why we are not just calling BulkMoveWithWriteBarrier/Memmove here to copy the value? I would expect it to be faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that there is a fair amount of type system structure that would need to be ported to managed code to call BulkMoveWithWriteBarrier. I made the decision to pull this out into its own FCALL here. Especially since I expect that most paths will need to do a BulkMoveWithWriteBarrier which isn't that different in perf. I can explore pulling enough type system structure for this if you'd like me to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, now that I look at this, I realized that I can actually encode the needed bits for this particular load into the space used by m_pInterfaceMap
which would allow all of this to avoid a couple of memory loads, branches etc. All of that should allow me to optimize the rest of all of this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure that I understand - is the main missing piece Nullable::ValueAddr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its ValueAddr + the computation of the size of the instance field bytes of the Value field. That requires access to the EEClass + access to a fielddesc, etc. It boils down to not that many instructions, but it IS a fair number of concepts and type system structures. That is a task probably worth doing at some point, but I'd like to avoid doing it now. The new encoding of data, which is redundant with existing data, allows for extremely cheap access, and is simple to encode into the managed code without adding more concepts than MethodTable
.
} | ||
else | ||
{ | ||
// If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass |
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
public uint GetNumInstanceFieldBytesIfContainsGCPointers() | ||
{ | ||
Debug.Assert(ContainsGCPointers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug.Assert(ContainsGCPointers); | |
// If the type ContainsGCPointers, we can compute the size without resorting to loading the BaseSizePadding field from the EEClass | |
Debug.Assert(ContainsGCPointers); |
@@ -706,12 +703,41 @@ internal unsafe struct MethodTable | |||
[FieldOffset(ElementTypeOffset)] | |||
public void* ElementType; | |||
|
|||
/// <summary> | |||
/// The PerInstInfo is used to describe the generic arguments and dictionary of this type. | |||
/// It points as a PerInstInfo, which is an array of pointers to generic dictionaries, which then point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It points as a PerInstInfo
This does not parse for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the wording. Hopefully it is a bit more clear although for sure, the actual design of this is a very confusing structure.
if (pMT->IsValueType()) | ||
{ | ||
DWORD baseSizePadding = pMT->GetClass()->GetBaseSizePadding(); | ||
_ASSERTE(baseSizePadding == (sizeof(TADDR) * 2)); // This is dependended on by the System.Runtime.CompilerServices.CastHelpers.IsNullableForType code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see the dependency in IsNullableForType
. Obsolete comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the comment to reference GetNumInstanceFieldBytesIfContainsGCPointers
Move the unboxing helpers to managed code. Behavior is basically identical except for the Unbox_Nullable paths, which required some investigation to find the fastest implementation. Notably, there interruptibility of managed code makes the copying/zeroing of values more difficult, but with the opportunity/requirement to specialize the codebase came a few micro-optimizations that are somewhat nice. Overall I don't expect anyone to notice the performance changes here, but since my earlier code was about 2X slower than the native implementation, I did feel the need to optimize until everything looked good. Performance results: | TestName | With PR | Without PR | % Speedup | | --- | --- | --- | --- | | TestJustAPrimitive| 1217.0047| 1221.203| 0.34% | | TestJustObject| 1212.6415| 1211.6437|-0.08% | | TestJust5Primitive| 1496.5304| 1522.4968|1.74% | | TestJust5Object| 1461.0507| 1488.3328|1.87% | | TestJust10Primitive| 1473.2814| 1493.5238|1.37% | | TestJust10Object| 3215.6339| 2854.6186|-11.23% | | TestJustAPrimitiveNullableWithValue| 2727.9085| 5182.2611|89.97% | | TestJustObjectNullableWithValue| 3148.9484| 5672.2985|80.13% | | TestJust5PrimitiveNullableWithValue| 5443.9232| 7795.6109|43.20% | | TestJust5ObjectNullableWithValue| 6492.9071| 8095.1508|24.68% | | TestJust10PrimitiveNullableWithValue| 6022.6274| 8723.572| 44.85% | | TestJust10ObjectNullableWithValue| 7728.3239| 9671.1382|25.14% | | TestJustAPrimitiveNullNullable| 1786.1337| 2230.0932|24.86% | | TestJustObjectNullNullable| 1675.0683| 2326.0395|38.86% | | TestJust5PrimitiveNullNullable| 2921.9497| 3298.4642|12.89% | | TestJust5ObjectNullNullable| 3389.4043| 3615.3131|6.67% | | TestJust10PrimitiveNullNullable| 3050.809| 4054.9683|32.91% | | TestJust10ObjectNullNullable| 4658.8316| 5335.0686|14.52% | Results are very positive, or within the margin of error in this test suite. These results were generated using a small benchmark which mostly targeted measuring the performance of the Unbox_Nullable helper, as it has the most complex and potentially slow code. Generally the impact on that helper is that the performance of the type system portion of the helper is faster, and the performance of code which actually copies the contents of a valuetype is a little better. This isn't quite a fair test of managed vs native performance though, as I took the opportunity to restructure some of the memory on `MethodTable` so that it could more easily be read in managed code, and that happened to make a fair bit of complex code become simpler and thus faster.
Move the unboxing helpers to managed code.
Behavior is basically identical except for the Unbox_Nullable paths, which required some investigation to find the fastest implementation. Notably, there interruptibility of managed code makes the copying/zeroing of values more difficult, but with the opportunity/requirement to specialize the codebase came a few micro-optimizations that are somewhat nice. Overall I don't expect anyone to notice the performance changes here, but since my earlier code was about 2X slower than the native implementation, I did feel the need to optimize until everything looked good.
Performance results:
Results are very positive, or within the margin of error in this test suite. These results were generated using a small benchmark which mostly targeted measuring the performance of the Unbox_Nullable helper, as it has the most complex and potentially slow code. Generally the impact on that helper is that the performance of the type system portion of the helper is faster, and the performance of code which actually copies the contents of a valuetype is a little better. This isn't quite a fair test of managed vs native performance though, as I took the opportunity to restructure some of the memory on
MethodTable
so that it could more easily be read in managed code, and that happened to make a fair bit of complex code become simpler and thus faster.Benchmark code (standalone console app)