Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regression test 46239 with Crossgen2 and improve runtime logging #51416

Closed
wants to merge 16 commits into from

Conversation

trylek
Copy link
Member

@trylek trylek commented Apr 16, 2021

The regression test

src\tests\JIT\Regressions\JitBlue\Runtime_46239

exercises various interesting corner cases of type layout that
weren't handled properly in Crossgen2 on x86 and ARM[32]. This
change fixes the remaining deficiencies and it also adds
provisions for better runtime logging upon type layout mismatches.

Thanks

Tomas

/cc @dotnet/crossgen-contrib

P.S. For now I don't expect we need to push this to Preview 4 bits
as the change covers a rather corner case on the less prevalent
architectures; please let me know if you believe otherwise.

@@ -304,17 +305,19 @@ protected virtual void FinalizeRuntimeSpecificStaticFieldLayout(TypeSystemContex
{
}

protected static ComputedInstanceFieldLayout ComputeExplicitFieldLayout(MetadataType type, int numInstanceFields)
protected virtual bool IsBlittableOrManagedSequential(TypeDesc type) => false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the default here should be true. Blittable types have a requirement that they are handled in the same way on all runtimes as they describe native data structures, and by defaulting to false, this leads for a way for the algorithm to misbehave on Native AOT.

I'd prefer to see this either be default true, or make it an abstract method and force the NativeAOT compiler to deal with this when it gets there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichalStrehovsky could you comment here on what you'd like to see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's just move IsManagedSequentialType to here and then this doesn't even need to be virtual and can just do the same thing?

We didn't have trouble with this in .NET Native (where this is missing) so I don't think it matters, but we ported over so many CoreCLR weirdness's to this algorithm over time that I no longer understand how layout is done and would prefer all bugs in it to be also surfaced in crossgen2 (and not have NativeAOT specific bugs that we need to troubleshoot there).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichalStrehovsky - Thanks for your feedback. I'm trying to modify the change based on your suggestion but I'm hitting a layering problem - IsBlittable is part of MarshalUtils and that's not in ILCompiler.TypeSystem.ReadyToRun and it seems to me that moving it over brings various bits of interop code into ILCompiler.TypeSystem.ReadyToRun which I suspect to be undesirable as my current understanding is that NativeAOT doesn't use exactly identical interop as Crossgen2 / CoreCLR. What do you think would be the ideal way to resolve this discrepancy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MarshalUtils are part of ILCompiler.TypeSystem on the NativeAOT side, so I'm not opposed to moving it (and things it depends on) into ILCompiler.TypeSystem.ReadyToRun as well. The split is rather arbitrary and interop is getting tangled into the type system the same way as is on CoreCLR. It was a nice dream to have a separation of concerns there.

https://github.com/dotnet/runtimelab/blob/c8e3158b0981419527b18f3c0259bcf02103e054/src/coreclr/tools/aot/ILCompiler.TypeSystem/ILCompiler.TypeSystem.csproj#L528-L530

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split is rather arbitrary and interop is getting tangled into the type system the same way as is on CoreCLR. It was a nice dream to have a separation of concerns there.

Would there be a way to adjust the runtime side of the things to avoid this mess?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, for now I think I have at least managed to make something like an inventory of the various runtime behaviors by implementing them in Crossgen2. In theory we may be able to simplify some of the stuff, I however suspect that some of the subtle distinctions amount to tiny memory use optimizations on the runtime side i.e. something that's very tricky to undo as it may incur working set regressions in arbitrary scenarios.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however suspect that some of the subtle distinctions amount to tiny memory use optimizations on the runtime side

I doubt that it would show up on the radar. The behavior of the field layout algorithm is completely accidental in number of cases, and in fact there are many known situations where it is inefficient for no good reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're probably the biggest expert in this field so I have no desire to doubt your assessment. My only point is that further independent development of runtime vs. Crossgen2 side of this equation is giving me goosebumps; my current expectation is that we'll ultimately go down the path you yourself suggested in the past - having a way to communicate the field layouts from the compiler to the runtime, that will give us room for experimentation with potential packing optimizations and rid us of the burdensome need to keep both codebases in 100% sync.

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my comment on the IsBlittableOrManagedSequential, I'm pretty happy with this.

@BruceForstall
Copy link
Member

@BruceForstall
Copy link
Member

Any chance you'll get this in soon? there are a lot of test jobs failing due to test49826.

davidwrighton
davidwrighton previously approved these changes Apr 23, 2021
@@ -2257,7 +2249,7 @@ private bool NeedsTypeLayoutCheck(TypeDesc type)

private bool HasLayoutMetadata(TypeDesc type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one place where this method is used looks like dead/unrechable code.

This method can only return true when type.IsValueType is true, but this case is handled in EncodeFieldBaseOffset earlier via the else if (pMT.IsValueType) check. It means that this method won't ever return true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup, thanks Jan for pointing that out, deleted in 9th commit.

{
PreventRecursiveFieldInlinesOutsideVersionBubble(field, callerMethod);

// We won't try to be smart for classes with layout.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the old crossgen has a special case for this with this comment that kicks in for non-value types (different from what was here before that kicked in for value types).

Do we want to match it? Or are we confident that we have extensive test coverage for all non-value types with layout corner cases?

Also, once we add this back, can at least some of the remaining changes be simplified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my understanding of the CoreCLR code, the layout check is based on the following condition:

BOOL fHasLayout =

It seems to me that this means that the conditional statement you originally pointed out should basically stay, the problematic bit is the method HasLayoutMetadata itself - we're certainly trying to match the CoreCLR runtime behavior, in this particular case we probably just missed the fact that the CoreRT version of HasLayoutMetadata is quite different than its CoreCLR runtime counterpart method

BOOL HasLayoutMetadata(Assembly* pAssembly, IMDInternalImport* pInternalImport, mdTypeDef cl, MethodTable* pParentMT, BYTE* pPackingSize, BYTE* pNLTType, BOOL* pfExplicitOffsets)

I'll look into putting HasLayoutMetadata back and modifying it based on the CoreCLR runtime version. Other than that, can you please be more specific regarding the simplifications you're suggesting? While I do agree that some of the runtime logic is complicated and somewhat arbitrary, I'm worried that unsettling it at this point might cause more harm than good.

Copy link
Member

@jkotas jkotas Apr 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the equivalent of the HasLayoutMetadata check should be just type.IsSequentialLayout || type.IsExplicitLayout.

Yes, the runtime logic is complicated. We have done multiple rounds of trying to replicate it, and we are still finding mismatches. It is safe bet that this is not the last round of fixes. I think we need to make it simpler to get confident that it matches. I will do some ad-hoc testing to try to find more situations where it does not match.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jan for your additional feedback. I have updated the runtime check based on your suggestion. Once the change passes basic testing, I'll also trigger another round of Pri1 testing that previously uncovered most of these inconsistencies. It would be certainly awesome if you were willing to use your vast expertise in this area to devise additional tests exercising various obscure corner cases we may have previously missed.

@@ -857,7 +915,9 @@ private static SizeAndAlignment ComputeInstanceSize(MetadataType type, LayoutInt
{
if (type.IsValueType)
{
instanceSize = LayoutInt.AlignUp(instanceSize, alignment, target);
instanceSize = LayoutInt.AlignUp(instanceSize,
alignUpInstanceByteSize ? alignment : LayoutInt.Min(alignment, target.LayoutPointerSize),
Copy link
Member

@jkotas jkotas Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that the alignUpInstanceByteSize = false case will kick in for a struct like this on ARM32:

    [StructLayout(LayoutKind.Explicit)]
    internal struct S3
    {
        [FieldOffset(0)]
        public ulong tmp1;
        [FieldOffset(8)]
        public Object tmp2;
    }

The algorithm will compute the instanceSize of this struct as 12. Is that correct?

It looks like a bug in the type loader. The instance size of this struct on ARM should be 16, so that the long field is aligned at 8 bytes, so that potential atomic 64-bit operations work fine on it. It would be better to fix the type loader instead of replicating the bug here.

}

return false;
return MetadataFieldLayoutAlgorithm.IsBlittableOrManagedSequentialType(type) || (type is MetadataType mt && mt.IsExplicitLayout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify this to mt.IsExplicitLayout || mt.IsSequentialLayout. Maybe even inline it at the callsite.

@@ -803,7 +803,7 @@ protected override ComputedInstanceFieldLayout ComputeInstanceFieldLayout(Metada
return ComputeExplicitFieldLayout(type, numInstanceFields);
}
else
if (type.IsEnum || MarshalUtils.IsBlittableType(type) || IsManagedSequentialType(type))
if (type.IsEnum || IsBlittableOrManagedSequentialType(type))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get rid of dependency of the type layout algorithm on blittability here, I think we would simplify this into:

if (type.IsSequential && type.ContainsPointers)

(with matching change in the runtime). It would be R2R breaking change, but it would set us up for future.

The dependency of the layout algorithm on blittability makes changes like #103 R2R breaking changes. We did not treat #103 as a R2R breaking change since the connection between interop, type layout and R2R was not understood.

@@ -304,17 +304,17 @@ protected virtual void FinalizeRuntimeSpecificStaticFieldLayout(TypeSystemContex
{
}

protected static ComputedInstanceFieldLayout ComputeExplicitFieldLayout(MetadataType type, int numInstanceFields)
protected ComputedInstanceFieldLayout ComputeExplicitFieldLayout(MetadataType type, int numInstanceFields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you changed this from static to non-static?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At one point it was needed because there used to be a helper method calculating the offset bias. I have double checked it's not longer necessary after all the refactorings, will remove, thanks for pointing that out!

}

return false;
return type is MetadataType metadataType && (metadataType.IsSequentialLayout || metadataType.IsExplicitLayout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also delete/replace IsBlittableType from ComputeInstanceFieldLayout so that the layout does not depend on the algorithm to compute blittability?

trylek added 15 commits May 24, 2021 21:05
The regression test

<code>src\tests\JIT\Regressions\JitBlue\Runtime_46239</code>

exercises various interesting corner cases of type layout that
weren't handled properly in Crossgen2 on x86 and ARM[32]. This
change fixes the remaining deficiencies and it also adds
provisions for better runtime logging upon type layout mismatches.

Thanks

Tomas
With this change, the only remaining pipelines using Crossgen1 are
"r2r.yml", "r2r-extra.yml" and "release-tests.yml". I haven't yet
identified the pipeline running the "release-tests.yml" script;
for the "r2r*.yml", these now remain the only pipelines exercising
Crossgen1. I don't think it makes sense to switch them over to
CG2 as we already have their CG2 counterparts; my expectation is
that, once CG1 is finally decommissioned, they will be just deleted.

Thanks

Tomas
I have basically reverted the marshalling code shuffles as they
are no longer necessary based on the direction suggested by JanK.
I haven't made the counterpart runtime changes yet, I'm running
an initial smoke test to see all the places that blow up.

Thanks

Tomas
Based on Jan's suggestion I have changed the runtime behavior to
align explicit layout structs by default. The funny part here was
that the implementation is two-tiered, split into the case with
vs. without layout metadata, using two completely different code
paths.

I have extracted a small part of the EEClassLayoutInfo
calculation dealing with primitive field size and alignment into
code usable by both codepaths to reduce code duplication. I have
also bumped up the R2R compatibility version information as this
is (intentionally) a globally breaking change for R2R images
produced by older versions.

As part of simplifying the various conditions I have deleted a
section that claims we shouldn't be updating the number of instance
bytes for blittable / managed sequential types. I just hope this
is no longer needed so that I don't have to redo the marshalling
helper shuffles again.

Thanks

Tomas
Per JanK's feedback I simplified CoreCLR runtime calculation of
explicit layout size but I missed that Crossgen2 actually differs
from it in a very subtle manner. This change fixes it by only
accepting the explicit size if it's larger than or equal to the
unaligned calculated explicit layout size.

Thanks

Tomas
I have simplified runtime and Crossgen2 behavior in the presence
of sequential layout by removing blittability from the picture
as the IsBlittable check is very complex and hard to maintain in
sync between the native runtime and the managed compiler. I have
introduced a somewhat weaker and much simpler check
MayContainGCPointers that returns FALSE for all IsBlittable types
(and for a few others).

Thanks

Tomas
In my investigation of the x86 failures I hit an interesting corner
case of the class GCMemoryInfoData that is checked for having managed
layout matching its native counterpart. Interestingly enough, that
only worked because the class internally reported as non-blittable
and not managed sequential so it ended up using the auto layout;
when my change switched it over to use the sequential layout, it
turned out that the sequential layout algorithm differs from its
native counterpart. While the change is technically breaking this
corner scenario, I suspect that the primary purpose of sequential
layout is interop with native code and I find the fact that it's
actuall mismatched concerning beyond the scope of my pending change.

Thanks

Tomas
I originally thought the modification may assist debugging layout
issues as I did during the investigation. It however turns out
that basically the same class name check can be put in
EEClassLayoutInfo::CollectLayoutFieldMetadataThrowing so I am
removing this as superfluous temporary instrumentation.

Thanks

Tomas
@trylek
Copy link
Member Author

trylek commented May 27, 2021

@jkotas - I believe I have managed to make some progress on the change based on your suggestions but in accordance with @davidwrighton I think it's even riskier to take than the original attempt using our existing blittability / managed sequential checks. As you can easily imagine, simplifying the condition for using sequential vs. auto layout silently switches over some types from one to the other and the bug tail is enormous, I've been digging through it over the last two weeks and I'm nowhere near green state. Two particular simple examples:

  1. GCMemoryInfoData - this managed class has a native runtime counterpart and we use the DEFINE_CLASS / DEFINE_FIELD macros to check their consistency. Without the change the class uses auto layout as it's disqualified from managed sequential due to being a reference type; interestingly enough, once my change switched it over to start using the sequential layout, it fell out of sync on x86 because the sequential layout algorithm is a worse match for the native layout than the auto layout. I have fix for this particular issue but the reality is that the fix technically changes the semantics of sequential layout and potentially affects blittable types and interop in general.

  2. LAHashDependentHashTracker - this managed class has a native runtime counterpart just like GCMemoryInfoData but without my change it's considered blittable i.e. eligible for using sequential layout. When I attempted to limit changes to pre-existing behavior by only using sequential layout for value types without GC pointers, it switched this class to auto layout which ended up reordering its two fields and falling out of sync with the runtime.

My current plan is to mark this change as NO-MERGE for now and revive its previous version after incorporating Michal's feedback in form of a new PR; I can return to this cleanup after we fork off the final .NET 6 shipping bits - the 1-2 month period of inability to make more substantial runtime changes should be ideal for repeated testing and polishing aspects of this change. In many cases it will be useful / needed to consult official native compiler documentation to make sure we're doing the right / expected thing.

@trylek
Copy link
Member Author

trylek commented Jun 3, 2021

Closing for now; I'll revive the PR after we fork off for .NET 6 and I'll start the new CI testing rounds to iron out the various problems.

@trylek trylek closed this Jun 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants