Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crossgen2 support for static virtual method resolution (take 2) #87438

Merged
merged 21 commits into from
Aug 8, 2023

Conversation

trylek
Copy link
Member

@trylek trylek commented Jun 12, 2023

I have revived my change from two years back, rebased it against current main and updated it based mostly on Michal's PR feedback. It seems to work in my local testing so I'm publishing the PR and starting lab testing. I originally hoped to revive my old PR

#54063

but GitHub has apparently already locked it so I have to open a new one.

Thanks

Tomas

/cc @dotnet/crossgen-contrib, @dotnet/jit-contrib

@trylek trylek added this to the 8.0.0 milestone Jun 12, 2023
@trylek trylek requested review from EgorBo and davidwrighton June 12, 2023 23:28
@trylek trylek requested a review from MichalStrehovsky as a code owner June 12, 2023 23:28
@ghost ghost assigned trylek Jun 12, 2023
@trylek
Copy link
Member Author

trylek commented Jun 13, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member

EgorBo commented Jun 13, 2023

Does it make sense to also run crossgen outerloop jobs?

@trylek
Copy link
Member Author

trylek commented Jun 13, 2023

Thanks Egor for the suggestion, I'll run them once the primary runs finish fine to reduce lab cost in case some problem pops up. In general I have recently found out that Crossgen2 outerloop runs are currently busted due to several problematic IL tests so I don't expect a green run but a canary run will certainly be useful.

Copy link
Member

@MichalStrehovsky MichalStrehovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced any changes to MetadataVirtualMethodAlgorithm.cs are needed.

@trylek trylek force-pushed the Crossgen2SVMResolution branch 2 times, most recently from f870aa8 to 9ff9691 Compare June 15, 2023 21:44
@trylek
Copy link
Member Author

trylek commented Jun 15, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@trylek
Copy link
Member Author

trylek commented Jun 15, 2023

/azp run runtime-coreclr crossgen2 outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@trylek
Copy link
Member Author

trylek commented Jun 15, 2023

@MichalStrehovsky - based on your suggestion I have reverted all changes to MetadataVirtualMethodAlgorithm.cs and I found out that I just need to fix one more detail around module token resolution. @davidwrighton - please make sure to take a detailed look at the very first diff around line 280 in CorInfoImpl.ReadyToRun.cs where I had to modify the method ComputeActualOwningType you originally implemented.

Without this extra change Crossgen2 crashes in several SVM tests you originally authored including Loader\classloader\StaticVirtualMethods\GenericContext\GenericContextTestDefaultImp\GenericContextTestDefaultImp.ilproj where we're resolving NormalMethod and ComputeActualOwningType receives NonGenericClass as the methodTargetOwner parameter and IFaceGenericDefaultImp<string> (implemented by the class) as instantiatedOwningType.

else if (isStaticVirtual)
{
pResult->thisTransform = CORINFO_THIS_TRANSFORM.CORINFO_NO_THIS_TRANSFORM;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually possible to reach this code? My reading indicates that either the directMethod will be resolved to a non-null value above, or we will throw a RequiresRuntimeJitException.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reformulated this bit and I believe it's now reachable.

directMethod = constrainedType.ResolveVariantInterfaceMethodToStaticVirtualMethodOnType(originalMethod);
if (directMethod == null)
{
throw new RequiresRuntimeJitException(originalMethod.ToString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test to the version resiliency tests in src/tests/readytorun/tests so that if the static interface, and implementing type are not in the current R2R module, that if the implementation changes from a default implementation to an exact implementation that the generated R2R code referring to such a thing remains correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in the latest commit, please take a look to double-check if it matches your intent.

@trylek trylek force-pushed the Crossgen2SVMResolution branch from 9ff9691 to a84ada5 Compare June 23, 2023 11:01
@@ -995,6 +995,13 @@ private static MethodDesc TryResolveVirtualStaticMethodOnThisType(MetadataType c
if (methodImpl.Decl == interfaceMethodDefinition)
{
MethodDesc resolvedMethodImpl = methodImpl.Body;
if (resolvedMethodImpl.OwningType != constrainedType.GetTypeDefinition())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the managed type system does a lot less validation than the CoreCLR type system in general - there are many other things that could be wrong with the type and prevent a load on CoreCLR that we'll not detect here. The way this is approached elsewhere is that we emit enough fixups to force this load on the CoreCLR side. I guess adding a check fixes some test, but it's not systematic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of the problem would be:

Call<Foo>();

static void Call<T>() where T : IFoo => T.Something();

interface IFoo
{
    static virtual void Something() => Console.WriteLine("Hello");
}

struct Foo : IFoo
{
    extern void BreakThings();
}

Here CoreCLR will not allow Foo to load because it has an extern method with no body (just a random example of things CoreCLR looks at). If we generate code in crossgen2 that bypasses the need to load Foo (which I believe we'd do here, but I didn't actually check - I'm certain it can be tweaked to achieve that), we'd "fix" this even though it's not supposed to work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can you please explain to me why the problem you described is specific to static virtual methods? I mean, let's say Crossgen2 allows JIT to inline a call to a static method - we're also not making sure on the Crossgen2 side that the callee sits in a loadable class that doesn't have any extern methods defined and whatnot. You may be right that it's problematic in certain corner cases and we should improve the type validation but I just don't see why this should be a SVM-specific problem; if a change in this logic turns out to be necessary, I tend to assume it should apply to all methods, not just static virtuals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's specific to static virtual methods. It is a general problem. I think the throw/check added here is just enough to fix a failing test, but it doesn't even fix this problem for bad MethodImpls, let alone for the many other cases when CoreCLR would refuse to load the type. E.g. if there's a broken MethodImpl for some other method but it's after we already found a match, we'll not notice and will not throw. CoreCLR will.

The preferred solution for this has always been to generate a fixup to force CoreCLR to load this type before we're allowed to run the code.

If a fixup is not possible, we're basically saying we'll need a CoreCLR-like type load emulator that will validate this whenever we touch a type. If that's the case, we should disable the test on ActiveIssue and budget for a CoreCLR-load emulator in .NET 9.

Adding partial checks in places that our tests happened to run into doesn't seem sustainable to me. E.g. we iterate MethodImpls in all of these places. Should each of these validate the MethodImpl is valid? And what about all the other scenarios where CoreCLR will refuse a load that we just happen not to have test coverage with static virtuals?

foreach (MethodImplRecord record in foundMethodImpls)
{
MethodDesc recordDecl = record.Decl;
if (interfaceDecl != recordDecl.OwningType.IsInterface)
continue;
if (!interfaceDecl)
recordDecl = FindSlotDefiningMethodForVirtualMethod(recordDecl);
if (recordDecl == decl)
{
return FindSlotDefiningMethodForVirtualMethod(record.Body);
}
}

foreach (MethodImplRecord methodImplRecord in currentType.VirtualMethodImplsForType)
{
MethodDesc declSlot = FindSlotDefiningMethodForVirtualMethod(methodImplRecord.Decl);
MethodDesc implSlot = FindSlotDefiningMethodForVirtualMethod(methodImplRecord.Body);
if (unificationGroup.IsInGroup(declSlot) && !unificationGroup.IsInGroupOrIsDefiningSlot(implSlot))
{
unificationGroup.RemoveFromGroup(declSlot);
separatedMethods ??= new MethodDescHashtable();
separatedMethods.AddOrGetExisting(declSlot);
if (unificationGroup.RequiresSlotUnification(declSlot) || implSlot.RequiresSlotUnification())
{
if (implSlot.Signature.EqualsWithCovariantReturnType(unificationGroup.DefiningMethod.Signature))
{
unificationGroup.AddMethodRequiringSlotUnification(declSlot);
unificationGroup.AddMethodRequiringSlotUnification(implSlot);
unificationGroup.SetDefiningMethod(implSlot);
}
}
continue;
}

MethodImplRecord[] possibleImpls = runtimeInterface.FindMethodsImplWithMatchingDeclName(interfaceMethod.Name);
if (possibleImpls != null)
{
foreach (MethodImplRecord implRecord in possibleImpls)
{
if (implRecord.Decl == interfaceMethodDefinition)
{
// This interface provides a default implementation.
// Is it also most specific?
if (mostSpecificInterface == null || Array.IndexOf(runtimeInterface.RuntimeInterfaces, mostSpecificInterface) != -1)
{
mostSpecificInterface = runtimeInterface;
impl = implRecord.Body;
diamondCase = false;
}
else if (Array.IndexOf(mostSpecificInterface.RuntimeInterfaces, runtimeInterface) == -1)
{
diamondCase = true;
}
break;
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I've written the bare bones of a CoreCLR based type load emulator as part of supporting the skip type validation feature of R2R, but its incomplete, and does not cover all of the failure paths (It handles most of the checks that skip type validation skips). I see that we ALSO have a different validator that is used in NativeAOT/Crossgen2 in other cases in the jit interface. We really ought to unify the two and finish them up in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we ALSO have a different validator that is used in NativeAOT/Crossgen2 in other cases in the jit interface

That validator was originally written for NativeAOT purposes - we have a more relaxed approach to invalid inputs there and the validator basically just does enough validation to prevent a compiler crash later (type system exceptions are catchable/handlable only when hit while generating method code and fatal in most other places).

Crossgen2 needs to be more strict because "fixing" a problem initially, but resurfacing it again when the method is rejitted into a higher tier is worse than the NativeAOT behavior (that just doesn't throw the exception it was supposed to throw and leaves things at that).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both R2R and tiered JIT assume that the IL is valid. Invalid IL can have behavior differences between R2R and JIT or between Tier0 JIT and Tier1 JIT. The AOT compiler validator does not need to be perfect. Its main purpose is to catch missing dependencies that are very common in the code out there. We have tacked some invalid metadata handling to it too, but that is mostly just to make some of our own tests for invalid patterns happy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we view it as "just a thing to make a test pass", I'm fine with adding a check here.

However, adding a new ExceptionStringId requires more work than just a new enum member. NativeAOT compiler will potentially catch this exception at compile time and rethrow it at runtime, generating an exception message using CoreLib's localization.

Also the exception type and exception message should match CoreCLR's exception (that's the pattern we've been following).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I generally agree it's a good idea to emit a fixup to make sure the target type of a SVM gets loaded, I'm worried that that this particular case is somewhat more tricky - if Crossgen2 silently ignores "explicit override forwarders" in the sense of allowing the SVM implementation to be redirected to a completely different type, its load check may happily pass as it would be completely oblivious of the original "intermediate type" that spawned the forward; otherwise we would probably need to encode fixups for all three involved types - the original interface type where the SVM is defined, the targeting constrained type and the actual resolution type - and that seems somewhat wasteful to me and potentially cancelling out the SVM prejit perf savings due to more work that needs to happen at runtime.

@trylek trylek force-pushed the Crossgen2SVMResolution branch from f1fb1fe to 1374c32 Compare August 8, 2023 15:28
@trylek
Copy link
Member Author

trylek commented Aug 8, 2023

@mangod9 - As discussed offline I traditionally hit a snag with Crank that now seems to have trouble using locally built apps. I'll follow up with Sebastién, for now I have dusted off our good old friend Jellyfin and I think I am able to see some runtime JIT reduction with this change - for default R2R publishing 453 methods get jitted at runtime before the app crashes; in composite publishing it gets reduced to 341 runtime-jitted methods and further to 252 methods when using composite built with this Crossgen2 change. While that's just an anecdotal observation, I think it shows the potential for some startup perf improvement; I have rebased this change against the latest main and I plan to retest it and merge it in today around noon unless anyone objects.

@mangod9
Copy link
Member

mangod9 commented Aug 8, 2023

for default R2R publishing 453 methods get jitted at runtime before the app crashes;

just confirming that the app crashes are what is being fixed here or is that unrelated? Nice that you are observing less JITing which should be good to merge ( assume you are observing more methods emitted in the R2R image as well?). Thx!

@trylek
Copy link
Member Author

trylek commented Aug 8, 2023

@mangod9 - of course not, it crashes in all build modes, I just didn't bother to fix all of it, we never had a fully functional version supporting .NET 8. In the meantime I have also tried "dotnet new webapi" in composite mode without and with the MIBC data. Without MIBC data, the "normal" version jits 200 methods at runtime (including shutdown), with this change it jits 179 methods. With the StandardOptimizationData.mibc, the normal version jits 75 methods at runtime and the version with this PR jits 53 methods. As you can see, the diffs are similar (about 20 methods) so that's presumably thanks to Crossgen2 being newly able to compile methods calling SVMs.

@trylek
Copy link
Member Author

trylek commented Aug 8, 2023

For the methods emitted into the R2R file, I don't have an exact diff as the R2RDump diff mode has been broken for a while but I see that for "dotnet new webapi" the full composite PE executable with this change is slightly larger (100.4 MB vs 98.9 MB or 1.4 MB delta) and contains more generics (the instance method entrypoint section size is 4.8 MB vs. 4.74 MB or by about 60 KB larger); interestingly enough the number of non-generic methods is the same which is probably expected as I believe that currently the biggest user of SVMs in the framework is generic maths.

@trylek trylek merged commit cb08364 into dotnet:main Aug 8, 2023
@trylek trylek deleted the Crossgen2SVMResolution branch August 8, 2023 18:43
@xtqqczze
Copy link
Contributor

xtqqczze commented Aug 9, 2023

For the methods emitted into the R2R file, I don't have an exact diff as the R2RDump diff mode has been broken for a while

Seeing a increase in code size for some generic maths, see #84421 (comment).

@EgorBo
Copy link
Member

EgorBo commented Aug 14, 2023

It improved startup time in TE benchmarks, e.g.:

image

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants