Fix analysis of interface methods on generic types #93540

MichalStrehovsky · 2023-10-16T09:03:11Z

Fixes an issue observed in #92881. The dependency analysis within the compiler was incorrectly considering Equals(object x, object y) to be the implementation of IEqualityComparer<T>.Equals(T, T). When we generate the interface dispatch table, we'd use the correct algorithm (that looks at uninstantiated types) and fail the compilation. The fix is to use the same algorithm during dependency analysis.

Looks like this has been broken ever since interface support was added to CoreRT in 2016 and the bug survived additions of default and static interface methods: dotnet/corert#626.

Cc @dotnet/ilc-contrib

Fixes an issue observed in dotnet#92881. The dependency analysis within the compiler was incorrectly considering `Equals(object x, object y)` to be the implementation of `IEqualityComparer<T>.Equals(T, T)`. When we generate the interface dispatch table, we'd use the correct algorithm (that looks at uninstantiated types) and fail the compilation. The fix is to use the same algorithm during dependency analysis. Looks like this has been broken ever since interface support was added to CoreRT: dotnet/corert#626.

ghost · 2023-10-16T09:03:17Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes an issue observed in #92881. The dependency analysis within the compiler was incorrectly considering Equals(object x, object y) to be the implementation of IEqualityComparer<T>.Equals(T, T). When we generate the interface dispatch table, we'd use the correct algorithm (that looks at uninstantiated types) and fail the compilation. The fix is to use the same algorithm during dependency analysis.

Looks like this has been broken ever since interface support was added to CoreRT in 2016 and the bug survived additions of default and static interface methods: dotnet/corert#626.

Cc @dotnet/ilc-contrib

Author:	MichalStrehovsky
Assignees:	MichalStrehovsky
Labels:	`area-NativeAOT-coreclr`
Milestone:	-

MichalStrehovsky · 2023-10-16T12:38:04Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

+ TypeDesc implType = defType;
+ while (!implType.HasSameTypeDefinition(implMethod.OwningType))
+ implType = implType.BaseType;


The test I'm disabling is hitting a nullref in this block because we walk past system.object without finding a suitable base (the test is testing invalid IL). We could do something when we hit null here, but I'm not sympathetic to that idea. We already disable the test in R2R testing for similar reasons.

The "proper" fix would be to walk the potential vtable as part of EEType validation in the EETypeNode constructor (so that we throw there and can invalidate whatever method body is causing this) but this is expensive.

but this is expensive.

By that I mean compiler throughput, not amount of work.

MichalStrehovsky · 2023-10-16T13:04:53Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2023-10-16T13:05:09Z

Azure Pipelines successfully started running 1 pipeline(s).

agocke · 2023-10-16T22:37:05Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

 for (int interfaceIndex = 0; interfaceIndex < defTypeRuntimeInterfaces.Length; interfaceIndex++)
 {
 DefType interfaceType = defTypeRuntimeInterfaces[interfaceIndex];
+ DefType interfaceDefinitionType = defTypeDefinitionRuntimeInterfaces[interfaceIndex];


So, just to make sure I have this right

For class C<T> : I1<T>, I2<string> and an instance C<string>

defTypeDefinition is the unsubstituted type of the current type, so C<T> for C<string>
defTypeRuntimeInterfaces is the implemented interfaces of the current type, so I1<string>, I2<string>
defTypeDefinitionRuntimeInterfaces is the interfaces of the unsubstituted current type, so I1<T>, I2<string>

interfaceType is the current implemented interfaces, so I1<string> or I2<string>
interfaceTypeDefinitionType is the unsubstituted... equivalent? So I1<T>, I2<string>. Notably I2<string>, right? Not the unsubstituted one?

Yes for all of the questions.

agocke · 2023-10-16T23:33:46Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

@@ -457,11 +461,22 @@ public sealed override IEnumerable<CombinedDependencyListEntry> GetConditionalSt
 if (!isStaticInterfaceMethod && !needsDependenciesForInstanceInterfaceMethodImpls)
 continue;

+ MethodDesc interfaceMethodDefinition = interfaceMethod;


When we say GVMs aren't tracked here, does GVM mean that the method itself doesn't have generic parameters, or that the containing type also doesn't have generic parameters (or both)?

In the ILCompiler codebase, we exclusively refer to GVMs as methods that have generic parameters themselves.

Methods on generic types (that don't have their own generic method parameters) are not called GVMs here.

agocke · 2023-10-16T23:39:24Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

@@ -457,11 +461,22 @@ public sealed override IEnumerable<CombinedDependencyListEntry> GetConditionalSt
 if (!isStaticInterfaceMethod && !needsDependenciesForInstanceInterfaceMethodImpls)
 continue;

+ MethodDesc interfaceMethodDefinition = interfaceMethod;
+ if (interfaceType != interfaceDefinitionType)
+ interfaceMethodDefinition = factory.TypeSystemContext.GetMethodForInstantiatedType(interfaceMethodDefinition.GetTypicalMethodDefinition(), (InstantiatedType)interfaceDefinitionType);


Not sure what GetTypicalMethodDefinition does. I think we get here when we have something like I1<T> and I1<string>, where interfaceType = I1<string>. And if I1<T> looks like

interface I1<T> { void M(T t); }

then interfaceMethod = I1<string>.M(string)?

Then does this statement get I1<T>.M(T t)?

It gets the method definition on the uninstantiated version of the owning type:

I1<!0>.Method.GetTypicalMethodDefinition == I1<T>.Method
I1<object>.Method.GetTypicalMethodDefinition == I1<T>.Method
I1<List<!0>>.Method.GetTypicalMethodDefinition == I1<T>.Method

And also (but it might not be important here):

I1<!0>.Method<!1>.GetTypicalMethodDefinition == I1<T>.Method
I1<object>.Method<!1>.GetTypicalMethodDefinition == I1<T>.Method
I1<List<!0>>.Method<object>.GetTypicalMethodDefinition == I1<T>.Method

agocke · 2023-10-16T23:44:26Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

- defType.ResolveInterfaceMethodToStaticVirtualMethodOnType(interfaceMethod) :
- defType.ResolveInterfaceMethodToVirtualMethodOnType(interfaceMethod);
+ defTypeDefinition.ResolveInterfaceMethodToStaticVirtualMethodOnType(interfaceMethodDefinition) :
+ defTypeDefinition.ResolveInterfaceMethodToVirtualMethodOnType(interfaceMethodDefinition);


So this maps something like C<T> to I1<T>.M(T)?

It would map to I1<!0>.M(!0) (the owning interface is instantiated over the relevant signature variable based on the interface list).

agocke · 2023-10-16T23:45:18Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

+ implType = implType.BaseType;
+
+ if (!implType.IsTypeDefinition)
+ implMethod = factory.TypeSystemContext.GetMethodForInstantiatedType(implMethod.GetTypicalMethodDefinition(), (InstantiatedType)implType);


And very much not sure what we're doing here.

Given:

interface IFoo<T> { void M(T t); } class Foo<T, U> : IFoo { public void M(U t) { } }

And computing the conditional dependencies for Foo<object, object>:

The above logic just resolved that IFoo<!1>.M is implemented by Foo<T, U>.M(U). We now need to find this method on Foo<object, object>. This grabs the typical method definition of Foo<T, U>.M(U) (the same thing we have right now) and finds it on Foo<object, object>, returning Foo<object, object>(object)

This code is here mostly to cover the more interesting case of:

class Base { public void M(U t) { } } class Foo<T, U> : Base, IFoo { }

Where the result of the resolution would be Base<!1>.M and we want Base<object>.M.

MichalStrehovsky · 2023-10-17T07:03:08Z

I'm merging to unblock Viktor but @agocke @sbomer let's continue the review and I'll submit a subsequent PR if there's anything that needs changing.

sbomer · 2023-10-17T16:10:56Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

@@ -457,11 +461,22 @@ public sealed override IEnumerable<CombinedDependencyListEntry> GetConditionalSt
 if (!isStaticInterfaceMethod && !needsDependenciesForInstanceInterfaceMethodImpls)
 continue;

+ MethodDesc interfaceMethodDefinition = interfaceMethod;
+ if (interfaceType != interfaceDefinitionType)
+ interfaceMethodDefinition = factory.TypeSystemContext.GetMethodForInstantiatedType(interfaceMethodDefinition.GetTypicalMethodDefinition(), (InstantiatedType)interfaceDefinitionType);


Can the result of GetMethodForInstantiatedType be different than the virtual method on interfaceDefinitionType? Wondering why we walk virtual methods for interfaceType instead of interfaceDefinitionType.

I think walking the methods on both forms would be equivalent. But we need both interfaceMethod (in the form of IFoo<!0>.M()) and interfaceMethodDefinition (in the form of IFoo<object>.M()). We have to choose some type to iterate methods on and we're choosing this one...

sbomer · 2023-10-17T16:14:59Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

 for (int interfaceIndex = 0; interfaceIndex < defTypeRuntimeInterfaces.Length; interfaceIndex++)
 {
 DefType interfaceType = defTypeRuntimeInterfaces[interfaceIndex];
+ DefType interfaceDefinitionType = defTypeDefinitionRuntimeInterfaces[interfaceIndex];


nit: would it make sense to call this definitionInterfaceType to differentiate "interface type of the type definition" from "type definition of the interface"? This might make it more clear that for a case like class C : IFoo<object>, this represents IFoo<object>, not IFoo<T> (assuming I understood correctly).

Sounds like a better name. I've opened #93645 to address it here and elsewhere where the same name is used.

sbomer · 2023-10-17T16:30:50Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/EETypeNode.cs

 if (implMethod != null)
 {
+ TypeDesc implType = defType;
+ while (!implType.HasSameTypeDefinition(implMethod.OwningType))


I'm wondering why we need to walk up base types again if we already found implMethod on the base. Is there a way to do this instead by instantiating implMethod.OwningType using the generic context of defType?

So in theory we shouldn't need to. We'd just call implMethod.InstantiateSignature(defType.Instantiation, Instantiation.Empty).

This however doesn't work due to a quirk. I've been thinking about doing something about it but also I don't really want to perturb things too much for the sake of making things prettier.

The problem is this:

The type system has a representation of a generic type definition and "an open type instantiation". One is Foo<T, U>. The other is Foo<!0, !1>. We need to be able to distinguish between these because of stuff like class Foo<T, U> { void Do(T m) { typeof(Foo<>); typeof(Foo<T, U>); } - these typeofs need to produce a different thing. The instantiation in the first case is over generic parameters (GenericParameterDesc) in the former case, and signature variables in the latter case

Now here's the quirk. When iterating methods on these types, we get Do(T m) and Do(!0 m) respectively. Calling InstantiateSignature on Do(T m) doesn't do anything. When one thinks about it, it really better shouldn't do anything on generic parameters because InstantiateSignature is also used to make things like specialized method bodies and that would coalesce the typeof(Foo<>) and typeof(Foo<T, U>) cases into the same thing.

Since we do the resolution on Foo<T, U>, we'll typically get Foo<T, U>.Method back. But sometimes also Base<!0>.Method. InstantiateSignature would work on the latter, but not on the former. So we have a bunch of code written like this that can handle both cases correctly.

class Foo<T, U> { void Do(T m) { typeof(Foo<>); typeof(Foo<T, U>); }

FWIW Roslyn has a similar distinction, but maybe it's subtly different. From Roslyn's perspective, Foo<> is a completely unbound generic, similar to a definition. I believe you'll get the same symbol from that as you would calling OriginalDefinition on any given instantiation of Foo<T, U>.

typeof(Foo<T, U>), on the other hand, is in no way open or unbound in that instance. It's fully instantiated using the type variables T and U. In other words, when looking at the definition of the class class Foo<T, U>, if you look at T and U you will see that they are definition symbols -- they define T and U. If you look at T and U in typeof(Foo<T, U>), however, you'll see that they are type arguments -- they are references to the definitions and would be no different than if you said typeof(Foo<int, string>).

I'm not sure if that helps anything. I think the Roslyn equivalent of something similar to the above would be calling Construct on an already constructed type symbol. Roslyn produces an exception if you try to do this. So maybe that's the same as NAOT. Roslyn would demand that you pass in the original definition before calling Construct.

hez2010 · 2023-10-18T03:12:34Z

Will this be backported to 8.0?

MichalStrehovsky · 2023-10-18T07:22:18Z

Will this be backported to 8.0?

I wasn't planning to since this is also broken on .NET 7 and very likely also on .NET Native and nobody ran into this.

agocke · 2023-10-18T20:51:30Z

I'm fine with backporting if we run into a customer scenario. As Michal said, this doesn't seem super common, so maybe that won't be necessary.

MrJul · 2023-10-19T02:09:52Z

What a timing!

Here's a stripped down repro of a real world problem encountered a few hours ago in an application that uses CommunityToolkit.Mvvm, when publishing as NativeAOT:

using System.Windows.Input;
using CommunityToolkit.Mvvm.Input;

internal static class Program
{
    public static void Main(string[] args)
    {
        var command = (ICommand)Activator.CreateInstance(Type.GetType(args[0]));
        command.Execute(null);
        new RelayCommand<object?>(_ => { });
    }
}

(the Activator.CreateInstance isn't real code, it's simply to get an ICommand whose real type is not known by ILC)

Version without any package dependency

internal static class Program
{
    public static void Main(string[] args)
    {
        var command = (ICommand)Activator.CreateInstance(Type.GetType(args[0]));
        command.Execute(null);
        new RelayCommand<object?>(_ => { });
    }
}

public interface ICommand
{
    void Execute(object? parameter);
}

public interface IRelayCommand<in T> : ICommand
{
    void Execute(T? parameter);
}

public sealed class RelayCommand<T> : IRelayCommand<T>
{
    public RelayCommand(Action<T> execute) { }
    public void Execute(T? parameter) { }
    public void Execute(object? parameter) { }
}

Publish with /p:NativeAot=true. It fails on 7.0 and 8.0-rc2 with an OverflowException in InterfaceDispatchMapNode.EmitDispatchMap since a matching method isn't found anymore in RelayCommand<__Canon> to implement ICommand.Execute(object): GetVirtualMethodSlot returns -1.

To summarize, using a RelayCommand<object> when there's an ICommand.Execute call elsewhere in the program (a very common case in UI frameworks) will prevent an application from being compiled with NativeAOT. In my opinion, this should be backported to 8.0.

MichalStrehovsky · 2023-10-19T07:02:07Z

Note that to actually hit this, the app needs to only have RelayCommand<object> and no other RelayCommand instantiation (i.e. adding if (string.Empty.Length > 0) new RelayCommand<string>(null); anywhere in the above repro will work around the bug). I'd still question how common that is - people have been using native AOT with Avalonia long before native AOT was called native AOT and we've not heard about it. Is there something that changed on Avalonia side that would make this more common now than ever?

The truth is that the failure mode is rather obscure so maybe we'll need to service this just because of that, but I would definitely be interested in knowing how you were able to root cause the compiler crash to the specific problematic interface/class.

hez2010 · 2023-10-19T09:01:31Z

@MichalStrehovsky Avalonia people don't use CommunityToolkit.Mvvm, but WinUI and MAUI people do. Once CsWinRT completes its support for NativeAOT, people will start to encounter this issue commonly IMO.

MrJul · 2023-10-19T10:38:14Z

Note that to actually hit this, the app needs to only have RelayCommand<object> and no other RelayCommand instantiation

Oh, that's the point I was missing, thanks! I couldn't quite believe that it didn't happen more commonly in random apps. Indeed, I expect the issue to be way rarer with this condition added.

Is there something that changed on Avalonia side that would make this more common now than ever?

Nothing has changed in Avalonia in this regard. This was reported on a new application using Avalonia + CommunityToolkit.Mvvm, explaining the single generic instantiation of the RelayCommand<T>. I don't have any stats to back that up, but CommunityToolkit seems to be gaining traction over ReactiveUI in the Avalonia world.

I would definitely be interested in knowing how you were able to root cause the compiler crash to the specific problematic interface/class.

Knowing it was related to RelayCommand<T> (since removing it made the bug disappear) and a related interface (from ilc's stack trace), I started to strip the code down as much as possible. I compiled ilc 8.0-rc2 to be able to debug the exception, where it was quite clear through debugging that Execute(object) wasn't present in RelayCommand<__Canon>.

The next step was trial and error to understand which extra code was triggering the bug. I assumed a second rooted ICommand implementation was needed (it wasn't) and that usages of the RelayCommand<T> were made via the ICommand interface. Simple cases without Avalonia obviously didn't trigger the bug because ilc is quite good at devirtualizing and removing all unneeded members when it can infer the actual type.

agocke · 2023-10-19T17:22:48Z

Sorry, I haven't been fully following the details here. Has someone managed to repro this bug using external code? Either CommunityToolkit or Avalonia?

Because if so, I think that would meet the bar for backporting, mostly because as Michal said, it's difficult to figure out what the problem is.

MrJul · 2023-10-19T17:47:49Z

Has someone managed to repro this bug using external code? Either CommunityToolkit or Avalonia?

Yes, here's some basic reproduction using Avalonia + CommunityToolkit.Mvvm: https://github.com/MrJul/IlcCrashRepro
No convoluted code here, it's basically a hello world with a single button.

dotnet publish it with .NET 8.0-rc2 and observe the IL compiler crashing.

agocke · 2023-10-19T18:09:54Z

OK, this seems worth backporting to me.

agocke · 2023-10-19T18:10:07Z

/backport to release/8.0

github-actions · 2023-10-19T18:10:20Z

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/6578564519

MichalStrehovsky added the area-NativeAOT-coreclr label Oct 16, 2023

MichalStrehovsky self-assigned this Oct 16, 2023

MichalStrehovsky mentioned this pull request Oct 16, 2023

[main] Update dependencies from 10 repositories #92881

Merged

Stop building a test

7a28133

MichalStrehovsky commented Oct 16, 2023

View reviewed changes

This was referenced Oct 16, 2023

Compiler crashes when failing to release Mutex #53420

Open

[Test Failure] System.Net.Security.Tests.CertificateValidationClientServer.CertificateValidationClientServer_EndToEnd_Ok on Windows AoT #88832

Closed

agocke reviewed Oct 16, 2023

View reviewed changes

ViktorHofer approved these changes Oct 17, 2023

View reviewed changes

MichalStrehovsky merged commit e68a4cd into dotnet:main Oct 17, 2023
126 of 136 checks passed

MichalStrehovsky deleted the intfanalysis branch October 17, 2023 07:03

sbomer reviewed Oct 17, 2023

View reviewed changes

MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this pull request Oct 18, 2023

Address PR feedback from dotnet#93540

1613de2

MichalStrehovsky added a commit that referenced this pull request Oct 18, 2023

Address PR feedback from #93540 (#93645)

0f4196f

github-actions bot mentioned this pull request Oct 19, 2023

[release/8.0] Fix analysis of interface methods on generic types #93748

Merged

MichalStrehovsky mentioned this pull request Nov 6, 2023

Incorrect devirtualization in corner cases #94399

Open

ghost locked as resolved and limited conversation to collaborators Nov 18, 2023

Fix analysis of interface methods on generic types #93540

Fix analysis of interface methods on generic types #93540

Conversation

MichalStrehovsky commented Oct 16, 2023

ghost commented Oct 16, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichalStrehovsky commented Oct 16, 2023

azure-pipelines bot commented Oct 16, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichalStrehovsky commented Oct 17, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hez2010 commented Oct 18, 2023

MichalStrehovsky commented Oct 18, 2023

agocke commented Oct 18, 2023

MrJul commented Oct 19, 2023

MichalStrehovsky commented Oct 19, 2023

hez2010 commented Oct 19, 2023 • edited Loading

MrJul commented Oct 19, 2023

agocke commented Oct 19, 2023

MrJul commented Oct 19, 2023

agocke commented Oct 19, 2023

agocke commented Oct 19, 2023

github-actions bot commented Oct 19, 2023

hez2010 commented Oct 19, 2023 •

edited

Loading