Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix analysis of interface methods on generic types #93540

Merged
merged 2 commits into from
Oct 17, 2023

Conversation

MichalStrehovsky
Copy link
Member

Fixes an issue observed in #92881. The dependency analysis within the compiler was incorrectly considering Equals(object x, object y) to be the implementation of IEqualityComparer<T>.Equals(T, T). When we generate the interface dispatch table, we'd use the correct algorithm (that looks at uninstantiated types) and fail the compilation. The fix is to use the same algorithm during dependency analysis.

Looks like this has been broken ever since interface support was added to CoreRT in 2016 and the bug survived additions of default and static interface methods: dotnet/corert#626.

Cc @dotnet/ilc-contrib

Fixes an issue observed in dotnet#92881. The dependency analysis within the compiler was incorrectly considering `Equals(object x, object y)` to be the implementation of `IEqualityComparer<T>.Equals(T, T)`. When we generate the interface dispatch table, we'd use the correct algorithm (that looks at uninstantiated types) and fail the compilation. The fix is to use the same algorithm during dependency analysis.

Looks like this has been broken ever since interface support was added to CoreRT: dotnet/corert#626.
@ghost
Copy link

ghost commented Oct 16, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes an issue observed in #92881. The dependency analysis within the compiler was incorrectly considering Equals(object x, object y) to be the implementation of IEqualityComparer<T>.Equals(T, T). When we generate the interface dispatch table, we'd use the correct algorithm (that looks at uninstantiated types) and fail the compilation. The fix is to use the same algorithm during dependency analysis.

Looks like this has been broken ever since interface support was added to CoreRT in 2016 and the bug survived additions of default and static interface methods: dotnet/corert#626.

Cc @dotnet/ilc-contrib

Author: MichalStrehovsky
Assignees: MichalStrehovsky
Labels:

area-NativeAOT-coreclr

Milestone: -

Comment on lines +473 to +475
TypeDesc implType = defType;
while (!implType.HasSameTypeDefinition(implMethod.OwningType))
implType = implType.BaseType;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test I'm disabling is hitting a nullref in this block because we walk past system.object without finding a suitable base (the test is testing invalid IL). We could do something when we hit null here, but I'm not sympathetic to that idea. We already disable the test in R2R testing for similar reasons.

The "proper" fix would be to walk the potential vtable as part of EEType validation in the EETypeNode constructor (so that we throw there and can invalidate whatever method body is causing this) but this is expensive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this is expensive.

By that I mean compiler throughput, not amount of work.

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

for (int interfaceIndex = 0; interfaceIndex < defTypeRuntimeInterfaces.Length; interfaceIndex++)
{
DefType interfaceType = defTypeRuntimeInterfaces[interfaceIndex];
DefType interfaceDefinitionType = defTypeDefinitionRuntimeInterfaces[interfaceIndex];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, just to make sure I have this right

For class C<T> : I1<T>, I2<string> and an instance C<string>

defTypeDefinition is the unsubstituted type of the current type, so C<T> for C<string>
defTypeRuntimeInterfaces is the implemented interfaces of the current type, so I1<string>, I2<string>
defTypeDefinitionRuntimeInterfaces is the interfaces of the unsubstituted current type, so I1<T>, I2<string>

interfaceType is the current implemented interfaces, so I1<string> or I2<string>
interfaceTypeDefinitionType is the unsubstituted... equivalent? So I1<T>, I2<string>. Notably I2<string>, right? Not the unsubstituted one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for all of the questions.

@@ -457,11 +461,22 @@ public sealed override IEnumerable<CombinedDependencyListEntry> GetConditionalSt
if (!isStaticInterfaceMethod && !needsDependenciesForInstanceInterfaceMethodImpls)
continue;

MethodDesc interfaceMethodDefinition = interfaceMethod;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we say GVMs aren't tracked here, does GVM mean that the method itself doesn't have generic parameters, or that the containing type also doesn't have generic parameters (or both)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ILCompiler codebase, we exclusively refer to GVMs as methods that have generic parameters themselves.

Methods on generic types (that don't have their own generic method parameters) are not called GVMs here.

@@ -457,11 +461,22 @@ public sealed override IEnumerable<CombinedDependencyListEntry> GetConditionalSt
if (!isStaticInterfaceMethod && !needsDependenciesForInstanceInterfaceMethodImpls)
continue;

MethodDesc interfaceMethodDefinition = interfaceMethod;
if (interfaceType != interfaceDefinitionType)
interfaceMethodDefinition = factory.TypeSystemContext.GetMethodForInstantiatedType(interfaceMethodDefinition.GetTypicalMethodDefinition(), (InstantiatedType)interfaceDefinitionType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what GetTypicalMethodDefinition does. I think we get here when we have something like I1<T> and I1<string>, where interfaceType = I1<string>. And if I1<T> looks like

interface I1<T>
{
    void M(T t);
}

then interfaceMethod = I1<string>.M(string)?

Then does this statement get I1<T>.M(T t)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets the method definition on the uninstantiated version of the owning type:

I1<!0>.Method.GetTypicalMethodDefinition == I1<T>.Method
I1<object>.Method.GetTypicalMethodDefinition == I1<T>.Method
I1<List<!0>>.Method.GetTypicalMethodDefinition == I1<T>.Method

And also (but it might not be important here):

I1<!0>.Method<!1>.GetTypicalMethodDefinition == I1<T>.Method<U>
I1<object>.Method<!1>.GetTypicalMethodDefinition == I1<T>.Method<U>
I1<List<!0>>.Method<object>.GetTypicalMethodDefinition == I1<T>.Method<U>

defType.ResolveInterfaceMethodToStaticVirtualMethodOnType(interfaceMethod) :
defType.ResolveInterfaceMethodToVirtualMethodOnType(interfaceMethod);
defTypeDefinition.ResolveInterfaceMethodToStaticVirtualMethodOnType(interfaceMethodDefinition) :
defTypeDefinition.ResolveInterfaceMethodToVirtualMethodOnType(interfaceMethodDefinition);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this maps something like C<T> to I1<T>.M(T)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would map to I1<!0>.M(!0) (the owning interface is instantiated over the relevant signature variable based on the interface list).

implType = implType.BaseType;

if (!implType.IsTypeDefinition)
implMethod = factory.TypeSystemContext.GetMethodForInstantiatedType(implMethod.GetTypicalMethodDefinition(), (InstantiatedType)implType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And very much not sure what we're doing here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given:

interface IFoo<T>
{
    void M(T t);
}

class Foo<T, U> : IFoo<U>
{
    public void M(U t) { }
}

And computing the conditional dependencies for Foo<object, object>:

The above logic just resolved that IFoo<!1>.M is implemented by Foo<T, U>.M(U). We now need to find this method on Foo<object, object>. This grabs the typical method definition of Foo<T, U>.M(U) (the same thing we have right now) and finds it on Foo<object, object>, returning Foo<object, object>(object)

This code is here mostly to cover the more interesting case of:

class Base<U>
{
    public void M(U t) { }
}

class Foo<T, U> : Base<U>, IFoo<U>
{
}

Where the result of the resolution would be Base<!1>.M and we want Base<object>.M.

@MichalStrehovsky
Copy link
Member Author

I'm merging to unblock Viktor but @agocke @sbomer let's continue the review and I'll submit a subsequent PR if there's anything that needs changing.

@MichalStrehovsky MichalStrehovsky merged commit e68a4cd into dotnet:main Oct 17, 2023
126 of 136 checks passed
@MichalStrehovsky MichalStrehovsky deleted the intfanalysis branch October 17, 2023 07:03
@@ -457,11 +461,22 @@ public sealed override IEnumerable<CombinedDependencyListEntry> GetConditionalSt
if (!isStaticInterfaceMethod && !needsDependenciesForInstanceInterfaceMethodImpls)
continue;

MethodDesc interfaceMethodDefinition = interfaceMethod;
if (interfaceType != interfaceDefinitionType)
interfaceMethodDefinition = factory.TypeSystemContext.GetMethodForInstantiatedType(interfaceMethodDefinition.GetTypicalMethodDefinition(), (InstantiatedType)interfaceDefinitionType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the result of GetMethodForInstantiatedType be different than the virtual method on interfaceDefinitionType? Wondering why we walk virtual methods for interfaceType instead of interfaceDefinitionType.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think walking the methods on both forms would be equivalent. But we need both interfaceMethod (in the form of IFoo<!0>.M()) and interfaceMethodDefinition (in the form of IFoo<object>.M()). We have to choose some type to iterate methods on and we're choosing this one...

for (int interfaceIndex = 0; interfaceIndex < defTypeRuntimeInterfaces.Length; interfaceIndex++)
{
DefType interfaceType = defTypeRuntimeInterfaces[interfaceIndex];
DefType interfaceDefinitionType = defTypeDefinitionRuntimeInterfaces[interfaceIndex];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it make sense to call this definitionInterfaceType to differentiate "interface type of the type definition" from "type definition of the interface"? This might make it more clear that for a case like class C : IFoo<object>, this represents IFoo<object>, not IFoo<T> (assuming I understood correctly).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a better name. I've opened #93645 to address it here and elsewhere where the same name is used.

if (implMethod != null)
{
TypeDesc implType = defType;
while (!implType.HasSameTypeDefinition(implMethod.OwningType))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why we need to walk up base types again if we already found implMethod on the base. Is there a way to do this instead by instantiating implMethod.OwningType using the generic context of defType?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in theory we shouldn't need to. We'd just call implMethod.InstantiateSignature(defType.Instantiation, Instantiation.Empty).

This however doesn't work due to a quirk. I've been thinking about doing something about it but also I don't really want to perturb things too much for the sake of making things prettier.

The problem is this:

The type system has a representation of a generic type definition and "an open type instantiation". One is Foo<T, U>. The other is Foo<!0, !1>. We need to be able to distinguish between these because of stuff like class Foo<T, U> { void Do(T m) { typeof(Foo<>); typeof(Foo<T, U>); } - these typeofs need to produce a different thing. The instantiation in the first case is over generic parameters (GenericParameterDesc) in the former case, and signature variables in the latter case

Now here's the quirk. When iterating methods on these types, we get Do(T m) and Do(!0 m) respectively. Calling InstantiateSignature on Do(T m) doesn't do anything. When one thinks about it, it really better shouldn't do anything on generic parameters because InstantiateSignature is also used to make things like specialized method bodies and that would coalesce the typeof(Foo<>) and typeof(Foo<T, U>) cases into the same thing.

Since we do the resolution on Foo<T, U>, we'll typically get Foo<T, U>.Method back. But sometimes also Base<!0>.Method. InstantiateSignature would work on the latter, but not on the former. So we have a bunch of code written like this that can handle both cases correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class Foo<T, U> { void Do(T m) { typeof(Foo<>); typeof(Foo<T, U>); }

FWIW Roslyn has a similar distinction, but maybe it's subtly different. From Roslyn's perspective, Foo<> is a completely unbound generic, similar to a definition. I believe you'll get the same symbol from that as you would calling OriginalDefinition on any given instantiation of Foo<T, U>.

typeof(Foo<T, U>), on the other hand, is in no way open or unbound in that instance. It's fully instantiated using the type variables T and U. In other words, when looking at the definition of the class class Foo<T, U>, if you look at T and U you will see that they are definition symbols -- they define T and U. If you look at T and U in typeof(Foo<T, U>), however, you'll see that they are type arguments -- they are references to the definitions and would be no different than if you said typeof(Foo<int, string>).

I'm not sure if that helps anything. I think the Roslyn equivalent of something similar to the above would be calling Construct on an already constructed type symbol. Roslyn produces an exception if you try to do this. So maybe that's the same as NAOT. Roslyn would demand that you pass in the original definition before calling Construct.

@hez2010
Copy link
Contributor

hez2010 commented Oct 18, 2023

Will this be backported to 8.0?

MichalStrehovsky added a commit to MichalStrehovsky/runtime that referenced this pull request Oct 18, 2023
@MichalStrehovsky
Copy link
Member Author

Will this be backported to 8.0?

I wasn't planning to since this is also broken on .NET 7 and very likely also on .NET Native and nobody ran into this.

MichalStrehovsky added a commit that referenced this pull request Oct 18, 2023
@agocke
Copy link
Member

agocke commented Oct 18, 2023

I'm fine with backporting if we run into a customer scenario. As Michal said, this doesn't seem super common, so maybe that won't be necessary.

@MrJul
Copy link

MrJul commented Oct 19, 2023

What a timing!

Here's a stripped down repro of a real world problem encountered a few hours ago in an application that uses CommunityToolkit.Mvvm, when publishing as NativeAOT:

using System.Windows.Input;
using CommunityToolkit.Mvvm.Input;

internal static class Program
{
    public static void Main(string[] args)
    {
        var command = (ICommand)Activator.CreateInstance(Type.GetType(args[0]));
        command.Execute(null);
        new RelayCommand<object?>(_ => { });
    }
}

(the Activator.CreateInstance isn't real code, it's simply to get an ICommand whose real type is not known by ILC)

Version without any package dependency
internal static class Program
{
    public static void Main(string[] args)
    {
        var command = (ICommand)Activator.CreateInstance(Type.GetType(args[0]));
        command.Execute(null);
        new RelayCommand<object?>(_ => { });
    }
}

public interface ICommand
{
    void Execute(object? parameter);
}

public interface IRelayCommand<in T> : ICommand
{
    void Execute(T? parameter);
}

public sealed class RelayCommand<T> : IRelayCommand<T>
{
    public RelayCommand(Action<T> execute) { }
    public void Execute(T? parameter) { }
    public void Execute(object? parameter) { }
}

Publish with /p:NativeAot=true. It fails on 7.0 and 8.0-rc2 with an OverflowException in InterfaceDispatchMapNode.EmitDispatchMap since a matching method isn't found anymore in RelayCommand<__Canon> to implement ICommand.Execute(object): GetVirtualMethodSlot returns -1.

To summarize, using a RelayCommand<object> when there's an ICommand.Execute call elsewhere in the program (a very common case in UI frameworks) will prevent an application from being compiled with NativeAOT. In my opinion, this should be backported to 8.0.

@MichalStrehovsky
Copy link
Member Author

Note that to actually hit this, the app needs to only have RelayCommand<object> and no other RelayCommand instantiation (i.e. adding if (string.Empty.Length > 0) new RelayCommand<string>(null); anywhere in the above repro will work around the bug). I'd still question how common that is - people have been using native AOT with Avalonia long before native AOT was called native AOT and we've not heard about it. Is there something that changed on Avalonia side that would make this more common now than ever?

The truth is that the failure mode is rather obscure so maybe we'll need to service this just because of that, but I would definitely be interested in knowing how you were able to root cause the compiler crash to the specific problematic interface/class.

@hez2010
Copy link
Contributor

hez2010 commented Oct 19, 2023

@MichalStrehovsky Avalonia people don't use CommunityToolkit.Mvvm, but WinUI and MAUI people do. Once CsWinRT completes its support for NativeAOT, people will start to encounter this issue commonly IMO.

@MrJul
Copy link

MrJul commented Oct 19, 2023

Note that to actually hit this, the app needs to only have RelayCommand<object> and no other RelayCommand instantiation

Oh, that's the point I was missing, thanks! I couldn't quite believe that it didn't happen more commonly in random apps. Indeed, I expect the issue to be way rarer with this condition added.

Is there something that changed on Avalonia side that would make this more common now than ever?

Nothing has changed in Avalonia in this regard. This was reported on a new application using Avalonia + CommunityToolkit.Mvvm, explaining the single generic instantiation of the RelayCommand<T>. I don't have any stats to back that up, but CommunityToolkit seems to be gaining traction over ReactiveUI in the Avalonia world.

I would definitely be interested in knowing how you were able to root cause the compiler crash to the specific problematic interface/class.

Knowing it was related to RelayCommand<T> (since removing it made the bug disappear) and a related interface (from ilc's stack trace), I started to strip the code down as much as possible. I compiled ilc 8.0-rc2 to be able to debug the exception, where it was quite clear through debugging that Execute(object) wasn't present in RelayCommand<__Canon>.

The next step was trial and error to understand which extra code was triggering the bug. I assumed a second rooted ICommand implementation was needed (it wasn't) and that usages of the RelayCommand<T> were made via the ICommand interface. Simple cases without Avalonia obviously didn't trigger the bug because ilc is quite good at devirtualizing and removing all unneeded members when it can infer the actual type.

@agocke
Copy link
Member

agocke commented Oct 19, 2023

Sorry, I haven't been fully following the details here. Has someone managed to repro this bug using external code? Either CommunityToolkit or Avalonia?

Because if so, I think that would meet the bar for backporting, mostly because as Michal said, it's difficult to figure out what the problem is.

@MrJul
Copy link

MrJul commented Oct 19, 2023

Has someone managed to repro this bug using external code? Either CommunityToolkit or Avalonia?

Yes, here's some basic reproduction using Avalonia + CommunityToolkit.Mvvm: https://github.com/MrJul/IlcCrashRepro
No convoluted code here, it's basically a hello world with a single button.

dotnet publish it with .NET 8.0-rc2 and observe the IL compiler crashing.

@agocke
Copy link
Member

agocke commented Oct 19, 2023

OK, this seems worth backporting to me.

@agocke
Copy link
Member

agocke commented Oct 19, 2023

/backport to release/8.0

@github-actions
Copy link
Contributor

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/6578564519

@ghost ghost locked as resolved and limited conversation to collaborators Nov 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants