Support precompiling CCW vtables in ILC cctor interpreter (Native AOT) #114024
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
That sounds like a bit of an interpreter rewrite I would rather avoid.
.NET forefathers already knew about a need to emit vtables into executables and the .NET file format has had a representation for "memory block with function pointers in it declared statically" for good 20 years. I would not be opposed to using the existing encoding. One cannot generate it with C#, but it would be possible to introduce it through IL rewriting, or generating a new .NET assembly at build time (through a new tool, or with ILASM, or whatever) and injecting it into compilation. Or try luck getting this into C# proper. In IL it looks like this:
I think this would work well with trimming, the same way RVA-static fields work with trimming in general. The only disadvantage is that on a JIT-based .NET runtime, these fixups are processed eagerly at module load, so if there are many of them, it could slow down startup. But it's questionable by how much; I think C++/CLI uses this, so it's probably not terrible. It would be super-efficient with AOT without having to rely on a cctor optimization to kick in.
|
Yes, it is a significant change, but it keeps the overall flow simple (no new build steps) and it opens an opportunity for interpreting more constructs. If we do not want to do the general support in the interpreter, we may consider pattern matching the specific shape used for vtable initialization. It is the equivalent of the IL rewriting approach.
If the IL rewriting is an independent step, how would you make it work with IUnknown slots that are provided by the runtime? |
I do not think that the trimmer supports vtfixups today. Binaries with vtfixups are treated as IJW binaries in general. vtfixups introduce a writeable data section. Writeable data sections in IL binaries are part of the IJW feature set that is Windows-specific, not supported by many tools, etc. |
It mostly opens up new opportunities for bugs in the interpreter. It was never written to have support for this and I don't have confidence in retrofitting it - it would ideally be rewritten with the new model in mind, maybe taking small harmless pieces. It's the reason why #84431 (comment) went nowhere too.
This could be a new kind of fixup - looks like there's free bits and we only need one: runtime/src/coreclr/inc/corhdr.h Lines 149 to 154 in bee9dd2
I meant it can be made to work with trimming. Tools may not support it, but it is part of the file format spec, so it's all tool bugs. This could also be made native AOT only feature of CsWinRT. CsWinRT already does things that are only for native AOT. I don't think we'd need to make it work with ILLink as a min bar (if it's only consumed by native AOT). |
#114067 has a sketch of implementation that doesn't handle the ComWrappers.GetIUnknownImpl case (that I would also not be excited to hardcode into the interpreter either). |
A number of IJW-specific bits made it into the file format spec. We support them only for scenarios where IJW is supported in general. I do not think it is interesting to support them outside that. We can clarify this in ECMA augments. |
Thank you both for taking a look at this! This is all very promising 😄 Just to clarify, is my understanding correct then that the intention is to just make things work via this |
I am not a fan of this solution. If we do not want to build generic interpreter support to handle this, I think that the next best option is to pattern match static constructors with a specific shape in the NAOT compiler. I.e. instead of
Right.
I do not think that this niche feature can ever make it to C# as first-class concept. |
Given all of that:
Then yeah some pattern matching seems completely reasonable to me. To be clear, when I originally suggested relying on the ILC interpreter, I was thinking of some kind of pattern matching as well. As in, just let us know how exactly you'd want us to write code in our static constructors to initialize vtables so that the interpreter can fully fold them into some RVA of some kind. Mentioned this to Michal offline, posting it here too for context: if it helps at all, we'd also be fine only supporting 64 bits.
I mentioned our 3 main examples in my first post, copying it here too for context. We basically have 3 cases:
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
#pragma warning disable
// 1) Base IUnknown vtable
internal static unsafe class IUnknownImpl
{
public static nint AbiToProjectionVftablePtr { get; } = GetAbiToProjectionVftablePtr();
private static nint GetAbiToProjectionVftablePtr()
{
IUnknownVftbl* vftbl = (IUnknownVftbl*)RuntimeHelpers.AllocateTypeAssociatedMemory(typeof(IUnknownImpl), sizeof(IUnknownVftbl));
ComWrappers.GetIUnknownImpl(
fpQueryInterface: out *(nint*)&vftbl->QueryInterface,
fpAddRef: out *(nint*)&vftbl->AddRef,
fpRelease: out *(nint*)&vftbl->Release);
return (nint)vftbl;
}
}
// 2) Example vtable with direct assignments
internal static unsafe class IInspectableImpl1
{
public static nint AbiToProjectionVftablePtr { get; } = GetAbiToProjectionVftablePtr();
private static nint GetAbiToProjectionVftablePtr()
{
void** vftbl = (void**)RuntimeHelpers.AllocateTypeAssociatedMemory(typeof(IInspectableImpl1), sizeof(void*) * 6);
*(IUnknownVftbl*)vftbl = *(IUnknownVftbl*)IUnknownImpl.AbiToProjectionVftablePtr;
vftbl[3] = (delegate* unmanaged[MemberFunction]<void*, uint*, Guid**, int>)&GetIids;
vftbl[4] = (delegate* unmanaged[MemberFunction]<void*, nint*, int>)&GetRuntimeClassName;
vftbl[5] = (delegate* unmanaged[MemberFunction]<void*, int*, int>)&GetTrustLevel;
return (nint)vftbl;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetIids(void* thisPtr, uint* iidCount, Guid** iids)
{
*iidCount = 0;
*iids = null;
return 0;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetRuntimeClassName(void* thisPtr, nint* className)
{
*className = default;
return 0;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetTrustLevel(void* thisPtr, int* trustLevel)
{
*trustLevel = 0;
return 0;
}
}
// 3) Example vtable with vtable assignments
internal static unsafe class IInspectableImpl2
{
public static nint AbiToProjectionVftablePtr { get; } = GetAbiToProjectionVftablePtr();
private static nint GetAbiToProjectionVftablePtr()
{
IInspectableVftbl* vftbl = (IInspectableVftbl*)RuntimeHelpers.AllocateTypeAssociatedMemory(typeof(IInspectableImpl2), sizeof(IInspectableVftbl));
*(IUnknownVftbl*)vftbl = *(IUnknownVftbl*)IUnknownImpl.AbiToProjectionVftablePtr;
vftbl->GetIids = &GetIids;
vftbl->GetRuntimeClassName = &GetRuntimeClassName;
vftbl->GetTrustLevel = &GetTrustLevel;
return (nint)vftbl;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetIids(void* thisPtr, uint* iidCount, Guid** iids)
{
*iidCount = 0;
*iids = null;
return 0;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetRuntimeClassName(void* thisPtr, nint* className)
{
*className = default;
return 0;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetTrustLevel(void* thisPtr, int* trustLevel)
{
*trustLevel = 0;
return 0;
}
}
public unsafe struct IUnknownVftbl
{
public delegate* unmanaged[MemberFunction]<void*, Guid*, void**, int> QueryInterface;
public delegate* unmanaged[MemberFunction]<void*, uint> AddRef;
public delegate* unmanaged[MemberFunction]<void*, uint> Release;
}
internal unsafe struct IInspectableVftbl
{
public delegate* unmanaged[MemberFunction]<void*, Guid*, void**, int> QueryInterface;
public delegate* unmanaged[MemberFunction]<void*, uint> AddRef;
public delegate* unmanaged[MemberFunction]<void*, uint> Release;
public delegate* unmanaged[MemberFunction]<void*, uint*, Guid**, int> GetIids;
public delegate* unmanaged[MemberFunction]<void*, nint*, int> GetRuntimeClassName;
public delegate* unmanaged[MemberFunction]<void*, int*, int> GetTrustLevel;
}
We can ensure that all our vtables in CsWinRT fall into one of these 3 buckets. |
It is not very straightforward to pattern match a static constructor with function calls. It would be easier to pattern match it if all code is in the constructor itself. No calls to other methods. Also, it would be nice to introduce some gesture to allow the compiler to allocate the vtables in readonly section in the image (good for security defense in depth). The compiler cannot assume that it is readonly with the current shape - there may be code that reassigns slots in the vtable after it was constructed. One possible way to address both these concerns is readonly static field marked with
|
I was just thinking of leveraging I have a couple questions:
[FixedAddressValueType]
[TypeAssociatedMemory(typeof(Foo))]
public static readonly IInspectableVftbl Vtbl; |
It would have to be handled as intrinsic call.
Lifetime of the field is implicitly associated with the containing assembly. Is the type passed to |
We were going to do that for custom mapped types and proxy types in CsWinRT. Eg. here is the I will note that thinking about this more, I'm not convinced we support assembly unloading anyway, given that we have a whole lot of caching of metadata and types into global statics, which we never clear. Meaning all loaded assemblies would remain rooted 🤔 |
Confirmed that CsWinRT already doesn't support assembly unloading anyway, so I think this should be fine for starters. People that really wanted it can also just load everything into a separate ALC, and then that is unloadable on its own (eg. Paint.NET does this for plugins). So basically this approach seems fine for us, at least for a first version. If that makes things simpler for ILC, even better. Out of curiosity, is something like |
Follow up on this too. Are there any special considerations with respect to |
It does not need to be a runtime feature. CsWinRT runtime can create e.g. a conditional weak table to tie the lifetimes together when any of the types involved are unloadable. It is not very different from what the runtime would need to do internally. |
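For illustration, here is a minimal sketch of the conditional-weak-table idea mentioned above. The `LifetimeTies` class and its method names are hypothetical (they are not CsWinRT or runtime APIs), and whether a `Type` object is the right key for keeping a collectible assembly alive is an assumption to verify. The only library facts relied on are that `ConditionalWeakTable<TKey, TValue>` keeps a value reachable for as long as its key is alive, and that it exposes `AddOrUpdate` and `TryGetValue`.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper: ties the lifetime of `dependency` to the lifetime of `owner`.
internal static class LifetimeTies
{
    // A value in this table stays reachable for as long as its key is alive.
    private static readonly ConditionalWeakTable<object, object> s_ties = new();

    public static void KeepAliveWhile(object owner, object dependency)
        => s_ties.AddOrUpdate(owner, dependency);

    public static bool TryGetDependency(object owner, out object dependency)
        => s_ties.TryGetValue(owner, out dependency);
}

internal static class Program
{
    private static void Main()
    {
        // Stand-ins for a type from an unloadable assembly and the type
        // whose associated vtable memory must outlive it (assumption).
        object owner = typeof(Program);
        object dependency = typeof(string);

        LifetimeTies.KeepAliveWhile(owner, dependency);

        bool found = LifetimeTies.TryGetDependency(owner, out object value);
        Console.WriteLine(found && ReferenceEquals(value, dependency));
    }
}
```

The table holds its keys weakly, so registering a tie does not by itself keep the owner alive.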
I think so. It is how it works today. |
That all sounds awesome, thank you! It seems we just need to switch CsWinRT to also emit blittable vtable types for all projections and update the codegen for our CCW/IDIC vtables, and then once ILC can make Sorry for all the questions, just trying to make sure we get everything right from the start for 3.0 😄 |
I think it is more than making ComWrappers.GetIUnknownImpl intrinsic. It is about implementing pattern match for the whole thing. |
You wouldn't need rewriting, this can be generated into a completely new .NET assembly either by emitting IL and compiling with ILASM, or emitted with Cecil/System.Reflection.Metadata/whatever.
We can make guarantees about being able to pattern match IL. We cannot make guarantees about pattern matching C#. Whether the cctor interpreter kicks in is always just a bonus. It is not something anyone can rely on. Small changes in Roslyn codegen can make it not kick in. It usually doesn't matter if individual cctor stops being precompiled. If it is a pattern in thousands of cctors it might be more noticeable. For example it would be noticeable if array enumerators stop being precompiled. But those are all tiny cctors so chances are small something would be generated differently. When the cctor gets big enough, chances are new Roslyn optimizations will shuffle things. If we do IL pattern match, you'd still need to emit IL to make it actually reliable. |
I agree with you in general. However, we do depend on pattern matching IL for warnings and correctness in trimmer and AOT, and generally for performance in codegen. This would be just one more place that can regress if Roslyn gets creative with IL that they emit. |
Having to emit IL in a separate assembly for all projections would also significantly make CsWinRT more complex and the transition more difficult for consumers that need to author and package projections. Pattern matching seems much more convenient 😅 |
We're able to do this while skipping over tons of IL. There's only so many ways one can do a call in IL or assign value to a field. We actively skip over 90% of IL opcodes in that analysis; not understanding those opcodes has no effect on our ability to track the local dataflow. A pattern match needs to understand 100% of the body or it doesn't work at all. |
Just to recap, I assume we can narrow down to just two kinds of high-level patterns?
I've made a smaller snippet with just these two examples:
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
#pragma warning disable
// 1) Base IUnknown vtable
internal static unsafe class IUnknownImpl
{
[FixedAddressValueType]
private static readonly IUnknownVftbl Vftbl;
public static nint AbiToProjectionVftablePtr => (nint)Unsafe.AsPointer(ref Unsafe.AsRef(in Vftbl));
static IUnknownImpl()
{
ComWrappers.GetIUnknownImpl(
fpQueryInterface: out *(nint*)&((IUnknownVftbl*)Unsafe.AsPointer(ref Vftbl))->QueryInterface,
fpAddRef: out *(nint*)&((IUnknownVftbl*)Unsafe.AsPointer(ref Vftbl))->AddRef,
fpRelease: out *(nint*)&((IUnknownVftbl*)Unsafe.AsPointer(ref Vftbl))->Release);
}
}
// 2) Example vtable with direct assignments
internal static unsafe class IInspectableImpl1
{
[FixedAddressValueType]
private static readonly IInspectableVftbl Vftbl;
public static nint AbiToProjectionVftablePtr => (nint)Unsafe.AsPointer(ref Unsafe.AsRef(in Vftbl));
static IInspectableImpl1()
{
Vftbl.QueryInterface = ((IUnknownVftbl*)IUnknownImpl.AbiToProjectionVftablePtr)->QueryInterface;
Vftbl.AddRef = ((IUnknownVftbl*)IUnknownImpl.AbiToProjectionVftablePtr)->AddRef;
Vftbl.Release = ((IUnknownVftbl*)IUnknownImpl.AbiToProjectionVftablePtr)->Release;
Vftbl.GetIids = (delegate* unmanaged[MemberFunction]<void*, uint*, Guid**, int>)&GetIids;
Vftbl.GetRuntimeClassName = (delegate* unmanaged[MemberFunction]<void*, nint*, int>)&GetRuntimeClassName;
Vftbl.GetTrustLevel = (delegate* unmanaged[MemberFunction]<void*, int*, int>)&GetTrustLevel;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetIids(void* thisPtr, uint* iidCount, Guid** iids)
{
*iidCount = 0;
*iids = null;
return 0;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetRuntimeClassName(void* thisPtr, nint* className)
{
*className = default;
return 0;
}
[UnmanagedCallersOnly(CallConvs = [typeof(CallConvMemberFunction)])]
private static int GetTrustLevel(void* thisPtr, int* trustLevel)
{
*trustLevel = 0;
return 0;
}
}
public unsafe struct IUnknownVftbl
{
public delegate* unmanaged[MemberFunction]<void*, Guid*, void**, int> QueryInterface;
public delegate* unmanaged[MemberFunction]<void*, uint> AddRef;
public delegate* unmanaged[MemberFunction]<void*, uint> Release;
}
internal unsafe struct IInspectableVftbl
{
public delegate* unmanaged[MemberFunction]<void*, Guid*, void**, int> QueryInterface;
public delegate* unmanaged[MemberFunction]<void*, uint> AddRef;
public delegate* unmanaged[MemberFunction]<void*, uint> Release;
public delegate* unmanaged[MemberFunction]<void*, uint*, Guid**, int> GetIids;
public delegate* unmanaged[MemberFunction]<void*, nint*, int> GetRuntimeClassName;
public delegate* unmanaged[MemberFunction]<void*, int*, int> GetTrustLevel;
}
It feels slightly messy around converting to Would these two be something ILC would be able to handle, provided we always tried to follow this exact pattern? |
The interpreter doesn't have a linear address space. We cannot obtain an nint value of a ref and convert back and forth between them, except if it's a very limited pattern match. Cross-method is not a limited pattern match. At most what would work is what Jan had above.
public static unsafe class IInspectableImpl
{
[FixedAddressValueType]
public static readonly IInspectableVftbl Vtbl;
static IInspectableImpl()
{
fixed (IInspectableVftbl* pVtbl = &Vtbl)
{
*(IUnknownVftbl*)pVtbl = IUnknownImpl.Vtbl;
}
Vtbl.GetIids = &GetIids;
Vtbl.GetRuntimeClassName = &GetRuntimeClassName;
Vtbl.GetTrustLevel = &GetTrustLevel;
}
}
Even the part with reading from
For this:
ComWrappers.GetIUnknownImpl(
fpQueryInterface: out *(nint*)&((IUnknownVftbl*)Unsafe.AsPointer(ref Vftbl))->QueryInterface,
fpAddRef: out *(nint*)&((IUnknownVftbl*)Unsafe.AsPointer(ref Vftbl))->AddRef,
fpRelease: out *(nint*)&((IUnknownVftbl*)Unsafe.AsPointer(ref Vftbl))->Release);
This can at best be:
ComWrappers.GetIUnknownImpl(out var foo, out var bar, out var baz);
Vtbl.Foo = foo;
Vtbl.Bar = bar;
Vtbl.Baz = baz;
Yes, the C# compiler could potentially optimize away the locals. It would be a big problem if it did. This would all be fragile; it would be preferable to hand-emit the IL instead of hoping the C# compiler generates something sensible. |
Correct me if I'm wrong, it seems like the main problematic thing here is only how to handle
public static void GetIUnknownImpl(void* lpVftbl);
So then the above would just be:
public static unsafe class IInspectableImpl
{
[FixedAddressValueType]
public static readonly IInspectableVftbl Vtbl;
static IInspectableImpl()
{
fixed (IInspectableVftbl* pVtbl = &Vtbl)
{
ComWrappers.GetIUnknownImpl(pVtbl);
}
Vtbl.GetIids = &GetIids;
Vtbl.GetRuntimeClassName = &GetRuntimeClassName;
Vtbl.GetTrustLevel = &GetTrustLevel;
}
}
Or perhaps even:
public static void GetIUnknownImpl<T>(ref T vftbl) where T : unmanaged;
public static unsafe class IInspectableImpl
{
[FixedAddressValueType]
public static readonly IInspectableVftbl Vtbl;
static IInspectableImpl()
{
ComWrappers.GetIUnknownImpl(ref Vtbl);
Vtbl.GetIids = &GetIids;
Vtbl.GetRuntimeClassName = &GetRuntimeClassName;
Vtbl.GetTrustLevel = &GetTrustLevel;
}
}
Which might simplify things even further and make ILC's life easier? |
If we want to stick with the existing APIs, it would be best to switch to IntPtrs for the vtable slots to make the casts go away. Side question: Do you happen to know how much perf/size (without NAOT) CsWinRT is leaving on the table by using the vtable definitions with precise pointers? I know we have discussed the overhead of precise vtable definitions at one point, but I am not sure whether it went anywhere.
This is an API with a questionable shape, and it takes address that is more complicated than it needs to be. If we are talking about introducing new APIs, I think the following would be easier for reliable pattern matching:
|
Mmh hold on, not entirely sure I'm following. How would using precise definitions in the vtable leave any performance on the table? Wouldn't the only difference be that you'd be skipping the cast to the right function pointer when actually using it? I assumed that'd just basically be a no-op anyway, since pointer casts are just like a reinterpret cast? Can you elaborate? 😅
That makes way more sense, right (I was trying to remain somewhat close to the original shape but I can see it's not really a good idea here). I suppose with those 3 APIs it should be hopefully much simpler for ILC to handle things as we could ensure all static constructors would only have a linear sequence of direct assignments to vtable slots? Is this a path you'd be ok with pursuing? If so I can open an API proposal for that 🙂 @MichalStrehovsky would something like this also hopefully address your concerns about pattern matching being too brittle? |
Would this also remove the part where we need to copy vtable slots from a "base" vtable? (the If the cctor ends up looking like:
Vtbl.QueryInterface = ComWrappers.GetIUnknownQueryInterfaceImpl;
Vtbl.AddRef = ComWrappers.GetIUnknownAddRefImpl;
Vtbl.Release = ComWrappers.GetIUnknownReleaseImpl;
Vtbl.GetIids = &GetIids;
Vtbl.GetRuntimeClassName = &GetRuntimeClassName;
Vtbl.GetTrustLevel = &GetTrustLevel;
This should be relatively reliable. We'll want to impose some restrictions on the type of the |
Yes for
// In CsWinRT
public static unsafe class IInspectableImpl
{
// Signature matches the GetIids slot declared in IInspectableVftbl
public static delegate* unmanaged[MemberFunction]<void*, uint*, Guid**, int> GetIInspectableGetIIdsImpl()
=> &GetIIds;
}
// In some CCW vtable
Vtbl.QueryInterface = ComWrappers.GetIUnknownQueryInterfaceImpl;
Vtbl.AddRef = ComWrappers.GetIUnknownAddRefImpl;
Vtbl.Release = ComWrappers.GetIUnknownReleaseImpl;
Vtbl.GetIids = IInspectableImpl.GetIInspectableGetIIdsImpl();
Vtbl.GetRuntimeClassName = IInspectableImpl.GetIInspectableGetRuntimeClassNameImpl();
Vtbl.GetTrustLevel = IInspectableImpl.GetIInspectableGetTrustLevelImpl();
Vtbl.SomeOtherMethod = &SomeOtherMethod;
Would this work? |
Right, casts between function pointer, pointers and nint/nuint are no-ops in IL. It means that there is no difference between storing a function pointer in a field with proper type vs. storing it in a field of IntPtr type. |
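As a small sketch of the point above, the following round-trips a typed function pointer through an `nint` slot and back. The `CastDemo` class and its members are hypothetical names used only for this example; it needs to be compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

internal static unsafe class CastDemo
{
    [UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvMemberFunction) })]
    private static int GetTrustLevel(void* thisPtr, int* trustLevel)
    {
        *trustLevel = 0;
        return 0;
    }

    // Stores the function pointer as a plain nint, then recovers the typed pointer.
    public static bool RoundTrips()
    {
        delegate* unmanaged[MemberFunction]<void*, int*, int> typed = &GetTrustLevel;
        nint raw = (nint)typed; // what an IntPtr-typed vtable slot would hold
        var back = (delegate* unmanaged[MemberFunction]<void*, int*, int>)raw;
        return typed == back;   // same address either way
    }

    // Calls the method through the pointer recovered from the nint value.
    public static int CallThroughRaw()
    {
        nint raw = (nint)(delegate* unmanaged[MemberFunction]<void*, int*, int>)&GetTrustLevel;
        int level = 123;
        int hr = ((delegate* unmanaged[MemberFunction]<void*, int*, int>)raw)(null, &level);
        return hr + level;      // both are 0 after the call
    }
}

internal static class Program
{
    private static void Main()
    {
        Console.WriteLine(CastDemo.RoundTrips());
        Console.WriteLine(CastDemo.CallThroughRaw());
    }
}
```

The casts change only the static type seen by the C# compiler; the stored value is the same address in both representations.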
I hadn't considered this. This makes me think, what if we just always used
private static unsafe class SomeTypeImpl
{
[FixedAddressValueType]
private static readonly InlineArray8<nint> Vtbl;
static SomeTypeImpl()
{
Vtbl[0] = ComWrappers.GetIUnknownQueryInterfaceImpl;
Vtbl[1] = ComWrappers.GetIUnknownAddRefImpl;
Vtbl[2] = ComWrappers.GetIUnknownReleaseImpl;
Vtbl[3] = IInspectableImpl.GetIInspectableGetIIdsImpl();
Vtbl[4] = IInspectableImpl.GetIInspectableGetRuntimeClassNameImpl();
Vtbl[5] = IInspectableImpl.GetIInspectableGetTrustLevelImpl();
Vtbl[6] = (nint)(delegate* unmanaged[MemberFunction]<void*, int, int>)&Foo;
Vtbl[7] = (nint)(delegate* unmanaged[MemberFunction]<void*, int>)&Bar;
}
} |
It depends on what the method body of
This would be a complication for the interpreter again. The interpreter represents structs as byte arrays. Each byte of the struct is simply an element of the array. If the struct needs to store function pointers, we cannot use a byte array. We don't know the numerical value of the pointer (the numerical value is only known at runtime once ASLR places the code on a random location in the address space). The interpreter needs to represent this as a special magic struct to be able to do this. One cannot do things with this struct that can be done with other structs (one cannot use the nint number and do math with it or cast it to a long or whatever). The interpreter can use the special magic struct representation if the struct is unambiguously a vtable (for example like I wrote above - sequential layout, every field is a function pointer). But if you just do nints and want to be able to store numbers in an nint in one part of the program and pointer in another part of the program (we already preinitialize nints), you're asking for an interpreter rewrite. It cannot be done in the current model. We need to be able to tell ahead of time whether the struct should be modeled as an array of bytes or array of function pointers. |
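To make the distinction above concrete, here is a toy model of the two representations. Everything in it is hypothetical: these types do not exist in ILC, and the real interpreter's data structures may look quite different. The sketch only shows why a struct modeled as raw bytes can hold numbers but not code addresses, while a struct modeled as a table of symbolic method references can hold code addresses but supports no arithmetic on them.

```csharp
using System;

// Hypothetical stand-ins for how a compile-time interpreter might model struct values.
internal abstract class PreinitValue { }

// Ordinary struct: every byte is known at compile time, so numbers can be stored.
internal sealed class ByteArrayValue : PreinitValue
{
    public byte[] Bytes { get; }
    public ByteArrayValue(int size) => Bytes = new byte[size];

    public void WriteInt64(int offset, long value)
        => BitConverter.TryWriteBytes(Bytes.AsSpan(offset), value);
}

// Vtable-like struct: each slot names a method symbolically, because the
// numeric address is unknown until the image is loaded (ASLR).
internal sealed class FunctionPointerTableValue : PreinitValue
{
    public string[] Slots { get; }
    public FunctionPointerTableValue(int slotCount) => Slots = new string[slotCount];

    public void WriteSlot(int index, string methodSymbol) => Slots[index] = methodSymbol;
}

internal static class Program
{
    private static void Main()
    {
        var numbers = new ByteArrayValue(8);
        numbers.WriteInt64(0, 42); // a plain number: representable as bytes

        var vtable = new FunctionPointerTableValue(3);
        vtable.WriteSlot(0, "QueryInterface"); // a symbol, not a number
        vtable.WriteSlot(1, "AddRef");
        vtable.WriteSlot(2, "Release");

        Console.WriteLine(BitConverter.ToInt64(numbers.Bytes, 0));
        Console.WriteLine(string.Join(", ", vtable.Slots));
    }
}
```

In this model the representation has to be chosen before the first store, which mirrors the requirement that the struct be unambiguously a vtable.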
I sketched out a prototype as I was looking at this earlier today: MichalStrehovsky@adafd94 |
My idea is it'd just literally do |
Follow-up question on this. @MichalStrehovsky if ILC will have special handling for vtable structures, meaning the entire thing will just become a constant RVA-like blob, is it fair to assume the entire vtable type will also be trimmed? Meaning, should we actually be worried about these false dependencies here, or can we just have 100% accurate vtable signatures and then just rely on ILC folding all the vtables and trimming all that metadata anyway? Because if not, and if this is still a concern, I'm wondering if we shouldn't do something ugly like, say, always just use some dummy type for all vtable slots (say, Thoughts? |
These sorts of false dependencies should not be a problem for NAOT (assuming that the vtable types won't be enabled for reflection by accident). They are more of a problem for regular IL trimming since the IL trimmer is not able to make types disappear completely. |
Ah, I see, awesome then. In that case I suppose we can be as specific as needed, if it's not a problem for NAOT 🙂 |
Opened #114133 to track the API proposal for the |
Re:
Just to triple check: this is also not a concern for Native AOT and we can just generate separate vtables (one per native interface) if it's more convenient, right? As in, 100% of these types will just completely vanish when ILC converts the vtable to RVA blobs anyway? |
Yes, unless the compiler is made to believe these are targets of reflection or boxed. The types and fields might still be visible in debug info, but that's a good thing I assume. |
Perfect, thank you! 😄 I suppose the only remaining potential concern then will be to make sure that calls to our own Related question: would there be no way for us to validate the whole thing works other than just inspecting MSTAT files or disassembling the native binary? I probably know what you'll say to this, but: I remember there was an internal attribute one could add to a static constructor (or some member) to make ILC emit a warning/error in case it failed to fully interpret it and trim it. Would that be something we could use for our vtables, to ensure that they're always recognized and fully optimized, and to make sure we spot accidental regressions? I assume you wouldn't want to make that attribute public in the BCL anyway? 😅 |
We do not have attributes that require optimizations today. I do not think we want to introduce attribute like that. Optimizations are not part of our compatibility guarantees, and we want to have freedom to change how we do optimizations. Measuring performance characteristics of your build output is the best way to validate that you have not picked up accidental regression. |
@jkotas would you be in favor of updating
public static unsafe class SomeTypeImpl
{
[FixedAddressValueType]
private static readonly SomeTypeVtbl Vftbl;
public static nint AbiToProjectionVftablePtr => (nint)Unsafe.AsPointer(in Vftbl);
static SomeTypeImpl()
{
// Initialize vtable
}
}
@tannergooding mentioned he could help with the proposal, but wanted to double check with you first. Thank you! 🙂 |
Why do you think it would help with the proposal? At minimum, it is yet another call that the interpreter would have to handle as an intrinsic. Likely create a bunch of other complications depending how you would like to use exactly. |
Ah I see, it is for the |
Sorry, yeah, to clarify, I meant to say Tanner mentioned he would help with the
Sweet, thank you! Will open a separate issue then 🙂 |
Related to #79204
As we're migrating the Microsoft Store to Native AOT, and also adopting Native AOT in Windows components, we're investigating all opportunities to improve performance and reduce size, as well as regressions from things that were handled better on .NET Native. One area that seems to be quite impactful due to the way the WinRT projections and interop stack work, is the codegen around CCW vtables. Currently, ILC is not able to fold these into constant blobs, meaning that we pay the initialization cost + the static constructor (and the cctor checks on every access) for:
These add up to thousands of types, so there's a pretty good opportunity for improvements. .NET Native handled this via some special logic to initialize constant blobs for vtables, which Native AOT doesn't have. However, my thinking is that we could simply extend the support in ILC for interpreting static constructors by making a couple of required APIs intrinsics, and making sure it can recognize common patterns.
For APIs that would need to be handled as intrinsics by ILC:
ComWrappers.GetIUnknownImpl
RuntimeHelpers.AllocateTypeAssociatedMemory
Next to this, it should also be able to handle common patterns around:
[UnmanagedCallersOnly] method addresses to vtable slots
Ideally, ILC should be able to fold all of these vtables into a constant blob (like an RVA span field), and trim the cctor entirely.
Our expectation is that this should be doable because Native AOT can rely on the fact that:
[UnmanagedCallersOnly] method (or, any method in general) should be a constant
ComWrappers.GetIUnknownImpl should also be a constant that can be hardcoded by ILC
Common patterns
I've prepared a snippet with the common patterns we'd need handled:
IUnknown vtable