[API Proposal]: Introduce an intrinsic for more efficient lambda generation #85014

MichalPetryka · 2023-04-18T22:43:57Z

Background and motivation

Roslyn currently caches lambda delegates in static mutable fields in nested classes and initializes them lazily at the use site (guarded by a null check).
This pattern is inefficient for current runtimes, being hard to detect and optimize correctly, leading to dead code being left in generated assembly and making optimizations like delegate inlining more tricky.
Discussion here ended up with two possible solutions: creating a runtime intrinsic that'd create and cache the delegate for Roslyn or caching the delegates in static readonly fields which CoreCLR and Native AOT can analyze easily even today but that'd come with some metadata cost.

cc @jkotas @EgorBo @jaredpar @CyrusNajmabadi

API Proposal

From @jkotas in dotnet/csharplang#6746 (reply in thread):

public class RuntimeHelpers
{
    [Intrinsic]
    // ldftn instruction must immediately precede call of this method. The target method must
    // match the delegate signature and it must be an instance method on the TScope reference. TScope should
    // have no instance fields. Given function pointer can be only mapped to one delegate type using this method.
    static TDelegate GetLambdaSingleton<TScope, TDelegate>(IntPtr ftn) where TScope: class, new(), TDelegate: delegate
    {
        // This is mock implementation. This method is always going to be expanded as JIT intrinsic.
        lock (s_table)
        {
            MethodInfo mi = MapEntryPointToMethodInfo(typeof(TScope), ftn);

            if (s_table.TryGetValue(mi, out Delegate del))
                return (TDelegate)del;

            Delegate ret = typeof(TScope).IsCollectible ? 
                 InternalCreateDelegate(typeof(TDelegate), new TScope(), ftn) :
                 InternalCreateFrozenDelegate(typeof(TDelegate), new-frozen TScope(), ftn);
            s_table.Add(mi, ret);
            return (TDelegate)ret;
        }
    }

    static readonly ConditionalWeakTable<MethodInfo, Delegate> s_table;
}

API Usage

This would be used internally by Roslyn when emitting lambdas.

Alternative Designs

Have Roslyn store delegates in static readonly fields - would work with CoreCLR and Native AOT, not require any runtime changes, but it'd introduce either more metadata bloat or more eager delegate object allocation (we'd either need a separate nested class per lambda or calling one lambda would then create delegates for all the lambdas in the nested class).

Risks

All runtimes including Mono would need to have this implemented.

Tasks:

Refactor delegates to be immutable (see below)
Build a functional-enough prototype
Submit an API proposal for review, incorporating learnings from the prototype

The text was updated successfully, but these errors were encountered:

ghost · 2023-04-18T22:45:04Z

Tagging subscribers to this area: @dotnet/area-system-runtime-compilerservices
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Roslyn currently caches lambda delegates in static mutable fields in nested classes and initializes them lazily at the use site (guarded by a null check).
This pattern is inefficient for current runtimes, being hard to detect and optimize correctly, leading to dead code being left in generated assembly and making optimizations like delegate inlining more tricky.
Discussion here ended up with two possible solutions: creating a runtime intrinsic that'd create and cache the delegate for Roslyn or caching the delegates in static readonly fields which CoreCLR and Native AOT can analyze easily even today but that'd come with some metadata cost.

cc @jkotas @EgorBo @jaredpar @CyrusNajmabadi

API Proposal

From @jkotas in dotnet/csharplang#6746 (reply in thread):

public class RuntimeHelpers
{
    [Intrinsic]
    // ldftn instruction must immediately precede call of this method. The target method must
    // match the delegate signature and it must be an instance method on the TScope reference. TScope should
    // have no instance fields. Given function pointer can be only mapped to one delegate type using this method.
    static TDelegate GetLambdaSingleton<TScope, TDelegate>(IntPtr ftn) where TScope: class, TDelegate: delegate
    {
        // This is mock implementation. This method is always going to be expanded as JIT intrinsic.
        lock (s_table)
        {
            MethodInfo mi = MapEntryPointToMethodInfo(typeof(TScope), ftn);

            if (s_table.TryGetValue(mi, out Delegate del))
                return (TDelegate)del;

            Delegate ret = typeof(TScope).IsCollectible ? 
                 InternalCreateDelegate(typeof(TDelegate), new TScope(), ftn) :
                 InternalCreateFrozenDelegate(typeof(TDelegate), new-frozen TScope(), ftn);
            s_table.Add(mi, ret);
            return (TDelegate)ret;
        }
    }

    static readonly ConditionalWeakTable<MethodInfo, Delegate> s_table;
}

API Usage

This would be used internally by Roslyn when emitting lambdas.

Alternative Designs

Have Roslyn store delegates in static readonly fields - would work with CoreCLR and Native AOT, not require any runtime changes, but it'd introduce either more metadata bloat or more eager delegate object allocation (we'd either need a separate nested class per lambda or calling one lambda would then create delegates for all the lambdas in the nested class).

Risks

All runtimes including Mono would need to have this implemented.

Author:	MichalPetryka
Assignees:	-
Labels:	`api-suggestion`, `area-System.Runtime.CompilerServices`, `untriaged`
Milestone:	-

jaredpar · 2023-04-18T23:46:27Z

static TDelegate GetLambdaSingleton<TScope, TDelegate>(IntPtr ftn) where TScope: class,

Needs to include the new() constraint.

jaredpar · 2023-04-18T23:51:53Z

This will require a bit of refactoring on Roslyn's part. Today our emit strategy puts all lambdas for a given scope into the same closure type. That means it mixes capturing and non-capturing lambdas. In order to meet the constraints of this API we'll need to break up our closures such that non-capturing lambdas are never mixed with capturing ones.

I think this can be done by having a single closure per method for all of our non-capturing lambdas vs. having up to two per scope. Hopefully that is a bit cheaper given our implementation.

EgorBo · 2023-04-19T00:01:44Z

Have Roslyn store delegates in static readonly fields - would work with CoreCLR and Native AOT, not require any runtime changes

The plan is to allocate them on a frozen (nongc) segment which is not supported for static readonly in JIT today (but it can be done, just wanted to point out that there will be some runtime(jit) work any way).

MichalPetryka · 2023-04-19T00:02:02Z

Today our emit strategy puts all lambdas for a given scope into the same closure type.

It doesn't, it emits a type for all methods from the type.

jaredpar · 2023-04-19T16:54:15Z

It doesn't, it emits a type for all methods from the type.

My bad. It does look like we implemented that optimization already. I thought we had a number of cases where we still combined them with the capturing lambdas but does not appear so.

MichalPetryka · 2023-04-25T22:29:24Z

This should also be used for creation of delegates from method groups.

MichalPetryka · 2023-04-26T00:15:15Z

@jkotas @EgorBo While working on #85349, I've accidentally made delegate creation be regarded as slightly cheaper for the inliner, which triggers huge diffs for all the lambdas with the mostly dead initialization branch. This means that changing this will also trigger huge diffs due to removal of that branch.

tannergooding · 2023-05-26T16:43:54Z

I'll leave this open for the time being, but until I get the green light from Jan and Jared, this will remain an api-suggestion.

jkotas · 2023-05-27T01:20:22Z

In order to be able to allocate delegates as frozen heap, they must be immutable, or more precisely they cannot point to objects that are not allocated on frozen heap.

The delegates are not immutable in CoreCLR today. They contain cached MethodInfo that is initialized lazily:

runtime/src/coreclr/System.Private.CoreLib/src/System/Delegate.CoreCLR.cs

Lines 159 to 207 in b2edecd

    
           if ((_methodBase == null) || !(_methodBase is MethodInfo)) 
        
           { 
        
               IRuntimeMethodInfo method = FindMethodHandle(); 
        
               RuntimeType? declaringType = RuntimeMethodHandle.GetDeclaringType(method); 
        
               // need a proper declaring type instance method on a generic type 
        
               if (declaringType.IsGenericType) 
        
               { 
        
                   bool isStatic = (RuntimeMethodHandle.GetAttributes(method) & MethodAttributes.Static) != (MethodAttributes)0; 
        
                   if (!isStatic) 
        
                   { 
        
                       if (_methodPtrAux == IntPtr.Zero) 
        
                       { 
        
                           // The target may be of a derived type that doesn't have visibility onto the 
        
                           // target method. We don't want to call RuntimeType.GetMethodBase below with that 
        
                           // or reflection can end up generating a MethodInfo where the ReflectedType cannot 
        
                           // see the MethodInfo itself and that breaks an important invariant. But the 
        
                           // target type could include important generic type information we need in order 
        
                           // to work out what the exact instantiation of the method's declaring type is. So 
        
                           // we'll walk up the inheritance chain (which will yield exactly instantiated 
        
                           // types at each step) until we find the declaring type. Since the declaring type 
        
                           // we get from the method is probably shared and those in the hierarchy we're 
        
                           // walking won't be we compare using the generic type definition forms instead. 
        
                           Type? currentType = _target!.GetType(); 
        
                           Type targetType = declaringType.GetGenericTypeDefinition(); 
        
                           while (currentType != null) 
        
                           { 
        
                               if (currentType.IsGenericType && 
        
                                   currentType.GetGenericTypeDefinition() == targetType) 
        
                               { 
        
                                   declaringType = currentType as RuntimeType; 
        
                                   break; 
        
                               } 
        
                               currentType = currentType.BaseType; 
        
                           } 
        
                           // RCWs don't need to be "strongly-typed" in which case we don't find a base type 
        
                           // that matches the declaring type of the method. This is fine because interop needs 
        
                           // to work with exact methods anyway so declaringType is never shared at this point. 
        
                           Debug.Assert(currentType != null || _target.GetType().IsCOMObject, "The class hierarchy should declare the method"); 
        
                       } 
        
                       else 
        
                       { 
        
                           // it's an open one, need to fetch the first arg of the instantiation 
        
                           MethodInfo invoke = this.GetType().GetMethod("Invoke")!; 
        
                           declaringType = (RuntimeType)invoke.GetParameters()[0].ParameterType; 
        
                       } 
        
                   } 
        
               } 
        
               _methodBase = (MethodInfo)RuntimeType.GetMethodBase(declaringType, method)!;

. So the first step is to figure out what to do about this cache. Is it possible to get rid of it and make delegate objects smaller? It should be fine to make Delegate.Method property slower if it makes Delegate objects smaller.

MichalPetryka · 2023-05-27T01:28:55Z

Could the cache maybe use RuntimeMethodHandle.Value instead? Then the field will just be an IntPtr.

jkotas · 2023-05-27T02:24:33Z

Could the cache maybe use RuntimeMethodHandle.Value instead? Then the field will just be an IntPtr.

If we are not going to store MethodInfo directly in the delegate, there needs to be a lookup or mapping operation involved. If we are going to pay for a mapping or lookup operation, we can as well get rid of the field and use the existing fields or the delegate instance as the key for the lookup (notice that the existing implementation is able to compute the MethodInfo from existing fields). Some options:

Cache it using the method pointer as the key. Skip caching for collectible assemblies.
Cache it in ConditionalWeakTable using delegate instance as the key.
Do not cache it at all. Declare Delegate.Method to be a slow operation.

None that the field is overloaded for other purposes, but those other purposes do not look that hard to refactor away.

MichalPetryka · 2023-05-29T13:17:59Z

Cache it in ConditionalWeakTable using delegate instance as the key.

Would the object that keeps the assembly alive for collectible assemblies also be moved there to keep them alive?

EgorBo · 2023-05-29T13:37:12Z

Cache it in ConditionalWeakTable using delegate instance as the key.

Would the object that keeps the assembly alive for collectible assemblies also be moved there to keep them alive?

delegates from collectible assemblies won't be allocated on frozen segments anyway

jkotas · 2023-05-29T13:49:23Z

Would the object that keeps the assembly alive for collectible assemblies also be moved there to keep them alive?

It should be possible to store the object reference that keeps collectible assemblies alive in _invocationList field.

MichalPetryka added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 18, 2023

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 18, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Apr 18, 2023

tannergooding added area-System.Runtime.CompilerServices and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Apr 18, 2023

EgorBo mentioned this issue Apr 20, 2023

Fold IND(frozenObj, CNS) #85127

Merged

EgorBo mentioned this issue May 5, 2023

JIT: Inline static readonly delegates #85803

Closed

tannergooding removed the untriaged New issue has not been triaged by the area owner label May 26, 2023

jkotas added the help wanted [up-for-grabs] Good issue for external contributors label May 27, 2023

This was referenced Feb 28, 2024

[API Proposal]: ArgumentException.ThrowIf method #99035

Closed

Replace Delegate MethodInfo cache with the MethodDesc #99200

Open

stephentoub added this to the Future milestone Jul 19, 2024

MichalPetryka mentioned this issue Nov 10, 2024

Non-PGO calli/delegate call transformation #109679

Closed

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[API Proposal]: Introduce an intrinsic for more efficient lambda generation #85014

[API Proposal]: Introduce an intrinsic for more efficient lambda generation #85014

MichalPetryka commented Apr 18, 2023 •

edited by jkotas

Loading

ghost commented Apr 18, 2023

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

jaredpar commented Apr 18, 2023

jaredpar commented Apr 18, 2023 •

edited

Loading

EgorBo commented Apr 19, 2023 •

edited

Loading

MichalPetryka commented Apr 19, 2023

jaredpar commented Apr 19, 2023

MichalPetryka commented Apr 25, 2023

MichalPetryka commented Apr 26, 2023

tannergooding commented May 26, 2023

jkotas commented May 27, 2023

MichalPetryka commented May 27, 2023 •

edited

Loading

jkotas commented May 27, 2023

MichalPetryka commented May 29, 2023

EgorBo commented May 29, 2023

jkotas commented May 29, 2023

[API Proposal]: Introduce an intrinsic for more efficient lambda generation #85014

[API Proposal]: Introduce an intrinsic for more efficient lambda generation #85014

Comments

MichalPetryka commented Apr 18, 2023 • edited by jkotas Loading

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

Tasks:

ghost commented Apr 18, 2023

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

jaredpar commented Apr 18, 2023

jaredpar commented Apr 18, 2023 • edited Loading

EgorBo commented Apr 19, 2023 • edited Loading

MichalPetryka commented Apr 19, 2023

jaredpar commented Apr 19, 2023

MichalPetryka commented Apr 25, 2023

MichalPetryka commented Apr 26, 2023

tannergooding commented May 26, 2023

jkotas commented May 27, 2023

MichalPetryka commented May 27, 2023 • edited Loading

jkotas commented May 27, 2023

MichalPetryka commented May 29, 2023

EgorBo commented May 29, 2023

jkotas commented May 29, 2023

MichalPetryka commented Apr 18, 2023 •

edited by jkotas

Loading

jaredpar commented Apr 18, 2023 •

edited

Loading

EgorBo commented Apr 19, 2023 •

edited

Loading

MichalPetryka commented May 27, 2023 •

edited

Loading