Replace Delegate MethodInfo cache with the MethodDesc #99200

Open · MichalPetryka wants to merge 30 commits into main

Conversation

@MichalPetryka (Contributor) commented Mar 3, 2024

First attempt at making delegate GC fields immutable in CoreCLR so that they can be allocated on the NonGC heap.

I've checked it locally with a simple app under corerun, using a delegate from an unloadable ALC, and it did not crash, assert, or unload the ALC from under the delegate. However, I couldn't find any runtime tests that verify delegates from unloadable ALCs work, so CI coverage might be missing.

One small concern is that this might make delegate equality checks slower, since they rely on comparing the methods in the final "slow path" check, which AFAIR is always hit when comparing distinct delegate instances.
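
For illustration, a minimal standalone example of the behavior in question (not code from this PR): two separately created delegates over the same method are never reference-equal, so Equals ends up comparing the underlying methods.

using System;

class DelegateEqualityDemo
{
    static void M() { }

    static void Main()
    {
        Action a = M;
        Action b = M;

        // Distinct delegate instances, so a reference check is not enough...
        Console.WriteLine(ReferenceEquals(a, b)); // False

        // ...and equality falls through to comparing targets and methods.
        Console.WriteLine(a == b);                // True
    }
}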

Contributes to #85014.

cc @jkotas

@MichalPetryka (Contributor, Author) commented Mar 4, 2024

I'm not sure what's up with the failures here; tests that fail on CI pass on my machine.
EDIT: I was testing with R2R disabled locally, since VS kept complaining about being unable to load PDBs for it.

src/coreclr/vm/object.cpp: review comment (outdated, resolved)
MichalPetryka and others added 3 commits March 4, 2024 16:40
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@MichalPetryka marked this pull request as ready for review March 4, 2024 23:28
@AndyAyersMS (Member) commented:

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas (Member) commented Mar 5, 2024

Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions.

@jkotas (Member) commented Mar 5, 2024

> One small concern is that this might make delegate equality checks slower, since they rely on comparing the methods in the final "slow path" check, which AFAIR is always hit when comparing distinct delegate instances.

The existing code tries to compare MethodInfos as a cheap fast path. Most delegates do not have a cached MethodInfo, so this fast path is hit rarely - but it is very cheap, so it is still worth it.

This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize Delegate_InternalEqualMethodHandles instead.
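
For readers following along, the comparison order being described looks roughly like the sketch below. This is not the actual CoreCLR source; the helper methods are hypothetical stand-ins for the internals mentioned here.

using System;
using System.Reflection;

// Simplified sketch of the method-comparison order, assuming the earlier parts
// of Equals (target/methodPtr checks) have already run.
static class DelegateMethodComparisonSketch
{
    public static bool MethodsEqual(Delegate x, Delegate y)
    {
        // Cheap fast path: only usable when both delegates already have a
        // materialized MethodInfo (e.g. because .Method was called on them).
        MethodInfo? mx = TryGetCachedMethodInfo(x);
        MethodInfo? my = TryGetCachedMethodInfo(y);
        if (mx is not null && my is not null)
            return mx.Equals(my);

        // Slow path: compare the underlying runtime method identities
        // (Delegate_InternalEqualMethodHandles in the VM).
        return EqualMethodHandles(x, y);
    }

    // Hypothetical stand-in for reading the delegate's cached MethodInfo field.
    private static MethodInfo? TryGetCachedMethodInfo(Delegate d) => null;

    // Hypothetical stand-in for the QCall into the VM.
    private static bool EqualMethodHandles(Delegate x, Delegate y)
        => x.Method == y.Method;
}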

@MichalPetryka (Contributor, Author) commented Mar 5, 2024

> Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions.

I think the things that need benchmarking here are the equality checks and maybe the GC impact of collectible delegates being stored in the CWT; the rest shouldn't be performance-sensitive enough to matter. I'm not sure what the proper way to benchmark the latter would be.

> This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize Delegate_InternalEqualMethodHandles instead.

I am going to benchmark the impact of the equality change tomorrow; if the impact isn't big, potential optimizations can be done later.

@jkotas (Member) commented Mar 5, 2024

> I'm not sure what the proper way to benchmark the latter would be.

Write a small program that loads an assembly as collectible and calls a method in it. The method in the collectible assembly can create delegates in a loop. (If you would like to do it under BenchmarkDotNet, that works too, but it is probably more work.)
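
A rough sketch of such a program, assuming a collectible payload assembly that exposes a static Payload.CreateDelegates(int) method (the path, type, and method names here are placeholders):

using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.Loader;

class CollectibleDelegateBench
{
    static void Main()
    {
        // Load the payload assembly into a collectible context.
        var alc = new AssemblyLoadContext("bench", isCollectible: true);
        Assembly asm = alc.LoadFromAssemblyPath(@"C:\path\to\CollectiblePayload.dll");

        MethodInfo createDelegates =
            asm.GetType("Payload")!.GetMethod("CreateDelegates")!;

        // Time delegate creation inside the collectible context.
        var sw = Stopwatch.StartNew();
        createDelegates.Invoke(null, new object[] { 1_000_000 });
        sw.Stop();
        Console.WriteLine($"Created delegates in {sw.ElapsedMilliseconds} ms");

        alc.Unload();
    }
}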

@MichalPetryka (Contributor, Author) commented Oct 18, 2024

> We should be talking numbers: micro-benchmark numbers for the affected methods, and also how often the affected methods are called in the real world.

I've benchmarked the APIs; I had to massage the code a bit to make the JIT happy, though.
Equals ("Cached" means that .Method was called beforehand):


BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
AMD Ryzen 9 7900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-JVFCER : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-XPKMUE : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI


| Method                   | Job        | Toolchain | Mean      | Error     | StdDev    | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|--------------------------|------------|-----------|-----------|-----------|-----------|-------|---------|-----------|-----------|-------------|
| StaticUncachedUncached   | Job-JVFCER | PR        | 4.316 ns  | 0.1073 ns | 0.2788 ns | 0.14  | 0.01    | 1,161 B   | -         | NA          |
| StaticUncachedUncached   | Job-XPKMUE | Main      | 31.299 ns | 0.3014 ns | 0.2672 ns | 1.00  | 0.01    | 1,230 B   | -         | NA          |
| StaticCachedCached       | Job-JVFCER | PR        | 3.979 ns  | 0.1002 ns | 0.2221 ns | 0.74  | 0.04    | 1,161 B   | -         | NA          |
| StaticCachedCached       | Job-XPKMUE | Main      | 5.379 ns  | 0.1277 ns | 0.1255 ns | 1.00  | 0.03    | 1,233 B   | -         | NA          |
| InstanceUncachedUncached | Job-JVFCER | PR        | 4.892 ns  | 0.1166 ns | 0.1296 ns | 0.14  | 0.00    | 1,151 B   | -         | NA          |
| InstanceUncachedUncached | Job-XPKMUE | Main      | 36.004 ns | 0.7288 ns | 0.7798 ns | 1.00  | 0.03    | 1,212 B   | -         | NA          |
| InstanceCachedCached     | Job-JVFCER | PR        | 5.057 ns  | 0.1318 ns | 0.3759 ns | 0.92  | 0.07    | 1,151 B   | -         | NA          |
| InstanceCachedCached     | Job-XPKMUE | Main      | 5.522 ns  | 0.1119 ns | 0.1569 ns | 1.00  | 0.04    | 1,225 B   | -         | NA          |
| StaticUncachedCached     | Job-JVFCER | PR        | 3.628 ns  | 0.0808 ns | 0.0716 ns | 0.12  | 0.00    | 1,161 B   | -         | NA          |
| StaticUncachedCached     | Job-XPKMUE | Main      | 31.057 ns | 0.3800 ns | 0.3554 ns | 1.00  | 0.02    | 1,230 B   | -         | NA          |
| StaticCachedUncached     | Job-JVFCER | PR        | 3.569 ns  | 0.0495 ns | 0.0463 ns | 0.11  | 0.00    | 1,161 B   | -         | NA          |
| StaticCachedUncached     | Job-XPKMUE | Main      | 31.897 ns | 0.1509 ns | 0.1338 ns | 1.00  | 0.01    | 1,254 B   | -         | NA          |
| InstanceUncachedCached   | Job-JVFCER | PR        | 4.276 ns  | 0.1043 ns | 0.1958 ns | 0.13  | 0.01    | 1,151 B   | -         | NA          |
| InstanceUncachedCached   | Job-XPKMUE | Main      | 32.738 ns | 0.3142 ns | 0.2624 ns | 1.00  | 0.01    | 1,212 B   | -         | NA          |
| InstanceCachedUncached   | Job-JVFCER | PR        | 4.654 ns  | 0.1103 ns | 0.2490 ns | 0.14  | 0.01    | 1,151 B   | -         | NA          |
| InstanceCachedUncached   | Job-XPKMUE | Main      | 34.013 ns | 0.6977 ns | 0.9315 ns | 1.00  | 0.04    | 1,237 B   | -         | NA          |

GetHashCode:


BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
AMD Ryzen 9 7900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-JVFCER : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-XPKMUE : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI


| Method   | Job        | Toolchain | Mean     | Error     | StdDev    | Ratio | Code Size | Allocated | Alloc Ratio |
|----------|------------|-----------|----------|-----------|-----------|-------|-----------|-----------|-------------|
| Static   | Job-JVFCER | PR        | 2.198 ns | 0.0043 ns | 0.0041 ns | 0.50  | 672 B     | -         | NA          |
| Static   | Job-XPKMUE | Main      | 4.372 ns | 0.0058 ns | 0.0051 ns | 1.00  | 860 B     | -         | NA          |
| Instance | Job-JVFCER | PR        | 3.534 ns | 0.0140 ns | 0.0131 ns | 0.59  | 808 B     | -         | NA          |
| Instance | Job-XPKMUE | Main      | 5.997 ns | 0.0078 ns | 0.0073 ns | 1.00  | 883 B     | -         | NA          |

Method:


BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5011/22H2/2022Update)
AMD Ryzen 9 7900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-rc.2.24474.11
  [Host]     : .NET 9.0.0 (9.0.24.47305), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-JVFCER : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-XPKMUE : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI


| Method   | Job        | Toolchain | Mean     | Error     | StdDev    | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|----------|------------|-----------|----------|-----------|-----------|-------|---------|-----------|-----------|-------------|
| Static   | Job-JVFCER | PR        | 9.305 ns | 0.0154 ns | 0.0144 ns | 5.55  | 0.02    | 353 B     | -         | NA          |
| Static   | Job-XPKMUE | Main      | 1.677 ns | 0.0078 ns | 0.0069 ns | 1.00  | 0.01    | 1,556 B   | -         | NA          |
| Instance | Job-JVFCER | PR        | 9.317 ns | 0.0189 ns | 0.0177 ns | 5.54  | 0.03    | 353 B     | -         | NA          |
| Instance | Job-XPKMUE | Main      | 1.682 ns | 0.0098 ns | 0.0091 ns | 1.00  | 0.01    | 1,556 B   | -         | NA          |
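
For context, a BenchmarkDotNet harness for these measurements might look roughly like the sketch below; the class and method names are made up, and this is not the code that produced the numbers above.

using System;
using System.Reflection;
using BenchmarkDotNet.Attributes;

public class DelegateBench
{
    private static void StaticTarget() { }
    private void InstanceTarget() { }

    private Action _staticA = StaticTarget;
    private Action _staticB = StaticTarget;
    private Action _instanceA = null!;
    private Action _instanceB = null!;

    [GlobalSetup]
    public void Setup()
    {
        _instanceA = InstanceTarget;
        _instanceB = InstanceTarget;
        // The "Cached" variants would additionally touch .Method here so the
        // MethodInfo is materialized before measurement.
        _ = _staticA.Method;
    }

    [Benchmark]
    public bool StaticEquals() => _staticA.Equals(_staticB);

    [Benchmark]
    public bool InstanceEquals() => _instanceA.Equals(_instanceB);

    [Benchmark]
    public int StaticGetHashCode() => _staticA.GetHashCode();

    [Benchmark]
    public MethodInfo StaticMethod() => _staticA.Method;
}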

@MichalPetryka (Contributor, Author) commented:

I'm not sure if the System.Text.Json failures here are related; I saw them before on Windows x86 Debug and now they're on OSX x64 Debug, and I couldn't find an existing issue for them either.

@MichalPetryka (Contributor, Author) commented:

For context, my only motivation here is unblocking #85014: I already have non-PGO delegate inlining mostly working locally, but it currently only handles static readonly delegates, and it needs this change to work for lambdas and for #108606 and #108579 to work better with delegates in general. (I also don't have the NativeAOT VM implementation for my new JIT-EE API finished yet.)

@jkotas (Member) left a comment:

Nit: The PR title is not correct - this change is not making the delegates immutable anymore.

@@ -1725,8 +1729,10 @@ extern "C" void QCALLTYPE Delegate_Construct(QCall::ObjectHandleOnStack _this, Q
if (COMDelegate::NeedsWrapperDelegate(pMeth))
refThis = COMDelegate::CreateWrapperDelegate(refThis, pMeth);

refThis->SetMethodDesc(pMethOrig);
Review comment (Member):

I would leave it out from this PR. It is hard to see whether these places use the right MethodDesc in all corner cases.

Also, it is very incomplete, so it won't enable the JIT to do anything interesting with the MethodDesc either.

@MichalPetryka changed the title from "Make delegates immutable" to "Replace Delegate MethodInfo cache with the MethodDesc" on Oct 22, 2024
@jkotas (Member) commented Oct 23, 2024

@EgorBot -intel

using System;
using System.Diagnostics;
using System.Reflection.Emit;
using BenchmarkDotNet.Attributes;

public class Bench
{
    [Benchmark]
    public void DummyDynamicMethod()
    {
        DynamicMethod dm = new DynamicMethod("Dummy", typeof(void), null);

        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ret);

        Action a = dm.CreateDelegate<Action>();
        a();
    }
}

@jkotas (Member) commented Oct 24, 2024

I am hesitant to merge this after looking into the perf impact some more. This change makes reflection and dynamic methods measurably more expensive, in ways that are going to show up; the EgorBot job that I have just submitted is an example of such a regression. I would like to have something to show for these regressions. We can look at this as part of a larger change that has improvements too.

@MichalPetryka (Contributor, Author) commented Oct 24, 2024

> I am hesitant to merge this after looking into the perf impact some more. This change makes reflection and dynamic methods measurably more expensive, in ways that are going to show up; the EgorBot job that I have just submitted is an example of such a regression. I would like to have something to show for these regressions. We can look at this as part of a larger change that has improvements too.

I assume the regression with DynamicMethods comes from StoreDynamicMethod now adding entries to the CWT, which adds GC pressure through the DependentHandles created for them. This could be a benchmark-only issue, since I'm not sure how often users create millions of DynamicMethods at once the way BDN does here.
One workaround I can think of would be overloading _invocationList even more and storing the DynamicMethod there, though that would make the IsLogicallyNull checks more expensive. I'm not sure whether not storing it in StoreDynamicMethod at all and lazily fetching it instead would work.
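
To illustrate the association pattern being discussed, here is a hypothetical sketch (the class and its members are made-up names, not the runtime's internals): keying DynamicMethods off delegate instances in a ConditionalWeakTable creates one dependent handle per live delegate, which the GC has to track.

using System;
using System.Reflection.Emit;
using System.Runtime.CompilerServices;

// Hypothetical illustration of the CWT association pattern: each Add creates
// a dependent handle that keeps the DynamicMethod alive as long as the
// delegate is alive, and that the GC must scan.
static class DynamicMethodTracker
{
    private static readonly ConditionalWeakTable<Delegate, DynamicMethod> s_table = new();

    public static void Track(Delegate d, DynamicMethod dm) => s_table.Add(d, dm);

    public static bool TryGet(Delegate d, out DynamicMethod? dm)
        => s_table.TryGetValue(d, out dm);
}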

As for the reflection part, I'd assume fetching .Method going from 2 ns to 10 ns wouldn't really matter in the big picture; pretty much any operation using the MethodInfo should dwarf that cost.

@MichalPetryka (Contributor, Author) commented:

Another thing to note: as far as I can see, NativeAOT doesn't cache the MethodInfo at all and fetches it on every call. In that case, I'd assume the reflection perf regression is even less of an issue, since it's already much worse there.

@jkotas (Member) commented Oct 24, 2024

> the regression with DynamicMethods comes from StoreDynamicMethod now adding entries to the CWT

I have seen a large part of the regressions coming from indirect effects of a large CWT. It makes the GC do more work and makes GC pause times worse.

> NativeAOT

Yes, the native AOT reflection implementation has a number of perf issues.

@MichalPetryka (Contributor, Author) commented:

@EgorBot -intel

using System;
using System.Diagnostics;
using System.Reflection.Emit;
using BenchmarkDotNet.Attributes;

public class Bench
{
    [Benchmark]
    public void DummyDynamicMethod()
    {
        DynamicMethod dm = new DynamicMethod("Dummy", typeof(void), null);

        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ret);

        Action a = dm.CreateDelegate<Action>();
        a();
    }
}

@MichalPetryka (Contributor, Author) commented Oct 24, 2024

The overhead seems to be gone after moving the DynamicMethod to _invocationList.

@MichalPetryka (Contributor, Author) commented:

> I am hesitant to merge this after looking into the perf impact some more. This change makes reflection and dynamic methods measurably more expensive, in ways that are going to show up; the EgorBot job that I have just submitted is an example of such a regression. I would like to have something to show for these regressions. We can look at this as part of a larger change that has improvements too.

Since the DynamicMethod overhead is gone now, I think only the reflection overhead is left. I personally can't think of any way to avoid it while still making it possible to place delegates on FOH (other than putting the MethodInfos on FOH, but that seems even harder to do). Do you have any ideas for solving that which don't require rewriting delegates as a whole?
If there are no alternatives that remove these regressions, I'd prefer to justify them by the benefits of placing delegates on FOH and making lambdas better inlineable, especially in AOT scenarios, rather than give up.

@jkotas (Member) commented Oct 28, 2024

> I'd prefer to justify them by the benefits of placing delegates on FOH and making lambdas better inlineable

I understand why a change like this is necessary for placing delegates on FOH, but I do not think it helps with inlineability.

@MichalPetryka (Contributor, Author) commented:

> > I'd prefer to justify them by the benefits of placing delegates on FOH and making lambdas better inlineable
>
> I understand why a change like this is necessary for placing delegates on FOH, but I do not think it helps with inlineability.

Ah sorry, I meant that replacing the current field-caching scheme used by Roslyn with #85014 would help with that. So this indirectly helps by unlocking it, because we need to put them on FOH for that.

@jkotas (Member) commented Oct 28, 2024

> So this indirectly helps by unlocking it, because we need to put them on FOH for that.

I do not think it is strictly required to put the delegates on FOH. For throughput, the difference is one indirection when loading the delegate instance, which is not going to make a significant difference. In any case, the scheme from #85014 would need a path that does not put the delegates on FOH to handle collectible assemblies.

@MichalPetryka (Contributor, Author) commented:

> In any case, the scheme from #85014 would need a path that does not put the delegates on FOH to handle collectible assemblies.

Couldn't we allocate a separate FOH segment for each unloadable context?

@jkotas (Member) commented Oct 28, 2024

I do not see how it would work. Delegates for lambdas in collectible assemblies have to be tracked by the GC. They must keep the collectible assembly alive.

@MichalPetryka (Contributor, Author) commented:

> I do not see how it would work. Delegates for lambdas in collectible assemblies have to be tracked by the GC. They must keep the collectible assembly alive.

Ah right, I forgot about that. As far as I've seen, strings currently deal with collectible assemblies by allocating a pinned handle for every string. Would we want to do something similar, or would the delegates just live on the normal heap in that case?

@jkotas (Member) commented Oct 28, 2024

> a pinned handle for every string.

Nit: It is a pinned heap handle. It is not a regular pinned handle.

> Would we want to do something similar

Probably.

@MichalPetryka (Contributor, Author) commented:

@jkotas I was thinking: could we get rid of _invocationCount with this and move its contents into _methodPtrAux and _methodDesc now? We could rely on the fact that we won't see valid pointers below ushort.MaxValue and use that to distinguish whether a field holds a pointer or some other value.
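
To spell out the value-range trick being proposed, an illustrative sketch (the helper is a made-up name, not runtime code):

using System;

static class TaggedFieldSketch
{
    // Illustrative only: interpret small values stored in a pointer-sized
    // field as counts and larger values as real pointers. As noted in the
    // reply below, this assumption does not hold in general, since valid
    // pointers can be as low as 4 KB on supported platforms.
    public static bool LooksLikeCount(nint fieldValue)
        => (nuint)fieldValue < ushort.MaxValue;
}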

@jkotas (Member) commented Oct 29, 2024

> We could rely on the fact that we won't see valid pointers below ushort.MaxValue and use that to distinguish whether a field holds a pointer or some other value.

Pointers can be as low as 4 KB on the supported platforms. I agree that there may be ways to make the delegate smaller on CoreCLR, but I do not think this specific trick would work well.

@MichalPetryka (Contributor, Author) commented:

> > We could rely on the fact that we won't see valid pointers below ushort.MaxValue and use that to distinguish whether a field holds a pointer or some other value.
>
> Pointers can be as low as 4 KB on the supported platforms. I agree that there may be ways to make the delegate smaller on CoreCLR, but I do not think this specific trick would work well.

Ah right, unmanaged delegates already use the Aux field; that makes this problematic.
