-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace Delegate MethodInfo cache with the MethodDesc #99200
base: main
Are you sure you want to change the base?
Conversation
I'm not sure what's up with the failures here, tests that are failing on the CI seem to pass on my machine. |
src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions. |
The existing code tries to compare MethodInfos as a cheap fast path. Most delegates do not have cached MethodInfo, so this fast path is hit rarely - but it is very cheap, so it is still worth it. This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize |
I think the thing that'd need benchmarking here are equality checks and maybe the impact of collectible delegates being stored in the CWT on the GC, the rest of things shouldn't be performance sensitive enough to matter I think? I'm not fully sure what'd be the proper way for benchmarking the latter.
I am going to benchmark the impact of the equality change tomorrow, I feel like if the impact won't be big, potential optimizations to it can be done later. |
Write a small program that loads an assembly as collectible and calls a method in it. The method in collectible assembly can create delegates in a loop. (If you would like to do it under benchmarknet, it works too - but it is probably more work.) |
I've benchmarked the APIs, had to massage the code a bit to make the JIT happy though:
GetHashCode:
Method:
|
I'm not sure if the System.Text.Json failures here are related, I saw it here before with Windows x86 Debug and now it's on OSX x64 Debug, couldn't find an issue for them either. |
For context, my only motivation with this is unblocking #85014 cause I already got non PGO delegate inlining mostly working locally, but it currently only handles static readonly delegates and requires that to work for lambdas and #108606 and #108579 to work better with delegates in general. (I also don't have the NativeAOT VM implementation for my new JIT-EE API finished yet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The PR title is not correct - this change is not making the delegates immutable anymore.
src/coreclr/System.Private.CoreLib/src/System/Delegate.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/vm/comdelegate.cpp
Outdated
@@ -1725,8 +1729,10 @@ extern "C" void QCALLTYPE Delegate_Construct(QCall::ObjectHandleOnStack _this, Q | |||
if (COMDelegate::NeedsWrapperDelegate(pMeth)) | |||
refThis = COMDelegate::CreateWrapperDelegate(refThis, pMeth); | |||
|
|||
refThis->SetMethodDesc(pMethOrig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave it out from this PR. It is hard to see whether these places use the right MethodDesc in all corner cases.
Also, it is very incomplete, so it won't enable the JIT to do anything interesting with the MethodDesc either.
@EgorBot -intel using System;
using System.Diagnostics;
using System.Reflection.Emit;
using BenchmarkDotNet.Attributes;
public class Bench
{
[Benchmark]
public void DummyDynamicMethod()
{
DynamicMethod dm = new DynamicMethod("Dummy", typeof(void), null);
ILGenerator il = dm.GetILGenerator();
il.Emit(OpCodes.Ret);
Action a = dm.CreateDelegate<Action>();
a();
}
} |
I am hesitant to merge this after looking into the perf impact some more. This change makes reflection and dynamic methods measurably more expensive that is going to show up. EgorBot job that I have just submitted is an example of a regression. I would like to have something to show for these regressions. We can look at this as part of a larger change that has improvements too. |
I assume the regression with DynamicMethods comes from As for the reflection part, I'd assume fetching |
Another thing to note here, as far as I can see, NativeAOT doesn't seem to cache the |
I have seen large part of the regressions coming from indirect effects of large CWT. It makes the GC to do more work, makes GC pause times worse.
Yes, native AOT reflection implementation has number of perf issues. |
@EgorBot -intel using System;
using System.Diagnostics;
using System.Reflection.Emit;
using BenchmarkDotNet.Attributes;
public class Bench
{
[Benchmark]
public void DummyDynamicMethod()
{
DynamicMethod dm = new DynamicMethod("Dummy", typeof(void), null);
ILGenerator il = dm.GetILGenerator();
il.Emit(OpCodes.Ret);
Action a = dm.CreateDelegate<Action>();
a();
}
} |
Seems the overhead is gone after moving the DynamicMethod to |
Since the DynamicMethod overhead is gone now, I think only the reflection overhead is left. I personally can't think of any way that lets us avoid that while still making delegates possible to place on FOH (other than putting the MethodInfos on FOH but that seems even harder to do), do you maybe have any ideas for solving that that don't require rewriting delegates as a whole? |
I understand why change like this is necessary for placing delegates on FOH, but I do not think it helps with inlineability. |
Ah sorry, I've meant that replacing the current field caching scheme used by Roslyn with #85014 would help with that. So this indirectly helps by unlocking it cause we need to put them on FOH for that. |
I do not think it is strictly required to put the delegates to FOH. For throughput, the difference is one indirection when loading the delegate instance that is not going to make a significant difference. In any case, the scheme from #85014 would need a path that does not put the delegates on FOH to handle collectible assemblies. |
Couldn't we alloc a separate FOH segment for each unloadable context? |
I do not see how it would work. Delegates for lambdas in collectible assemblies have to be tracked by the GC. They must keep the collectible assembly alive. |
Ah right, forgot about that. As far as I've seen, strings currently deal with collectible assemblies by allocating a pinned handle for every string. Would we want to do something similar or where would they be stored on the normal heap for them? |
Nit: It is a pinned heap handle. It is not a regular pinned handle.
Probably. |
@jkotas I was thinking, could we get rid of |
Pointers can be as small as 4k on the supported platforms. I agree that there may be ways to make delegate smaller on CoreCLR, but I do not think that this specific trick would work well. |
Ah right, unmanaged delegates use the Aux field already, that makes stuff problematic. |
First attempt at making delegate GC fields immutable in CoreCLR so that they can be allocated on the NonGC heap.
I've checked it with a simple app and corerun locally with a delegate from an unloadable ALC and it seemed to not crash, assert nor unload the ALC from under the delegate, however I couldn't actually find any runtime tests that would verify delegates from unloadable ALCs work so the CI coverage might be missing.
One small point of concern is that this might make delegate equality checks slower since they rely on checking the methods in the last "slow path" check, which is however always hit for different delegates AFAIR.
Contributes to #85014.
cc @jkotas