Possible deadlock on process exit if GC is in progress #107800
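A simple repro:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    internal class Program
    {
        static void Main(string[] args)
        {
            // Keep triggering GCs on a background thread.
            Task.Run(() =>
            {
                for (; ; )
                {
                    GC.Collect();
                }
            });
            Thread.Sleep(10);
            // Exit the process while a GC is likely to be in progress.
            Environment.Exit(0);
        }
    }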
There are several ways to fix this. |
I tried running your repro on .NET 9 RC1 for 10 minutes in a loop and it did not hang. What's the stacktrace of the hang? |
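Since the hang is intermittent, running the repro in a loop with a timeout makes a stuck exit easier to catch. The following is only a sketch of such a harness, not something used in this thread: `ExitHangWatchdog`, `FindFirstHang`, and the `Repro.dll` name in the usage note are hypothetical, and the timeout is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original report: launches a command
// repeatedly and reports the first iteration that fails to exit in time.
public static class ExitHangWatchdog
{
    // Returns the 1-based iteration that timed out, or 0 if every run exited.
    public static int FindFirstHang(string fileName, string arguments, int iterations, TimeSpan timeout)
    {
        for (int i = 1; i <= iterations; i++)
        {
            var startInfo = new ProcessStartInfo(fileName, arguments)
            {
                UseShellExecute = false,
            };

            using Process process = Process.Start(startInfo)
                ?? throw new InvalidOperationException($"Could not start '{fileName}'.");

            if (!process.WaitForExit((int)timeout.TotalMilliseconds))
            {
                // Deliberately leave the process alive so that a debugger
                // can be attached or a dump captured to get the stack.
                Console.Error.WriteLine(
                    $"Iteration {i}: PID {process.Id} did not exit within {timeout}.");
                return i;
            }
        }

        return 0;
    }
}
```

A call such as `ExitHangWatchdog.FindFirstHang("dotnet", "Repro.dll", 1000, TimeSpan.FromSeconds(30))` would run the repro up to 1000 times, assuming it was built as `Repro.dll`. A non-zero result identifies the iteration whose process is still alive and can be inspected.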
The stack is
|
Right, after updating to RC1 I no longer see this reproducing. It looks like it has been fixed as a side effect of #103877 as it switched This is the case where |
So, I guess this is "fixed" for 9.0+, but what should we do with 8.0? My fix was adding a condition for |
I have tried a number of variants of this when implementing #103877. I do not think it would work well. It would replace an intermittent shutdown hang with an intermittent shutdown crash. The PR description tried to explain it: "The existing g_fProcessDetach flag is set too late - using this flag to skip cooperative mode switch would lead to shutdown deadlocks, and the existing g_fEEShutDown flag is set too early - using this flag to skip cooperative mode switch would lead to shutdown crashes." The RtlDllShutdownInProgress flag is set at the right time.
Does the customer workload experience the shutdown hangs in just a few specific spots? I think that a targeted servicing-grade fix would fix just those few spots. |
Cooperative shutdown seems to be quite a fragile, bug-prone scenario. In particular, getting the timing wrong could indeed mean either a deadlock or a crash (too late/early). I was thinking - "why can't we just backport the whole thing?", but my concerns are:
There is a possibility that after plugging one hole we will see another one reported, but at least we could minimize chances of introducing new or worse issues. |
It appears for that report we always see the lock up in the same place - when (Perhaps |
This is fixed in 9.0+. We just need to track the fix for 8.0, so I changed the milestone to 8.0. |
Fixed in #107844 |
Bug reported by internal partners. In some relatively infrequent cases a worker process may get stuck while exiting.
Such "stuck" processes could become a nuisance, especially when the memory footprint is large.
The reason for the lock-up is that an exiting thread tries to get into cooperative (COOP) mode to fix up its stack.
This should only be done when a thread exits cleanly (i.e. the thread itself exits). When the exit is a result of process termination (i.e. the thread calls ExitProcess), getting into COOP mode can wait for the GC state to clear, which may never happen, since the GC may no longer exist at that point.
The bug seems to be old - I tried 9.0, 8.0, 6.0 and saw it reproducing with a small directed repro.
The bug does not affect NativeAOT, only CoreCLR.
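The lock-up described above can be illustrated with a toy model. This is a sketch under stated assumptions and not the runtime's code: `ToyRuntime`, `ThreadExitOutcome`, and every member name are hypothetical, a single boolean stands in for "the process is terminating", and a timeout replaces what would otherwise be an unbounded wait so that the demo stays finite.

```csharp
using System;
using System.Threading;

// Hypothetical outcome of the modeled thread-exit cleanup.
public enum ThreadExitOutcome { CleanedUp, Skipped, WouldHang }

// Toy model only: these members do not mirror CoreCLR's data structures.
public sealed class ToyRuntime
{
    // Non-signaled while a simulated GC is in progress, signaled otherwise.
    private readonly ManualResetEventSlim _gcFinished = new ManualResetEventSlim(true);

    // Stand-in for "the process is being terminated".
    private volatile bool _processExiting;

    public void BeginGc() => _gcFinished.Reset();
    public void EndGc() => _gcFinished.Set();

    // Models process termination: an in-progress GC is abandoned and
    // EndGc is never called afterwards.
    public void BeginProcessExit() => _processExiting = true;

    public ThreadExitOutcome OnThreadExit(bool skipWhenProcessExiting, TimeSpan giveUpAfter)
    {
        if (skipWhenProcessExiting && _processExiting)
        {
            // The process is going away, so the stack fix-up is not attempted.
            return ThreadExitOutcome.Skipped;
        }

        // Entering "cooperative mode" waits for any in-progress GC to finish.
        // The timeout exists only to keep this model from blocking forever.
        return _gcFinished.Wait(giveUpAfter)
            ? ThreadExitOutcome.CleanedUp
            : ThreadExitOutcome.WouldHang;
    }
}
```

With `BeginGc()` followed by `BeginProcessExit()`, calling `OnThreadExit(false, ...)` yields `WouldHang`, while `OnThreadExit(true, ...)` yields `Skipped`. The model does not capture the harder part discussed in the thread, namely finding a flag that is set neither too early nor too late.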