Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround a CLR deadlock caused by unloading app domains with STA RCWs #34498

Conversation

jasonmalinowski
Copy link
Member

If you have an STA thread that created Runtime Callable Wrappers, it's possible to end up in a deadlock where the finalizer and app domain unload code are waiting for each other. The suggested workaround from the CLR team is to wait for finalizers before letting the domain shutdown but while the thread is still running.

Fixes #34248

@jasonmalinowski jasonmalinowski force-pushed the work-around-appdomain-rcw-deadlock branch from 613d56f to 0d84adc Compare March 27, 2019 02:30
@jasonmalinowski jasonmalinowski self-assigned this Mar 27, 2019
@jasonmalinowski jasonmalinowski force-pushed the work-around-appdomain-rcw-deadlock branch from 0d84adc to 0330bf6 Compare March 27, 2019 02:36
@sharwell
Copy link
Member

At this rate I really hate to imagine what the third hack in this block will look like

@jasonmalinowski jasonmalinowski marked this pull request as ready for review March 27, 2019 14:30
@jasonmalinowski jasonmalinowski requested a review from a team as a code owner March 27, 2019 14:30
//
// The suggested workaround from the CLR team is to do an explicit GC.Collect and
// WaitForPendingFinalizers before we let the AppDomain shut down. The belief is subscribing
// to the Unload is a reasonable place to do it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Are we sure the RCWs will get cleaned up before the STA thread is terminated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the STA thread tries to clean up it's own RCWs on shutdown of the thread -- in the dumps I saw that lead to the internal conversation it was that process that was waiting for finalization (and finalization on the thread).

There's no guarantee from the CLR team that this will guarantee RCW cleanup, but it seemed that there's really no way to make the guarantee short of the CLR making some pretty deep changes to how this all works internally, for example by changing how AppDomain Unload works internally to correctly order operations.

// to the Unload is a reasonable place to do it.
if (Interlocked.CompareExchange(ref s_SubscribedToAppDomainUnload, 1, 0) == 0)
{
AppDomain.CurrentDomain.DomainUnload += (sender, e) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this event happen before or after Thread.Abort is called on our STA thread?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the mail thread with the CLR team, before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ... if that's the case I wonder if we should call Dispatcher.Shutdown here as well. That would seemingly solve the unload event race that's displayed below. That's a separate issue though.


In reply to: 269633055 [](ancestors = 269633055)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work. I guess there's also good arguments here that we should just be making this thread a test fixture so it's properly shut down when we're done using it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into that a bit. The setup works and we have a definite time in which to do shutdown operations. At the same time though:

  1. It still requires passing static data. There is no way for the test fixture and test runner to communicate via the xunit APIs. That means mutable static is the only way.
  2. It requires a lot of source annotations. Including duplicating definitions between all our test assemblies. That's annoying but not a deal breaker.

That's why I haven't moved forward on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right forgot about 1. I think the "usual" asumption is instead we'd switch our tests to regular Facts and just have all our Wpf ones do something like _staThread.RunOnThread() and then pass the lambda of the actual test contents. Or do an await SwitchToStaThread() pattern not unlike JTF.

{
AppDomain.CurrentDomain.DomainUnload += (sender, e) =>
{
GC.Collect();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time I see the Collect + WaitForPendingFinalizers pattern it's done in a loop. Assume there is a reason you didn't need to do that here. Curious about what it would be.

Copy link
Member Author

@jasonmalinowski jasonmalinowski Mar 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how much to loop. So, ¯\_(ツ)_/¯?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic number I'm thinking of here is three. I have no idea why I I think that though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tradition dictates three, but that's cargo cult programming if I've ever seen it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't argue with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@jaredpar jaredpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@jasonmalinowski jasonmalinowski force-pushed the work-around-appdomain-rcw-deadlock branch 2 times, most recently from 58f0b4e to 3a190d7 Compare March 27, 2019 16:20
If you have an STA thread that created Runtime Callable Wrappers, it's
possible to end up in a deadlock where the finalizer and app domain
unload code are waiting for each other. The suggested workaround from
the CLR team is to wait for finalizers before letting the domain
shutdown but while the thread is still running.

Fixes dotnet#34248
@jasonmalinowski jasonmalinowski force-pushed the work-around-appdomain-rcw-deadlock branch from 3a190d7 to bea2405 Compare March 27, 2019 19:44
AppDomain.CurrentDomain.DomainUnload += (sender, e) =>
{
GC.Collect();
GC.WaitForPendingFinalizers();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've sent a PR that uses a static consturctor to hook AppDomain.Unload so we can call GC.Collect and I can't argue with it because it's probably the right thing to do.

@jasonmalinowski jasonmalinowski merged commit 30a9531 into dotnet:master Mar 28, 2019
@jasonmalinowski jasonmalinowski deleted the work-around-appdomain-rcw-deadlock branch March 28, 2019 19:08
@sharwell sharwell added this to the 16.1.P2 milestone Mar 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants