EventPipe caches EventSource provider callback which can cause an exception on EventPipe shutdown #36028
Tagging subscribers to this area: @tommcdon
Fixing this will likely involve either rooting the EventSource in question until after shutdown occurs, or cleaning up the provider during DeleteProvider instead of marking it as deferred. A fix still needs to be investigated, but we should fix this for 5.0.
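To make the two options above concrete, here is a minimal toy sketch in C++. It is not the runtime's code: `Delegate`, `DeferredProvider`, `FixStrategy`, and `RunScenario` are hypothetical names invented for illustration, and a `std::weak_ptr` stands in for a cached callback whose managed owner may be collected. It only compares what happens at shutdown under each strategy.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the managed callback delegate.
struct Delegate { int calls = 0; };

// Hypothetical strategies: no fix, root the delegate until shutdown,
// or clear the cached callback when the provider is deleted.
enum class FixStrategy { None, RootUntilShutdown, ClearOnDelete };

// Hypothetical stand-in for a provider sitting in the deferred-delete list.
struct DeferredProvider {
    std::weak_ptr<Delegate> cachedCallback;  // does not keep the delegate alive
    std::shared_ptr<Delegate> root;          // only set by RootUntilShutdown
};

struct ShutdownResult {
    int invoked;  // callbacks that were safely invoked at shutdown
    int stale;    // attempts to invoke a callback that was already collected
};

ShutdownResult RunScenario(FixStrategy strategy) {
    auto delegate = std::make_shared<Delegate>();
    DeferredProvider provider;
    provider.cachedCallback = delegate;

    // DeleteProvider is called while a session is active, so deletion is deferred.
    bool callbackCleared = false;
    if (strategy == FixStrategy::RootUntilShutdown) {
        provider.root = delegate;            // keep the delegate alive
    } else if (strategy == FixStrategy::ClearOnDelete) {
        provider.cachedCallback.reset();     // forget the callback right away
        callbackCleared = true;
    }

    // The owning EventSource is disposed/finalized and drops its reference.
    delegate.reset();

    // Shutdown: the deferred provider is finally deleted.
    ShutdownResult result{0, 0};
    if (!callbackCleared) {
        if (auto cb = provider.cachedCallback.lock()) {
            ++cb->calls;
            ++result.invoked;
        } else {
            ++result.stale;  // in the real runtime this would be the failing call
        }
    }
    provider.root.reset();   // release the root once shutdown is done
    return result;
}
```

In this toy model, `RunScenario(FixStrategy::None)` reports one stale callback, while both candidate strategies report none. Which approach suits the actual runtime is a separate question that this sketch does not answer.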
Here is a repro program, if you run with
I'm going to look at this as the next bug on my plate.
@davmason is this still an issue we are targeting for 5.0?
Yes, this should still go in for 5.0. It will be a relatively low-risk fix and have a lot of value. We regularly run into it in our test runs.
tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror.sh fails under gcstress (as seen in #35884)
Copying the analysis from this comment: #35884 (comment)
An EventSource will call Dispose on its underlying providers when it is finalized or when Dispose is called on it. In turn, the EventPipeProvider will call EventUnregister, which calls EventPipeInternal::DeleteProvider.
Up until this point everything is OK, but inside EventPipe::DeleteProvider there is a check for Enabled(), which checks whether there are any active sessions and, if so, puts the provider in a deferred delete state.
runtime/src/coreclr/src/vm/eventpipe.cpp
Lines 616 to 627 in c2b1018
This means that if any other provider is active when the EventSource tries to delete itself from EventPipe, it will be placed in the deferred list. For example, in this test Microsoft-Windows-DotNETRuntime is also active. This is fundamentally flawed for interacting with managed providers: as soon as they are disposed or finalized, the EventProvider's m_etwCallback will be collected, and later, when EventPipe goes to delete the deferred providers, we will attempt to reverse P/Invoke to a collected delegate.
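The sequence described above can be modeled with a small toy in C++. This is only a sketch under stated assumptions, not the code in eventpipe.cpp: `ToyEventPipe`, `ToyProvider`, `ManagedCallback`, and `SimulateDisposeWhileSessionActive` are hypothetical names, and a `std::weak_ptr` stands in for the cached callback so the model can detect a collected delegate safely instead of crashing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the managed delegate behind m_etwCallback.
struct ManagedCallback {
    int invocations = 0;
    void Invoke() { ++invocations; }
};

// Hypothetical stand-in for a native provider that caches the callback.
struct ToyProvider {
    std::string name;
    std::weak_ptr<ManagedCallback> callback;  // does not keep the delegate alive
    bool deleteDeferred = false;
};

class ToyEventPipe {
public:
    ToyProvider* CreateProvider(const std::string& name,
                                const std::shared_ptr<ManagedCallback>& cb) {
        auto p = std::make_unique<ToyProvider>();
        p->name = name;
        p->callback = cb;
        providers_.push_back(std::move(p));
        return providers_.back().get();
    }

    void StartSession() { ++activeSessions_; }
    bool Enabled() const { return activeSessions_ > 0; }

    // Returns true if the provider was deleted now, false if deletion was deferred.
    bool DeleteProvider(ToyProvider* provider) {
        if (Enabled()) {
            provider->deleteDeferred = true;  // the problematic path
            return false;
        }
        providers_.erase(
            std::remove_if(providers_.begin(), providers_.end(),
                [provider](const std::unique_ptr<ToyProvider>& p) {
                    return p.get() == provider;
                }),
            providers_.end());
        return true;
    }

    // Deletes deferred providers; returns how many had a collected callback.
    int Shutdown() {
        activeSessions_ = 0;
        int stale = 0;
        for (auto& p : providers_) {
            if (!p->deleteDeferred) continue;
            if (auto cb = p->callback.lock()) cb->Invoke();
            else ++stale;  // the real runtime would call into a collected delegate here
        }
        providers_.erase(
            std::remove_if(providers_.begin(), providers_.end(),
                [](const std::unique_ptr<ToyProvider>& p) { return p->deleteDeferred; }),
            providers_.end());
        return stale;
    }

    std::size_t ProviderCount() const { return providers_.size(); }

private:
    std::vector<std::unique_ptr<ToyProvider>> providers_;
    int activeSessions_ = 0;
};

// Walks through the failing order: dispose while a session is active, then shut down.
int SimulateDisposeWhileSessionActive() {
    ToyEventPipe pipe;
    auto runtimeCb = std::make_shared<ManagedCallback>();
    auto sourceCb = std::make_shared<ManagedCallback>();
    pipe.CreateProvider("Microsoft-Windows-DotNETRuntime", runtimeCb);
    ToyProvider* mine = pipe.CreateProvider("MyEventSource", sourceCb);

    pipe.StartSession();         // another provider keeps the session active
    pipe.DeleteProvider(mine);   // deferred, because Enabled() is true
    sourceCb.reset();            // models the GC collecting the delegate
    return pipe.Shutdown();      // finds one stale callback
}
```

Calling `SimulateDisposeWhileSessionActive()` returns 1 in this model, corresponding to the single deferred provider whose callback was collected before shutdown. If the delegate happened to outlive shutdown, the count would be 0, which matches the observation that the failure depends on GC timing.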
I suspect this has gone undetected for so long because we don't have a ton of testing around EventSources and how they interact with EventPipe, and likely the tests that do test the scenario just happen to be lucky and the delegate doesn't get collected in the timeframe. Even this test passes fine under normal circumstances and it takes GCStress to fail.