Activity shouldn't capture AsyncLocals into its Timer #26071
NETFX CAS :(
src/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/Activity.DateTime.netfx.cs
private static Timer InitalizeSyncTimer()
{
    // Don't capture the current ExecutionContext and its AsyncLocals onto the timer causing them to live forever
    ExecutionContext.SuppressFlow();
This is a before-field-init static constructor that can be called pretty much any time. I am wondering whether this needs to be hardened against the case where flow is already suppressed. Something like:
bool restoreFlow = false;
try
{
    if (!ExecutionContext.IsFlowSuppressed())
    {
        ExecutionContext.SuppressFlow();
        restoreFlow = true;
    }
    ...
}
finally
{
    if (restoreFlow)
        ExecutionContext.RestoreFlow();
}
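For illustration only, the pattern above could be filled in roughly as follows. This is a minimal sketch, not the code from the PR: the class name `NonCapturingTimer` and the method `Create` are hypothetical, and the sketch assumes the `Timer` constructor captures no `ExecutionContext` while flow is suppressed.

```csharp
using System;
using System.Threading;

// Hypothetical helper; neither the class nor the method name comes from the PR or the framework.
static class NonCapturingTimer
{
    public static Timer Create(TimerCallback callback, object state, int dueTime, int period)
    {
        bool restoreFlow = false;
        try
        {
            // Only suppress (and later restore) if the caller has not already suppressed flow.
            if (!ExecutionContext.IsFlowSuppressed())
            {
                ExecutionContext.SuppressFlow();
                restoreFlow = true;
            }

            // Created while flow is suppressed, so the timer does not capture
            // the caller's ExecutionContext or its AsyncLocal values.
            return new Timer(callback, state, dueTime, period);
        }
        finally
        {
            // Leave the calling thread's flow state exactly as it was found.
            if (restoreFlow)
                ExecutionContext.RestoreFlow();
        }
    }
}
```

A caller would write something like `NonCapturingTimer.Create(s => Sync(), null, 0, 7200000)`, where `Sync` stands for whatever callback the timer should run.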
Done
@dotnet-bot test OSX x64 Debug Build
We seem to have an ugly/subtle tradeoff here and I would like to understand it more (there are 7 other PRs that presumably are just like this one). So presumably the issue is that by default (and by design) Timer propagates the execution context of the thread that creates it to the thread that executes the callback. This makes sense, as this is the 'point' of an execution context (it 'flows' whenever one piece of code causes another to execute). The issue is that if you have recurring timers (that effectively live forever), whatever is in the execution context at the time the timer was created also lives forever. Ben, presumably you had something 'leak' in this way. Can you give us your examples? My basic concern is that this is a 'pitfall', and at the very least we need guidance about the issue (we made this mistake at least 6 times, right?). Worse, the fix is non-trivial (~10 lines of code that most people would not have thought of). From a leak perspective, the solution should be to make the timers NOT capture the execution context by default, but that means that we lose the 'flow' of the context, which seems bad (it is the point of an execution context). There does not seem to be a good, simple solution. Still there are some things we should do
Either way I think this deserves a bit of thought. If the leak is typically an interesting issue, it may be acceptable to do what Ben has indicated (fix the few places in the runtime we know are an issue and then live with the leaks that will be introduced later as more timers are added). Even if we do this, I would prefer that we fix Timer so the 'correct' code is easy to write (not an ugly try-finally). Comments?
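To make the capture behaviour described above concrete, here is a small sketch. The `AsyncLocal<string>` named `CurrentUser` and the class `TimerCaptureDemo` are hypothetical stand-ins for ambient state such as an authenticated user; they are not taken from the PR.

```csharp
using System;
using System.Threading;

static class TimerCaptureDemo
{
    // Hypothetical ambient state, standing in for e.g. an authenticated user.
    static readonly AsyncLocal<string> CurrentUser = new AsyncLocal<string>();

    // Returns the value of CurrentUser that the timer callback observed.
    public static string ObserveFromTimer()
    {
        string seen = null;
        var fired = new ManualResetEventSlim();

        CurrentUser.Value = "alice";

        // The Timer constructor captures the current ExecutionContext here,
        // including the AsyncLocal value that was just set.
        using (var timer = new Timer(_ => { seen = CurrentUser.Value; fired.Set(); },
                                     null, 0, Timeout.Infinite))
        {
            // Clearing the value on this thread afterwards does not change
            // the context the timer has already captured.
            CurrentUser.Value = null;
            fired.Wait();
        }

        return seen;
    }
}
```

The callback still sees the value that was current when the timer was created, even though the creating thread has since cleared it. For a recurring static timer, that captured context stays reachable for as long as the timer does.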
I've yet to see this as a real problem, flowing makes sense, and generally such recurring timers are created in places where such a "leak" would be negligible or non-existent. This kind of behavior has been around since the beginning of .NET, and with desktop a lot more flows. I'd want to see a strong set of examples where this is really impactful before going out of our way to do something about it, at least for user code; I don't have a problem making tweaks in coreclr/corefx when the callback is non-user code.
The issue is an application-scoped Timer (static singleton) being set up by a locally scoped function. This happens because these Timers are lazily initialized on first use (good); however, they then capture the context, restoring it on each activation (not so good).
https://github.com/dotnet/corefx/issues/25477#issuecomment-346866897 the part after
The issue was that the authenticated user was stored in an asynclocal; the first call to the db set up the pool timers, so they could scavenge old connections, but also captured the user in the asynclocal, so it was restored to the execution context every time the timer fired (not desirable). It's not a security issue, as the methods the timer calls don't touch the asynclocals and are isolated in what they do, so the asynclocals won't leak; however, it will prevent them from being GC'd and, as in the issue, because they were subscribed to AsyncLocal.OnValueChanged they were receiving notifications every time the timer fired saying that the asynclocals had been restored (this also caused the app to crash, but that's a different issue). Without the subscription to
ConcurrentBag doesn't store stuff in AsyncLocal; to what are you referring? And HttpContext?
Ah, my mistake, is threadstatic :)
If you add a dependency on I try to discourage the behaviour; but nevertheless it proves quite popular
I think the Timer capturing asynclocal state by default is correct; the issue is setting up a singleton static timer and combining it with lazy initialization that has this gotcha
The HttpContext is basically connected to everything, including the socket, headers collections, streams etc., so keeping it alive keeps a whole web of things alive (non-GC/finalized). It's essentially a memory leak, but it is also bounded, as it only happens once (per static timer).
I am OK with the default being 'flow execution context', and if this was the only place where nontrivial logic to suppress the flow is needed, then I am OK with things as is. But given that there are several places where this seems to be needed, it is reasonable to have an overload of Timer that lets you queue something asking explicitly to suppress flow (thus the fixes here become 1 line changes). Is that reasonable?
If we really believe it'll be needed by non-platform code, that'd be ok. The one known-problematic example I've seen thus far has been platform code, though. Do we have examples where this is causing real problems higher in the stack?
If we do add something, it should be an "Unsafe"-prefixed method, e.g. Timer.UnsafeCreate, as that's the mechanism used elsewhere in the runtime to denote methods that explicitly do not flow ExecutionContext, e.g. ThreadPool.UnsafeQueueUserWorkItem (its only difference from QueueUserWorkItem is it doesn't flow EC). And on desktop presumably it would need to be annotated with the appropriate security annotations.
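As a rough illustration of the existing "Unsafe" convention mentioned above, the sketch below queues the same work through `ThreadPool.QueueUserWorkItem` and `ThreadPool.UnsafeQueueUserWorkItem` and records what each work item sees. The `AsyncLocal<string>` named `Ambient`, its value, and the class name `UnsafeQueueDemo` are hypothetical.

```csharp
using System;
using System.Threading;

static class UnsafeQueueDemo
{
    // Hypothetical ambient value used only to observe whether context flows.
    static readonly AsyncLocal<string> Ambient = new AsyncLocal<string>();

    // Returns what each work item observed: (with flow, without flow).
    public static (string Flowed, string NotFlowed) Run()
    {
        Ambient.Value = "request-42";
        string flowed = "unset", notFlowed = "unset";
        var done = new CountdownEvent(2);

        // Flows the caller's ExecutionContext to the work item.
        ThreadPool.QueueUserWorkItem(new WaitCallback(_ =>
        {
            flowed = Ambient.Value;
            done.Signal();
        }));

        // Does not flow the caller's ExecutionContext to the work item.
        ThreadPool.UnsafeQueueUserWorkItem(new WaitCallback(_ =>
        {
            notFlowed = Ambient.Value;
            done.Signal();
        }), null);

        done.Wait();
        return (flowed, notFlowed);
    }
}
```

The first work item observes the caller's value, while the second runs without it and sees the `AsyncLocal` default. A timer API following the same convention would presumably behave like the second case.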
While this might be helpful (e.g. I've been asked to move the 4 instances in SQLClient to a common function, #26065 (comment)), the change for Activity specifically is for netfx, so I'd imagine an API would take a while to appear?
Added API request https://github.com/dotnet/corefx/issues/26523
@benaadams is the PR blocked on the API review? If yes, we should close it and wait for the API review first. It may take a while, especially if we also need it in .NET Framework first.
No. There is a bug; the API would improve the implementation, but that implementation is a netfx one, so I imagine the lead time for the API getting into full framework would be large. It's better to take the change, then redo the implementation if/when an improved API appears, in my opinion.
        restoreFlow = true;
    }

    timer = new Timer(s => { Sync(); }, null, 0, 7200000);
Nit: how about just returning it here rather than defining the local above and returning it below... one line instead of three.
As far as the API issue is concerned, I would be happy if we defined the new method and made it internal. That avoids the API design issue. Then we are in a position to make it public whenever we care enough (which we may never do). But even in this case, I think the code is clean (giving a name to the 'try logic + suppressing flow, undoing' is a good thing), and we probably would use it more than once just in the framework.
I realized that Timer lives in System.Private.Corelib and this is in corefx, so my 'internal' idea really is not very useful. Sigh. I suppose we can just go ahead with the cloned code...
Designing the API for this should be easy. We have similar APIs on thread pool and other places. Just need to stamp one on Timer. https://github.com/dotnet/corefx/issues/26523 is the API proposal.
Yes, we have to do the cloned code in this particular case because this library needs to work downlevel. Even if we added the API now, it wouldn't be available on downlevel runtimes.
…26071) * Activity shouldn't capture AsyncLocals into its Timer * SecuritySafeCritical * feddback Commit migrated from dotnet/corefx@54d3e65
Causing those AsyncLocal values to live forever
For ASP.NET Core, this can capture logging scope, HttpContext, ConcurrentBag items, Authentication (example #25477 (comment)), other state, etc.
Resolves https://github.com/dotnet/corefx/issues/26069