Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Activity shouldn't capture AsyncLocals into its Timer #26071

Merged
merged 3 commits into from
Jan 31, 2018

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Dec 27, 2017

Causing those AsyncLocal values to live forever

For ASP.NET Core can capture logging scope, HttpContext, ConcurrentBag items, Authentication (example #25477 (comment)); other state etc

Resolves https://github.com/dotnet/corefx/issues/26069

@benaadams
Copy link
Member Author

NETFX CAS :(

17:38:51   Assembly 'System.Diagnostics.DiagnosticSource, Version=4.0.3.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51' is marked with the AllowPartiallyTrustedCallersAttribute, and uses the level 2 security transparency model.  Level 2 transparency causes all methods in AllowPartiallyTrustedCallers assemblies to become security transparent by default, which may be the cause of this exception.
17:38:51         Stack Trace:
17:38:51              at System.Diagnostics.Activity.GetUtcNow()
17:38:51              at System.Diagnostics.Activity.Start()
17:38:51              at System.Diagnostics.Tests.ActivityTests.ActivityIdOverflow()
17:38:51           ----- Inner Stack Trace -----
17:38:51              at System.Diagnostics.Activity.InitalizeSyncTimer()
17:38:51              at System.Diagnostics.Activity..cctor()
17:38:51         System.TypeInitializationException : The type initializer for 'System.Diagnostics.Activity' threw an exception.
17:38:51         ---- System.MethodAccessException : 
Attempt by security transparent method 'System.Diagnostics.Activity.InitalizeSyncTimer()' 
to access security critical method 'System.Threading.ExecutionContext.SuppressFlow()' failed.
17:38:51   

private static Timer InitalizeSyncTimer()
{
// Don't capture the current ExecutionContext and its AsyncLocals onto the timer causing them to live forever
ExecutionContext.SuppressFlow();
Copy link
Member

@jkotas jkotas Dec 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is before-field-init static constructor that can be called pretty much any time. I am wondering whether this needs to be hardened against the case of flow suppressed already. Something like:

bool restoreFlow = false;
try
{
    if (!ExecutionContext.IsFlowSuppressed())
    {
        ExecutionContext.SuppressFlow();
        restoreFlow = true;
    }

    ...
}
finally
{
    if (restoreFlow)
        ExecutionContext.RestoreFlow();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; should probably do this with the others also #26065, #26066, #26068, #26073, #26075, #26077 as they are lazy first use triggered timers so context could also be already Supressed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@karelz
Copy link
Member

karelz commented Jan 18, 2018

@brianrob @vancem can you please take a look?

@benaadams
Copy link
Member Author

@dotnet-bot test OSX x64 Debug Build

@vancem
Copy link
Contributor

vancem commented Jan 18, 2018

We seem to have an ugly/subtle tradeoff here and I would like to understand it more (there are 7 other PRs that presumably are just like this one).

So presumably the issue is that by default (and design) Timer propagates the execution context of the thread that creates it to the thread that executes the callback. This makes sense as this is the 'point' of an execution context (it 'flows' whenever one piece of code causes another to execute).

The issue is that if you have recurring timers (that effectively live forever), this means that whatever is in the execution context at the time the timer was created also live forever.

Ben, presumably you had something 'leak' in this way. Can you give us your examples?

My basic concern is that this is a 'pitfall', and at the very least we need guidance about the issue (we made this mistake at least 6 times right?). Worse the fix is non-trivial (~10 lines of code that most people would not have thought of).

From a leak perspective, the solution should be to make the timers NOT capture the execution context by default but that means that we lose the 'flow' of the context, which seems bad (it is the point of an execution context. There does not seem to be a good, simple solution.

Still there are some things we should do

  1. We should probably make it so that there is a easy overload for 'Timer' that does not flow the execution context, That will make the 'fix' shorter, and help with discovery.
  2. We should consider making the default for flowing the execution context be DON'T FLOW if the timer has a repeat interval (it reoccurs). The rationale here is that timers that don't flow are not a problem (They die in a finite time), and things that reoccur are more likely to not be causally related to the thing that started them.

Either way I think this deserves a bit of thought.

If the leak is typically an interesting issue, it may be acceptable to do what Ben has indicated (fix the few places in the runtime we know are an issue and then live with the leaks that will be introduced later as more timers are added). Even if we do this, I would prefer that we fix Timer so the 'correct' code is easy to write (not an ugly try-finally)

Comments?

@stephentoub
Copy link
Member

stephentoub commented Jan 18, 2018

I've yet to see this as a real problem, flowing makes sense, and generally such recurring timers are created in places where such a "leak" would be negligable or non-existent. This kind of behavior had been around since the beginning of .NET, and with desktop a lot more flows. I'd want to see a strong set of examples where this is really impactful before going out of our way to do something about it, at least for user code; I don't have a problem making tweaks in coreclr/corefx when the callback is non-user code.

@benaadams
Copy link
Member Author

The issue is that if you have recurring timers (that effectively live forever), this means that whatever is in the execution context at the time the timer was created also live forever.

The issue is an application scoped Timer (static singleton) being set up by a local scoped function.

Why this is happening is because these Timers are lazily initialized on first use (good); however they then capture the context, restoring it on each activation (not so good).

presumably you had something 'leak' in this way. Can you give us your examples?

https://github.com/dotnet/corefx/issues/25477#issuecomment-346866897 the part after

And everything else after here makes no sense, since the only place that set a value has finished, and already set the value back to null;

The issue was that the authenticated user was stored in an asynclocal; the first call to the db setup the pool timers, so they could scavenge old connections, but also captured the user in the asynclocal so it was restored to the execution context every time the timer fired (not desirable).

Its not a security issue as the methods the timer calls don't touch the asynclocals and are isolated in what they do so the asynclocals won't leak; however it will both prevent them from being GC'd and as in the issue they were subscribed to AsyncLocal.OnValueChanged the were receiving notifications every-time the timer fired saying that the asynclocals had been restored (also caused the app to crash, but that's a different issue).

Without the subscription to OnValueChanged its pretty much unobservable behavior but the first Sql call or use of Activity class in this example (on netfx) will capture logging scope, HttpContext, ConcurrentBag items, Authentication etc in ASP.NET and hold onto them forever because the timer is a rooted static.

@stephentoub
Copy link
Member

HttpContext, ConcurrentBag items

ConcurrentBag doesn't store stuff in AsyncLocal; to what are you referring? And HttpContext?

@benaadams
Copy link
Member Author

benaadams commented Jan 18, 2018

ConcurrentBag doesn't store stuff in AsyncLocal

Ah, my mistake, is threadstatic :)

And HttpContext?

If you add a dependency on IHttpContextAccessor it captures the HttpContext in an AsyncLocal at the start of the request. Its not a requirement to use or in by default; but a lot of features add it just by including them; Application Insights being one of these as well as a number of 3rd party packages. Some logging in ASP.NET adds an AsyncLocal to store the current scope aspnet/Logging#715

I try to discourage the behaviour; but never the less it proves quite popular

@benaadams
Copy link
Member Author

I think the Timer capturing asynclocal state by default is correct; the issue is setting up a singleton static timer and combining it with lazy initialization that has this gotcha

@benaadams
Copy link
Member Author

benaadams commented Jan 18, 2018

The HttpContext is basically connected to everything; including the socket, headers collections, streams etc; so keeping it alive keeps a whole web of things alive (non-GC/finalized).

Its essentially a memory leak; but also is bounded as it only happens once (per static timer).

@vancem
Copy link
Contributor

vancem commented Jan 18, 2018

I am OK with the default being 'flow execution context' and if this was the only place where nontrivial logic to suppress the flow is needed, the I am OK with things as is. But given that there several places where this seems to be needed, It is reasonable to have an overload of Timer that lets you queue something asking explicitly to suppress flow (thus the fixes here become 1 line changes).

Is that reasonable?

@stephentoub
Copy link
Member

stephentoub commented Jan 19, 2018

It is reasonable to have an overload of Timer that lets you queue something asking explicitly to suppress flow

If we really believe it'll be needed by non-platform code, that'd be ok. The one known-problematic example I've seen thus far has been platform code, though. Do we have examples where this is causing real problems higher in the stack?

an overload of Timer

If we do add something, it should be an "Unsafe"-prefixed method, e.g. Timer.UnsafeCreate, as that's the mechanism used elsewhere in the runtime to denote methods that explicitly do not flow ExecutionContext, e.g. ThreadPool.UnsafeQueueUserWorkItem (its only difference from QueueUserWorkItem is it doesn't flow EC). And on desktop presumably it would need to be annotated with the appropriate security annotations.

@benaadams
Copy link
Member Author

But given that there several places where this seems to be needed, It is reasonable to have an overload of Timer that lets you queue something asking explicitly to suppress flow (thus the fixes here become 1 line changes).

While this might be helpful; e.g. been asked to move the 4 instances in SQLClient to a common function #26065 (comment), the change for Activity specifically is for netfx so I'd imagine an API would take a while to appear?

@benaadams
Copy link
Member Author

@karelz
Copy link
Member

karelz commented Jan 27, 2018

@benaadams is the PR blocked on the API review? If yes, we should close it and wait for the API review first. It may take a time, esp. if we need it also in .NET Framework first.

@benaadams
Copy link
Member Author

is the PR blocked on the API review?

No. There is a bug, the api would improve the implementation; but that implementation is a netfx one; so I imagine the lead time for the api getting in full framework would be large.

Its better to take the change; then redo the implementation if/when an improved api appears; in my opinion.

restoreFlow = true;
}

timer = new Timer(s => { Sync(); }, null, 0, 7200000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: how about just returning it here rather than defining the local above and returning it below... one line instead of three.

@vancem
Copy link
Contributor

vancem commented Jan 29, 2018

As far as the API issue is concerned. I would be happy if we defined the new method and made it internal. That avoids the API design issue. Then we are in a position to make it public whenever we care enough (which we may never to). But even in this case, I think the code is clean (giving a name to the 'try logic + suppressing flow , undoing is a good thing), and we probably would use it more than once just in the framework.

@vancem
Copy link
Contributor

vancem commented Jan 29, 2018

I realized that Timer lives in System.Private.Corelib and this is in corefx, so my 'internal' idea really is not very useful.

Sigh. I suppose we can just go ahead with the cloned code...

@jkotas
Copy link
Member

jkotas commented Jan 29, 2018

That avoids the API design issue.

Designing the API for this should be easy. We have similar APIs on thread pool and other places. Just need to stamp one on Timer. https://github.com/dotnet/corefx/issues/26523 is the API proposal.

I suppose we can just go ahead with the cloned code.

Yes, we have to do the cloned code in this particular case because of this library needs to work downlevel. Even if we have added the API now, it won't be available on downlevel runtimes.

@stephentoub stephentoub merged commit 54d3e65 into dotnet:master Jan 31, 2018
@karelz karelz added this to the 2.1.0 milestone Feb 4, 2018
@benaadams benaadams deleted the Activity-Timer branch September 9, 2018 22:52
macrogreg pushed a commit to open-telemetry/opentelemetry-dotnet-instrumentation that referenced this pull request Sep 24, 2020
…26071)

* Activity shouldn't capture AsyncLocals into its Timer

* SecuritySafeCritical

* feddback


Commit migrated from dotnet/corefx@54d3e65
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…26071)

* Activity shouldn't capture AsyncLocals into its Timer

* SecuritySafeCritical

* feddback


Commit migrated from dotnet/corefx@54d3e65
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants