Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: ExecutionContext.Run<TContext> overloads #30867

Closed
davidfowl opened this issue Sep 16, 2019 · 43 comments
Closed

API Proposal: ExecutionContext.Run<TContext> overloads #30867

davidfowl opened this issue Sep 16, 2019 · 43 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Threading
Milestone

Comments

@davidfowl
Copy link
Member

Today to run code on a specific ExecutionContext, there's a non-generic Run method that takes a ContextCallback and object state. We should add a generic overload to boxing can be avoided (for e.g. when passing a ValueTuple):

namespace System.Threading
{
    public sealed class ExecutionContext
    {
        // New
        public static void Run<TState>(ExecutionContext context, ContextCallback<TState> callback, ref TState state);

        // Existing
        public static void Run(ExecutionContext executionContext, ContextCallback callback, object? state);
        ...
    }

    // New (it exists but as internal)
    public delegate void ContextCallback<TState>(ref TState state);

    // Existing
    public delegate void ContextCallback(object? state);
}
@stephentoub
Copy link
Member

stephentoub commented Sep 16, 2019

We effectively already have this internally (I added it to support ManualResetValueTaskSourceCore). But it also highlights that the state should be passed as ref, not in, to support mutating such state without allocation.

@stephentoub
Copy link
Member

Also, I think the generic parameter should be TState rather than TContext, since it could otherwise imply it's somehow the type of the ExecutionContext.

And we should consider whether we actually want a new generic ContextCallback delegate, or instead add a more general purpose Action (this is just about the delegate name and namespace).

@davidfowl
Copy link
Member Author

davidfowl commented Sep 16, 2019

Made those changes. I'd also like a Func<TState, TResult> overload so I can do this:

protected override Task ProcessRequestAsync<TContext>(IHttpApplication<TContext> application, TContext context)
{
    var s = (application, context);
    return ExecutionContext.Run(_executionContext, state => 
    {
        return state.Appliation.ProcessRequestAsync(state.Context);
    }, 
    ref s);
}

@davidfowl davidfowl changed the title API Proposal: ExecutionContext.Run<TContext> API Proposal: ExecutionContext.Run<TContext> overloads Sep 16, 2019
@stephentoub
Copy link
Member

stephentoub commented Sep 16, 2019

I'd prefer to keep the number of additional overloads here smaller, to avoid the bloat that comes from introducing additional generic methods and the necessary plumbing internally (ExecutionContext.Run is not a small method), etc, and just add the single void-returning overload. This is an advanced class that very little code needs to use directly, and it's ok to have more advanced use cases be a little more cumbersome in exchange for reduced bloat. Your desired ProcessRequestAsync could be done almost as easily with the void-returning overload:

protected override Task ProcessRequestAsync<TContext>(IHttpApplication<TContext> application, TContext context)
{
    var s = (application, context, (Task)null);
    ExecutionContext.Run(_executionContext, (ref state) => // the ref state would actually need the declaring tuple type here, but that's the case in your sample code as well ;)
    {
        state.Item3 = state.application.ProcessRequestAsync(state.context);
    }, ref s);
    return s.Item3;
}

@davidfowl
Copy link
Member Author

What's the harm in adding another overload? Is it just because it's an advanced API?

This is an advanced class that very little code needs to use directly, and it's ok to have more advanced use cases be a little more cumbersome in exchange for reduced bloat. Your desired ProcessRequestAsync could be done almost as easily with the void-returning overload:

After discovering quirks with async locals last night, more code needs to call this than I originally thought does need to call it (or use an async state machine).

Also,I think we need ContextCallback to allow passing the state by ref

@stephentoub
Copy link
Member

stephentoub commented Sep 16, 2019

What's the harm in adding another overload? Is it just because it's an advanced API?

A gigantic and complicated method in Corelib duplicated again (https://github.com/dotnet/coreclr/blob/09b000f7e734e2744892f482242fe6bb66c60d59/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs#L204-L271), an additional public delegate type, additional public API surface area for niche functionality that can be accomplished without that surface area, etc.

@stephentoub
Copy link
Member

stephentoub commented Sep 16, 2019

Also,I think we need ContextCallback to allow passing the state by ref

That's why I wrote "add a more general purpose Action (this is just about the delegate name and namespace)", i.e.

namespace System
{
    public delegate void Action<T>(ref T arg);
}

@davidfowl
Copy link
Member Author

A gigantic and complicated method in Corelib duplicated again (https://github.com/dotnet/coreclr/blob/09b000f7e734e2744892f482242fe6bb66c60d59/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs#L204-L271), an additional public delegate type, additional public API surface area for niche functionality that can be accomplished without that surface area, etc.

Fair enough, it just goes to show that APIs that take callbacks usually need to account for all the ways code can be called (that includes return values).

That's why I wrote "add a more general purpose Action (this is just about the delegate name and namespace)", i.e.

ActionRef? Does it need to be general purpose?

@stephentoub
Copy link
Member

ActionRef? Does it need to be general purpose?

Not sure. Just something to consider.

@benaadams
Copy link
Member

Often I want to run with the current context; but prevent spilling (which I think @davidfowl is also trying to do).

At the moment you need to do

ExecutionContext.Run(ExecutionContext.Capture(), ...

Would be good if you could avoid the capture and the EC Run could just use the current one; rather than converting it from its internal state; then passing it back in, when it converts it back to its internal state.

@benaadams
Copy link
Member

public sealed class ExecutionContext
{
    public static void Run(ContextCallback callback, object? state);
    public static void Run<TState>(ContextCallback<TState> callback, ref TState state);
    ...
}

Which could then be utilised internally; if desired:

internal static class AsyncMethodBuilderCore
{
    public static void Start<TStateMachine>(ContextCallback<TStateMachine> callback, ref TStateMachine stateMachine) where TStateMachine : IAsyncStateMachine
    {
        if (stateMachine == null) // TStateMachines are generally non-nullable value types, so this check will be elided
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.stateMachine);
        }
        
        ExecutionContext.Run(callback, ref stateMachine);
    }
} 

@stephentoub
Copy link
Member

stephentoub commented Sep 16, 2019

Would be good if you could avoid the capture and the EC Run could just use the current one; rather than converting it from its internal state; then passing it back in, when it converts it back to its internal state.

What overhead are you talking about that could be avoided, and you have measurements to show that it actually matters when using in any of these consuming scenarios?

@benaadams
Copy link
Member

What overhead are you talking about that could be avoided,

null getting changed into the default context for capture; then back to null internal to Run

And because of flow suppression you need to check if the EC is null before using Run... but

Flow suppressed gets changed into null for capture, then... well... then, you are kinda stuffed as Run will throw, so you need to have a stashed EC to run on.

So it gets complicated; because you can't set EC directly (which is fine); but an api to say run this on the current context, then throw away its changes when you return would be good.

Much like AsyncMethodBuilderCore.Start<TStateMachine> is doing.

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jan 14, 2020

API review - approved.

Please match parameter names for the new overloads to parameter names for the existing overloads.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@dmitriyse
Copy link

Please do not forget to include the proposed API in some version of the NetStandard, to avoid this kind of problem: dotnet/standard#1700

@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@stephentoub stephentoub modified the milestones: 5.0, Future Feb 25, 2020
@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Feb 25, 2020
@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 5, 2020

@stephentoub Re:

// the ref state would actually need the declaring tuple type here

There is this proposal aiming to relax that restriction: dotnet/csharplang#338

@danmoseley
Copy link
Member

@stephentoub should this be marked up for grabs

@stephentoub
Copy link
Member

stephentoub commented Jun 11, 2020

@davidfowl, would this be used in ASP.NET anywhere? I see only a single place in dotnet/runtime where this would be used to avoid an allocation, and it's on a path that will practically never be used; there are a few call sites it could be used to avoid a cast, but it's not clear that would actually help perf and it would have only a minor readability benefit (for code paths that are already using the advanced ExecutionContext.Run). I similarly see only a single call site to ExecutionContext.Run in all of ASP.NET, and that call site again already has a reference type argument so it would only help to avoid a cast, at the expense of additional generics.

@benaadams
Copy link
Member

Can't remember the exact use case; but it was to pass multi-state via a valuetuple?

@stephentoub
Copy link
Member

stephentoub commented Jun 11, 2020

Can't remember the exact use case; but it was to pass multi-state via a valuetuple?

Yes... but my question is, where are we going to do that? The only place I see is on a path in Socket that should almost never be used (and if it is used, perf is unlikely to be a concern). Are there others (it's certainly possible I've overlooked some)? Obviously there APIs that will get use outside of dotnet/runtime and dotnet/aspnetcore, but ExecutionContext.Run is plumbing/infrastructure code, and if we don't have good use for this overload, it's unlikely to be more useful higher in the stack.

@dmitriyse
Copy link

I was required to have a custom version of the ManualResetValueTaskSourceCore and missed the generic version of the ExecutionContext.Run to make it performance-optimal.
Please allow low-level performance-optimized code to be outside of the dotnet runtime. Exactly this method it's not an implementation detail (non-generic version already public and used by many libraries).
If you are working with the async stack, especially with IValueTaskSource you will know the .net deep enough to require performance-optimized versions of methods.

@stephentoub
Copy link
Member

stephentoub commented Jun 12, 2020

I was required to have a custom version of the ManualResetValueTaskSourceCore and missed the generic version of the ExecutionContext.Run to make it performance-optimal.

Separate from this issue, can you elaborate on why you needed a custom version of that?

non-generic version already public and used by many libraries

And successfully. What benefit does this overload bring to the table for those uses? My point of referring to all the code in dotnet/runtime is that the vast, vast majority of them are able to use the existing overload successfully, including all but one of the uses outside of corelib (and that one as mentioned will basically never be used).

I get that there are extreme cases where this overload is valuable (I'm the one who added the overload internally in the first place); ManualResetValueTaskSourceCore is one... but it's literally the only one in all of the .NET implementation.

@davidfowl
Copy link
Member Author

We fixed the issue in a different way so our immediate need is no longer immediate. You can put this in future

@davidfowl
Copy link
Member Author

Yes... but my question is, where are we going to do that? The only place I see is on a path in Socket that should almost never be used (and if it is used, perf is unlikely to be a concern). Are there others (it's certainly possible I've overlooked some)? Obviously there APIs that will get use outside of dotnet/runtime and dotnet/aspnetcore, but ExecutionContext.Run is plumbing/infrastructure code, and if we don't have good use for this overload, it's unlikely to be more useful higher in the stack.

It was going to be used in Kestrel to avoid allocating per request. We were in a state machine and had multiple things to pass (this and the httpcontext) to ExecutionContext.Run so that we ended up with a clean execution context per request.

@benaadams
Copy link
Member

It currently now allocates for any request the suspends but via ValueTask so the allocation can currently be switched off with DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS

image

@stephentoub
Copy link
Member

Ben, can you elaborate? You would use this method there to avoid an allocation? How? If that's actually the case, it's probably worth exposing.

@benaadams
Copy link
Member

It was arranged differently before; however because it wasn't using await for ProcessRequest the ExecutionContext would "leak" between requests (depending where it was set and if no-awaits were used)

The difficulty with using the regular EC.Run to solve this is application is a method parameter, and also thisis needed, so a double reference state needs to be passed into EC.Run; however the only overload available takes a single object reference, so that would need an allocation to an object that could take both parameters to use the current method.

The T overload would mean the double this, application state could be passed in a ValueTuple so not require an allocation there either.

For this particular example await + ValueTask + pooling valuetasks is a simpler cleaner solution; code-wise, however it was a use case.

The main issue was passing method state (locals/params) into the EC.Run; if it was just class state then you could just pass the class.

@stephentoub
Copy link
Member

The T overload would mean the double this, application state could be passed in a ValueTuple so not require an allocation there either.

Of course, I just don't see how that's applicable here. If this overload was exposed, how would this code change to avoid allocations? We'd be able to pass in both values without an allocation, but the method isn't asynchronous nor awaitable.

@benaadams
Copy link
Member

benaadams commented Jun 13, 2020

Just looked again; previously the ProcessRequest method didn't exist and it was all inlined into ProcessRequests<TContext> so it was all in the same statemachine loop (and didn't allocate extra); however the corner-case was where an AyncLocal<T> was modified but none of the user methods where async (e.g. all Task.CompletedTask returning sync methods); then the AyncLocal<T> local changes would leak into the loop and pass between requests (e.g. dotnet/aspnetcore#15384 and dotnet/aspnetcore#13991)

The two approaches were:

  1. Guard the calls to user methods with EC.Run, however an overload didn't exist (it would both need to pass in multi-state (this, application) and have a return value to get the Task to await back; hence ref)

  2. Move all calls of user-code to a seperate async method and then await that to undo any EC changes; ValueTask removes the allocation for sync completing, but introduces a new one for suspended methods.

Went with (2) as the overload didn't exist and it looked like ValueTask pooling was happening...

@stephentoub
Copy link
Member

Thanks. So will it change to (1) if this is added?

@benaadams
Copy link
Member

Thanks. So will it change to (1) if this is added?

Would make sense; or you could enable DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS by default then could just use async ValueTask methods to handle it without the additional allocation if they suspend 😉

The choice is essentially between:

  1. Pushing every await into a top level loop to avoid suspend allocations (which it was doing before); which then it needs this EC.Run overload if it runs user code, but that's also a little contorted, so perhaps not the best pattern for people to copy.

  2. Moving user-code calls out to async ValueTask method; so async naturally handles the EC restore; which is cleaner, but then it adds allocations if one of the user code methods suspends.

(2) is much easier to write, a better pattern and easier to understand; but (1) saves on the allocations as it currently stands with no pooling

@stephentoub
Copy link
Member

stephentoub commented Jun 13, 2020

or you could enable DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS by default then could just use async ValueTask methods to handle it without the additional allocation if they suspend

I've yet to see sure-fire evidence that doing so actually moves the needle. Did I miss it? On top of that, ASP.NET itself doesn't appear to be safe for that: dotnet/aspnetcore#16876. And even if it was enabled, the cache is finite in size, so it would very likely still allocate.

@benaadams
Copy link
Member

benaadams commented Jun 13, 2020

On top of that, ASP.NET itself doesn't appear to be safe for that: dotnet/aspnetcore#16876

Someone should probably fix those 😅

And even if it was enabled, the cache is finite in size, so it would very likely still allocate.

I was quite surprised in my load testing (for our app rather than TE testing) that the cache was sufficient. Likely because all the suspensions were very short lived. If there are longer lived suspensions then the allocations would likely become noise against the suspension time.

We essentially moved to almost 0 allocations under heavy load; except for that one pesky allocation in websockets...

@benaadams
Copy link
Member

For a different spin on the api; could do a run on default context one; (which would also work here) which might have a wider value as its hard to get back to default/clean context rather than SuppressFlow

namespace System.Threading
{
    public sealed class ExecutionContext
    {
        // New
        public static void RunOnDefault<TState>(ExecutionContext context, ContextCallback<TState> callback, ref TState state);
    }

    // New (it exists but as internal)
    public delegate void ContextCallback<TState>(ref TState state);
}

/cc @davidfowl

@stephentoub
Copy link
Member

For a different spin on the api; could do a run on default context one

I think I'd rather just expose an ExecutionContext.Default if that proved important; yes you can save a few cycles by building it in, but this would seem to compose better. Regardless, it's a separate API / concept / use-case, e.g. it wouldn't work for ManualResetValueTaskSourceCore. Please feel free to open a separate issue on it with supporting use cases / places where it would be used.

I was quite surprised in my load testing (for our app rather than TE testing) that the cache was sufficient.

Does that imply you're using the switch enabled in production? Can you share on the relevant issue (this one's getting off topic ;-) the benefits you're seeing that couldn't be achieved by e.g. tweaking GC configuration settings? I would love real-world examples where the switch is providing meaningful improvements that aren't otherwise achievable; I've just not been able to get those yet.

you could enable DOTNET_SYSTEM_THREADING_POOLASYNCVALUETASKS by default

I don't see that happening for .NET 5. Too much code would likely break, both in .NET itself and out, and in subtle ways that only manifest in production. I'm relatively confident that dotnet/runtime is in good shape for it, in part because we're employing an analyzer to help ensure good ValueTask use, but I suspect many other codebases have some problematic usage, dotnet/aspnetcore included.

Given that, if this API is exposed, will ASP.NET use it?

@benaadams
Copy link
Member

Given that, if this API is exposed, will ASP.NET use it?

K thinking this through, it's a bit complicated as there are 3+ async calls that need guarding, and would ideally only capture as a singleton and only use Run once so the logic becomes complicated if most of them don't run sync, and probably inefficient.

The simplest api to use would be to flatten it out of a call back and just use a .Restore(... type api that handled the EC notifications and swapping the contexts (would also be smaller in runtime)

So you could do something like

var ec = ExecutionContext.Capture();
while (!cancelled)
{

    await OnStartAsync();

    await ProcessRequestAsync();

    await OnCompleteAsync();

    ExecutionContext.Restore(ec);
}

Would that be acceptable?

@benaadams
Copy link
Member

Being explicit the api would be:

public sealed class ExecutionContext
{
    public static void Restore(ExecutionContext context);
}

Then could push all the awaits back up to the single statemachine that's already looping and already allocated; and not have to jump through hoops trying to manage the Task completions and end up on a clean context for the next loop.

@davidfowl
Copy link
Member Author

That would indeed be much cleaner than a callback API.

@benaadams
Copy link
Member

@stephentoub do you want a new issue for #30867 (comment) or are you happy revisiting it with the approval state of this issue?

@stephentoub
Copy link
Member

stephentoub commented Jun 15, 2020

New issue please :-) And it sounds like this one shouldn't be implemented.

As part of that, I'd also like to understand if you believe that API could be used by ManualResetValueTaskSourceCore without a perf penalty, such that we'd then delete the internal variant of this. And whether Restore could be used internally to delete any other duplicative code, again without harming perf.

@benaadams
Copy link
Member

Added a change for ManualResetValueTaskSourceCore #37942

@benaadams
Copy link
Member

Added api request #38011

@stephentoub
Copy link
Member

Closing as addressed by ExecutionContext.Restore

@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Threading
Projects
None yet
Development

No branches or pull requests

9 participants