Discussion: Defer async state machine creation #10449
Comments
Re: Arch 64 bit - Cores 4 …

ValueTask by itself saves on allocations, but you don't get the full throughput until you also delay the state machine creation (the actual …
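As a rough illustration of that claim, a minimal sketch with hypothetical names (using today's C#): an async ValueTask<T> method still builds and runs a state machine on its synchronous path, while delaying the state machine means the synchronous path is an ordinary method call.

```csharp
using System.Threading.Tasks;

// Hypothetical reader type used purely for illustration.
class BufferedReader
{
    private int _buffered = 1;

    // ValueTask alone: no Task allocation on the sync path, but the async keyword
    // still builds and runs a state machine on every call.
    public async ValueTask<int> ReadWithStateMachineAsync()
    {
        if (_buffered > 0) return _buffered--;
        return await FillAsync();
    }

    // ValueTask plus deferred state machine: the sync path is a plain method call;
    // only the slow path drops into an async helper.
    public ValueTask<int> ReadDeferredAsync()
    {
        if (_buffered > 0) return new ValueTask<int>(_buffered--);
        return ReadAwaitedAsync();
    }

    private async ValueTask<int> ReadAwaitedAsync() => await FillAsync();

    private Task<int> FillAsync() => Task.FromResult(1); // stand-in for real async work
}
```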
@bartdesmet wrote: …

Removed the …

The issue with an analyzer rewrite for splitting the functions into pre-completed and …
@ljw1004 wrote: …
@i3arnon wrote:

```csharp
async Task MainAsync()
{
var task1 = FooAsync("hamster");
var task2 = FooAsync(null);
try
{
await Task.WhenAll(task1, task2);
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
async Task FooAsync(string value)
{
if (value == null) throw new ArgumentNullException();
await SendAsync(value);
}
```
@i3arnon it should work? The first function's conversion would be blocked due to the await inside a try/catch; the second function would become:

```csharp
Task FooAsync(string value)
{
    if (value == null) return Task.FromException(new ArgumentNullException());
    return SendAsync(value);
}
```
@i3arnon wrote: …
@i3arnon not in the changed function I showed, where the exception is not thrown but …
@i3arnon wrote: …
@i3arnon yeah, rewriting everything like:

```csharp
Task FooAsync(string value)
{
    try
    {
        var message = value.ToString();
        return SendAsync(message);
    }
    catch (Exception e)
    {
        return Task.FromException(e);
    }
}
```

probably isn't great, and oddly it may work better like:

```csharp
Task FooAsync(string value)
{
    try
    {
        return FooAsyncImpl(value);
    }
    catch (Exception e)
    {
        return Task.FromException(e);
    }
}

Task FooAsyncImpl(string value)
{
    var message = value.ToString();
    return SendAsync(message);
}
```

As try/catch also prevents some optimizations; something to measure...
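One possible way to measure that, a sketch using BenchmarkDotNet (which is not mentioned in the thread); SendAsync is stubbed out to a completed task:

```csharp
using System;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class TryCatchShapeBenchmarks
{
    // Whole body inside try/catch.
    [Benchmark(Baseline = true)]
    public Task Wrapped()
    {
        try
        {
            var message = "hamster".ToString();
            return SendAsync(message);
        }
        catch (Exception e)
        {
            return Task.FromException(e);
        }
    }

    // try/catch only around a call to a separate impl method.
    [Benchmark]
    public Task Split()
    {
        try
        {
            return Impl();
        }
        catch (Exception e)
        {
            return Task.FromException(e);
        }
    }

    private Task Impl()
    {
        var message = "hamster".ToString();
        return SendAsync(message);
    }

    private static Task SendAsync(string value) => Task.CompletedTask; // stand-in

    public static void Main() => BenchmarkRunner.Run<TryCatchShapeBenchmarks>();
}
```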
I think you'd have to wrap everything in … It would be important for semantic reasons to always have the …

Ended up writing a bit much there. But anyway, the point is that there's a pretty sizable minority who do throw some exceptions directly and not place them on the task, so even in the simplest possible case:

…

you'd still need a …, because some people write their code like this:

…

Bottom line: I'm totally in favor of the optimization. Perhaps we could elide the …
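Since the snippets in that comment aren't shown here, a hedged illustration of the semantic point being made (names are illustrative, not from the original): with the async keyword a synchronous throw is captured on the returned task, whereas a hand-deferred rewrite throws directly at the call site.

```csharp
using System;
using System.Threading.Tasks;

// Illustrative only; mirrors the FooAsync shape discussed above.
class ExceptionSemanticsDemo
{
    // With the async keyword, a synchronously thrown exception is placed on the task...
    static async Task FooAsync(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        await Task.Yield();
    }

    // ...while a hand-deferred rewrite throws it straight out of the call.
    static Task FooDeferred(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        return Task.CompletedTask;
    }

    static void Main()
    {
        var task = FooAsync(null);                 // no throw here; the fault is on the task
        try { task.Wait(); }
        catch (AggregateException) { Console.WriteLine("observed when the task is awaited/waited"); }

        try { FooDeferred(null); }                 // throws before any task exists
        catch (ArgumentNullException) { Console.WriteLine("observed at the call site"); }
    }
}
```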
@StephenCleary your and @i3arnon's examples cover two ends of the spectrum! Both good examples. I don't think … Will do some measurements.
Exception semantics are one issue; another is ExecutionContext semantics. Async methods save the current ExecutionContext on entry, and restore it at the first …

I could imagine an optimizer that could eliminate the exception and/or ExecutionContext overhead under certain circumstances (where it could be proved that no exceptions are thrown and/or no modifications are made to the ExecutionContext), but I'd then wonder whether this optimization would work in enough cases to justify the complexity.
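A small sketch of the ExecutionContext behavior being described, using AsyncLocal<T> (this example is illustrative, not from the comment): mutations made inside an async method do not flow back to the caller, and any rewrite that removes the async keyword has to preserve that.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ExecutionContextDemo
{
    static readonly AsyncLocal<string> Current = new AsyncLocal<string>();

    static async Task MutateAsync()
    {
        Current.Value = "set inside the async method";
        await Task.Yield();
    }

    static async Task Main()
    {
        Current.Value = "caller";
        await MutateAsync();
        // The builder restored the caller's ExecutionContext on the way out,
        // so the mutation made inside MutateAsync is not visible here.
        Console.WriteLine(Current.Value); // prints "caller"
    }
}
```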
Execution context changing would be hard to detect at compile time, especially if calling via interfaces or virtual methods. At runtime it could be detected as not default, which would allow a poor solution of doubling up the functions for default context/custom context:

```csharp
Task MyFuncAsync()
{
if (ExecutionContext.IsDefault())
{
// Fast path
try
{
return MyFuncAsyncDefaultContext();
}
catch (Exception ex)
{
return Task.FromException(ex);
}
}
else
{
// Code as default state machine rewrite
return MyFuncAsyncCustomContext();
}
}
```

Though now it's starting to get really inelegant... :( Maybe code generators? #5561
Added ValueTask awaiting: ValueTask postponement of

```csharp
async Task<int> MethodAsync()
{
if (condition)
{
return 0;
}
return await OtherMethodAsync();
}
async Task<int> OtherMethodAsync()
{
if (otherCondition)
{
return 1;
}
return await AnotherMethodAsync();
}
async Task<int> AnotherMethodAsync()
{
...
}
```

to

```csharp
// Common async awaiter
async Task<T> AwaitResult<T>(Task<T> task)
{
return await task;
}
ValueTask<int> MethodAsync()
{
if (condition)
{
return 0;
}
var task = OtherMethodAsync();
if (!task.IsCompletedSuccessfully)
{
return AwaitResult(task.AsTask());
}
return task.Result;
}
ValueTask<int> OtherMethodAsync()
{
if (otherCondition)
{
return 1;
}
return AnotherMethodAsync();
}
async Task<int> AnotherMethodAsync()
{
...
}
```
I'm not sure that would help. Even if we start with a default context, if the Async method modifies that context, we still need to restore the original (default) context on the way out. To preserve all the semantics, the code needs to look more like this:
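The snippet that followed isn't shown here; very roughly, the shape being described (capture the context on entry and undo any changes on every exit path, as the method builder does) might look like this, with hypothetical helper names:

```csharp
// Rough sketch only; MyFuncCore and MyFuncAwaited are hypothetical helpers
// standing in for the original method body and its slow path.
Task MyFuncAsync()
{
    var previousContext = ExecutionContext.Capture();
    try
    {
        var task = MyFuncCore();
        return task.Status == TaskStatus.RanToCompletion
            ? Task.CompletedTask
            : MyFuncAwaited(task);
    }
    catch (Exception ex)
    {
        return Task.FromException(ex);
    }
    finally
    {
        // Undo any ExecutionContext changes made by the synchronous part of the body.
        // There is no public API that restores previousContext in place on the current
        // thread, which is part of the cost/complexity being pointed out here.
    }
}
```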
The actual infrastructure code in the Fx uses some internal tricks to make that all work more efficiently than the pseudo-implementation I gave here. The internal functionality could maybe be exposed. But still, this goop is likely the bulk of the cost you're trying to avoid. |
Though …
Good catch re …
@StephenCleary, @ericeil is any of that specified somewhere or is it an implementation detail of the current state machine creation? |
Neither. It's not documented/specified, but it can't be treated as "just an implementation detail", either, since it significantly changes the semantics of types like …
Avoid "await" if ReceiveReplyAsync completed synchronously (because the data was already in memory). See dotnet/roslyn#10449 for more details.
A follow-up point on performance: while an … So if it's 30 method calls deep, that's now the construction of 30 state machines per call, which starts to add up.
Example code case from aspnet/KestrelHttpServer#863:

```csharp
public Task WriteAsync(ArraySegment<byte> data, CancellationToken cancellationToken)
{
if (!HasResponseStarted)
{
var produceStartTask = ProduceStartAndFireOnStarting();
// ProduceStartAndFireOnStarting normally returns a CompletedTask
if (produceStartTask.Status != TaskStatus.RanToCompletion)
{
// If the Task was not completed go async and await the task
// to surface any errors, cancellation or wait for the Task
// to complete before calling SocketOutput.WriteAsync
return WriteAsyncAwaited(produceStartTask, data, cancellationToken);
}
}
// Otherwise fast-path by not constructing an async statemachine,
// examining the various contexts and just return the final Write Task
if (_autoChunk)
{
if (data.Count == 0)
{
return TaskUtilities.CompletedTask;
}
return WriteChunkedAsync(data, cancellationToken);
}
else
{
return SocketOutput.WriteAsync(data, cancellationToken: cancellationToken);
}
}
private async Task WriteAsyncAwaited(Task produceStartTask, ArraySegment<byte> data, CancellationToken cancellationToken)
{
await produceStartTask;
if (_autoChunk)
{
if (data.Count == 0)
{
return;
}
await WriteChunkedAsync(data, cancellationToken);
}
else
{
await SocketOutput.WriteAsync(data, cancellationToken: cancellationToken);
}
}
```
To follow up:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public class Program
{
public static void Main(string[] args)
{
long limit = 10000000;
MainAsync(limit).Wait();
}
public static async Task MainAsync(long limit)
{
var sw = Stopwatch.StartNew();
GC.Collect();
sw.Restart();
await Async1(limit);
sw.Stop();
Console.WriteLine("Async1: {0:0.0000}s", sw.Elapsed.TotalSeconds);
GC.Collect();
sw.Restart();
await Async2(limit);
sw.Stop();
Console.WriteLine("Async2: {0:0.0000}s", sw.Elapsed.TotalSeconds);
GC.Collect();
sw.Restart();
await Async1(limit);
sw.Stop();
Console.WriteLine("Async1: {0:0.0000}s", sw.Elapsed.TotalSeconds);
GC.Collect();
sw.Restart();
await Async2(limit);
sw.Stop();
Console.WriteLine("Async2: {0:0.0000}s", sw.Elapsed.TotalSeconds);
}
private static async Task Async1(long count)
{
if (count == 0) return;
var tasks = new Task[10];
for (var i = 0; i < 10; i++)
{
tasks[i] = Async1(count / 10);
}
for (var i = 0; i < 10; i++)
{
await tasks[i];
}
}
private static Task Async2(long count)
{
if (count == 0) return Task.CompletedTask;
var tasks = new Task[10];
for (var i = 0; i < 10; i++)
{
tasks[i] = Async2(count / 10);
}
for (var i = 0; i < 10; i++)
{
if (tasks[i].Status != TaskStatus.RanToCompletion)
{
return Async2Awaited(tasks);
}
}
return Task.CompletedTask;
}
private async static Task Async2Awaited(Task[] tasks)
{
for (var i = 0; i < 10; i++)
{
await tasks[i];
}
}
}
```

Outputs:

```
Async1: 5.8140s
Async2: 1.6628s
Async1: 5.8387s
Async2: 1.6715s
```

So the non-deferred path is 3.5x slower than the deferred path, which is significant for fine-grained async that is normally synchronous.
This might be a better sample, since async is generally viral and "all the way down":

```csharp
public static void Main(string[] args)
{
long startCallDepth = 512;
long repeats = 1000000;
MainAsync(startCallDepth, repeats).Wait();
}
public static async Task MainAsync(long startCallDepth, long repeats)
{
var sw = Stopwatch.StartNew();
for (var i = 0L; i < 10; i++)
{
await Async1(startCallDepth);
}
for (var i = 0L; i < 10; i++)
{
await Async2(startCallDepth);
}
sw.Stop();
var callDepth = startCallDepth;
while (callDepth > 0)
{
GC.Collect();
sw.Restart();
for (var i = 0L; i < repeats; i++)
{
await Async1(callDepth);
}
sw.Stop();
Console.WriteLine("Async1, depth {1}: {0:0.0000}s", sw.Elapsed.TotalSeconds, callDepth);
GC.Collect();
sw.Restart();
for (var i = 0L; i < repeats; i++)
{
await Async2(callDepth);
}
sw.Stop();
Console.WriteLine("Async2, depth {1}: {0:0.0000}s", sw.Elapsed.TotalSeconds, callDepth);
callDepth /= 2;
}
}
private static async Task Async1(long count)
{
if (count == 0) return;
await Async1(count - 1);
}
private static Task Async2(long count)
{
if (count == 0) return Task.CompletedTask;
var task = Async2(count - 1);
if (task.Status != TaskStatus.RanToCompletion)
{
return Async2Awaited(task);
}
return Task.CompletedTask;
}
private async static Task Async2Awaited(Task task)
{
await task;
}
```

Outputs …
Added the try/catch/finally with execution context to see how much it hits perf:

```csharp
private static Task Async3(long count)
{
if (count == 0) return Task.CompletedTask;
var ec = ExecutionContext.Capture();
try
{
var task = Async3(count - 1);
if (task.Status != TaskStatus.RanToCompletion)
{
return Async2Awaited(task);
}
}
catch (Exception e)
{
return Task.FromException(e);
}
finally
{
// Note: this doesn't exist as a public API
Restore(Thread.CurrentThread, ec);
}
return Task.CompletedTask;
}
static ExecutionContext Default = ExecutionContext.Capture();
internal static void Restore(Thread currentThread, ExecutionContext executionContext)
{
ExecutionContext previous = null ?? Default;
//ExecutionContext previous = currentThread.ExecutionContext ?? Default;
//currentThread.ExecutionContext = executionContext;
// New EC could be null if that's what ECS.Undo saved off.
// For the purposes of dealing with context change, treat this as the default EC
executionContext = executionContext ?? Default;
if (previous != executionContext)
{
//OnContextChanged(previous, executionContext);
}
}
```

Results …
The tail call optimization is discussed at #1981
We are now taking language feature discussion in other repositories:
Features that are under active design or development, or which are "championed" by someone on the language design team, have already been moved either as issues or as checked-in design documents. For example, the proposal in this repo "Proposal: Partial interface implementation a.k.a. Traits" (issue 16139 and a few other issues that request the same thing) is now tracked by the language team at issue 52 in https://github.com/dotnet/csharplang/issues, and there is a draft spec at https://github.com/dotnet/csharplang/blob/master/proposals/default-interface-methods.md and further discussion at issue 288 in https://github.com/dotnet/csharplang/issues. Prototyping of the compiler portion of language features is still tracked here; see, for example, https://github.com/dotnet/roslyn/tree/features/DefaultInterfaceImplementation and issue 17952.

In order to facilitate that transition, we have started closing language design discussions from the roslyn repo with a note briefly explaining why. When we are aware of an existing discussion for the feature already in the new repo, we are adding a link to that. But we're not adding new issues to the new repos for existing discussions in this repo that the language design team does not currently envision taking on. Our intent is to eventually close the language design issues in the Roslyn repo and encourage discussion in one of the new repos instead.

Our intent is not to shut down discussion on language design - you can still continue discussion on the closed issues if you want - but rather we would like to encourage people to move discussion to where we are more likely to be paying attention (the new repo), or to abandon discussions that are no longer of interest to you. If you happen to notice that one of the closed issues has a relevant issue in the new repo, and we have not added a link to the new issue, we would appreciate you providing a link from the old to the new discussion. That way people who are still interested in the discussion can start paying attention to the new issue. Also, we'd welcome any ideas you might have on how we could better manage the transition. Comments and discussion about closing and/or moving issues should be directed to #18002. Comments and discussion about this issue can take place here or on an issue in the relevant repo.

I am closing this issue because discussion appears to have died down. You are welcome to open a new issue in the csharplang repo if you would like to kick-start discussion again.
I think this is not a specific language design question but a compiler/IL-generation optimization, which should live on roslyn. Thus I request this thread be reopened here. This feature does not change the language in any way; only the IL that is generated would be optimized and analyzed.
From #7169 (comment); will evolve this over time. However, this forks the discussion from @ljw1004's great proposal, as it's a separate thing.
While you're there, messing with the async state machine.... 😉
There is currently a faster await path for completed tasks; however, an async function still comes with a cost. There are faster patterns that avoid the state machine, but they involve greater code complexity, so it would be nice if the compiler could generate them as part of the state machine construction.
Tail call clean up
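The original snippet isn't shown here; a hedged sketch of the kind of method this heading likely refers to (Stream.CopyToAsync is used purely as an example):

```csharp
async Task ForwardAsync(Stream source, Stream destination)
{
    // The async keyword forces a state machine even though the method
    // merely forwards to the inner task.
    await source.CopyToAsync(destination);
}
```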
becomes
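a sketch of the rewritten form, which returns the inner task directly and never constructs a state machine (with the exception and ExecutionContext caveats raised in the comments above):

```csharp
Task ForwardAsync(Stream source, Stream destination)
{
    // No state machine: the caller awaits the inner task directly.
    return source.CopyToAsync(destination);
}
```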
Likewise, single tail await
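Again the snippet is elided; presumably something of this shape, where the only await is a tail return await (illustrative only):

```csharp
async Task<int> ReadSomeAsync(Stream stream, byte[] buffer)
{
    return await stream.ReadAsync(buffer, 0, buffer.Length);
}
```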
becomes
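a sketch of the rewritten form:

```csharp
Task<int> ReadSomeAsync(Stream stream, byte[] buffer)
{
    // Return the inner task directly instead of awaiting it.
    return stream.ReadAsync(buffer, 0, buffer.Length);
}
```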
Mid async
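The snippet is not shown; presumably a method whose await sits mid-body and usually completes synchronously, something like this (hypothetical names):

```csharp
async Task<int> GetLengthAsync(string key)
{
    var normalized = key.Trim();                 // synchronous work
    var value = await LookupAsync(normalized);   // usually already completed; hypothetical lookup
    return value.Length;                         // more synchronous work
}
```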
splits at the first non-completed task into an async awaiting function
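A sketch of that split, in the same shape as the WriteAsync/WriteAsyncAwaited example from Kestrel quoted in the comments (names hypothetical):

```csharp
Task<int> GetLengthAsync(string key)
{
    var normalized = key.Trim();
    var task = LookupAsync(normalized);
    if (task.Status != TaskStatus.RanToCompletion)
    {
        // Only construct a state machine when we actually have to wait.
        return GetLengthAsyncAwaited(task);
    }
    return Task.FromResult(task.Result.Length);
}

async Task<int> GetLengthAsyncAwaited(Task<string> task)
{
    var value = await task;
    return value.Length;
}
```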
ValueTask postponement of async and Task<T> … to … (the full before/after for this transform is quoted in the "Added ValueTask awaiting" comment above).