Refactor ManagedWebSocket to avoid forcing Task allocations for ReceiveAsync #56282
Conversation
Tagging subscribers to this area: @dotnet/ncl

Issue Details

The ManagedWebSocket implementation today supports CloseAsyncs being issued concurrently with ReceiveAsyncs, even though CloseAsync needs to issue receives (this allowance was carried over from the .NET Framework implementation). Currently the implementation does that by storing the last ReceiveAsync task and awaiting it in CloseAsync if there is one, but that means multiple parties (the original caller of ReceiveAsync and CloseAsync) may await the same task, which means we can't just use a ValueTask. So today asynchronously completing ReceiveAsyncs always use AsTask to create a Task from the returned ValueTask. This isn't actually an additional task allocation today, as the async ValueTask builder creates a Task for the asynchronously completing operation and AsTask just returns it (and when the operation completes synchronously, there's extra code to substitute a singleton). But once we switch to the new pooling builder, that's no longer the case.

This PR uses an async lock as part of the ReceiveAsync implementation, with the existing async method awaiting entry into that lock. CloseAsync is then rewritten in terms of calling ReceiveAsync in a loop. This also lets us remove the existing Monitor used for synchronously coordinating state between these operations, as the async lock serves that purpose as well. Rather than using a SemaphoreSlim, since we expect zero contention in the common case, we use a simple AsyncMutex optimized for the zero-contention case: a single interlocked operation to acquire the lock and a single interlocked operation to release it.

Closes #50921
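To make the zero-contention fast path concrete, here is a minimal, hypothetical sketch of an async mutex along the lines described above. It is not the AsyncMutex added in this PR (which also has to deal with cancellation, among other things); the name SimpleAsyncMutex and its members are made up for illustration. The point is that an uncontended acquire is a single Interlocked.Decrement and an uncontended release is a single Interlocked.Increment, with waiters queued only on the slow path.

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only; not the implementation added in this PR.
internal sealed class SimpleAsyncMutex
{
    // _gate == 1 means unlocked; 0 means locked with no waiters; negative means locked with waiters.
    private int _gate = 1;
    private readonly ConcurrentQueue<TaskCompletionSource> _waiters = new();

    public Task EnterAsync()
    {
        // Fast path: one interlocked decrement acquires the lock when it's uncontended.
        if (Interlocked.Decrement(ref _gate) == 0)
        {
            return Task.CompletedTask;
        }

        // Slow path: queue a waiter; the current holder completes it on exit, handing over ownership.
        var waiter = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
        _waiters.Enqueue(waiter);
        return waiter.Task;
    }

    public void Exit()
    {
        // Fast path: one interlocked increment releases the lock when no one is waiting.
        if (Interlocked.Increment(ref _gate) == 1)
        {
            return;
        }

        // Slow path: hand ownership directly to a queued waiter. The waiter may still be
        // between its decrement and its enqueue, so spin briefly until it shows up.
        TaskCompletionSource waiter;
        while (!_waiters.TryDequeue(out waiter))
        {
            Thread.Yield();
        }
        waiter.SetResult();
    }
}

In the shape described above, ReceiveAsync would await EnterAsync() before doing its work and call Exit() afterwards, and CloseAsync, being written as a ReceiveAsync loop, goes through the same lock. The benchmark used to measure the change follows.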
using System;
using System.IO;
using System.Linq;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    // A pair of managed websockets talking to each other over connected in-memory streams.
    private class Connection
    {
        public readonly WebSocket Client, Server;
        public readonly Memory<byte> ClientBuffer = new byte[256];
        public readonly Memory<byte> ServerBuffer = new byte[256];
        public readonly CancellationToken CancellationToken = default;

        public Connection()
        {
            // ConnectedStreams.CreateBidirectional is a dotnet/runtime test helper that
            // returns two streams wired to each other.
            (Stream Stream1, Stream Stream2) streams = ConnectedStreams.CreateBidirectional();
            Client = WebSocket.CreateFromStream(streams.Stream1, isServer: false, subProtocol: null, Timeout.InfiniteTimeSpan);
            Server = WebSocket.CreateFromStream(streams.Stream2, isServer: true, subProtocol: null, Timeout.InfiniteTimeSpan);
        }
    }

    private Connection[] _connections = Enumerable.Range(0, 256).Select(_ => new Connection()).ToArray();
    private const int Iters = 1_000;

    // 256 connections each ping-pong a 256-byte binary message Iters times.
    [Benchmark]
    public Task PingPong() =>
        Task.WhenAll(from c in _connections select Task.WhenAll(
            Task.Run(async () =>
            {
                for (int i = 0; i < Iters; i++)
                {
                    await c.Server.ReceiveAsync(c.ServerBuffer, c.CancellationToken);
                    await c.Server.SendAsync(c.ServerBuffer, WebSocketMessageType.Binary, endOfMessage: true, c.CancellationToken);
                }
            }),
            Task.Run(async () =>
            {
                for (int i = 0; i < Iters; i++)
                {
                    await c.Client.SendAsync(c.ClientBuffer, WebSocketMessageType.Binary, endOfMessage: true, c.CancellationToken);
                    await c.Client.ReceiveAsync(c.ClientBuffer, c.CancellationToken);
                }
            })));
}
@davidfowl, can you help validate this with ASP.NET functional tests and against relevant ASP.NET perf tests? That needs to be done before this can be merged.
Branch updated from 11562c5 to d57be3d.
@CarnaViire if you get a chance to take a look as well, that would be good.
Branch updated from d57be3d to 74a34b8.
@davidfowl, @adityamandaleeka, any update on validating this change for ASP.NET?
LGTM
Branch updated from 74a34b8 to 14990bc.
Once CI is green, I'll go ahead and merge this. In my own tests, this shows up as neutral to positive both locally in microbenchmarks and on asp-perf-lin and asp-citrine-lin in terms of throughput. We can revert it if it ends up having any negative impact once it makes it to dotnet/aspnetcore. @davidfowl, @adityamandaleeka, I'd still appreciate extra validation here, but at this point if I don't merge it's not going to make the release. The new websockets benchmark doesn't seem to really stress the system with or without this.
Merging and propagating a dependency flow PR is the easiest way to get validation.