Refactor ManagedWebSocket to avoid forcing Task allocations for ReceiveAsync #56282
Conversation
Tagging subscribers to this area: @dotnet/ncl

Issue Details

The ManagedWebSocket implementation today supports CloseAsyncs being issued concurrently with ReceiveAsyncs, even though CloseAsync needs to issue receives (this allowance was carried over from the .NET Framework implementation). Currently the implementation does that by storing the last ReceiveAsync task and awaiting it in CloseAsync if there is one, but that means multiple parties (the original caller of ReceiveAsync and CloseAsync) may await the same task, which means we can't just use a ValueTask. So today asynchronously completing ReceiveAsyncs always use AsTask to create a Task from the returned ValueTask. This isn't actually an additional task allocation today, as the async ValueTask builder creates a Task for the asynchronously completing operation and AsTask just returns it (and when the operation completes synchronously, there's extra code to substitute a singleton). But once we switch to the new pooling builder, that's no longer the case.

This PR uses an async lock as part of the ReceiveAsync implementation, with the existing async method awaiting entry into that lock. CloseAsync is then rewritten in terms of calling ReceiveAsync in a loop. This also lets us remove the existing Monitor used for synchronously coordinating state between these operations, as the async lock serves that purpose as well. Rather than using a SemaphoreSlim, since we expect zero contention in the common case, we use a simple AsyncMutex optimized for the zero-contention case: a single interlocked operation to acquire the lock and a single interlocked operation to release it.

Closes #50921
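To make the zero-contention fast path concrete, here is a minimal, hypothetical sketch of an async mutex along the lines described above. It is not the AsyncMutex added in this PR (which also has to deal with cancellation, among other things); the name SimpleAsyncMutex and its members are made up for illustration. The point is that an uncontended acquire is a single Interlocked.Decrement and an uncontended release is a single Interlocked.Increment, with waiters queued only on the slow path.

using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only; not the implementation added in this PR.
internal sealed class SimpleAsyncMutex
{
    // _gate == 1 means unlocked; 0 means locked with no waiters; negative means locked with waiters.
    private int _gate = 1;
    private readonly ConcurrentQueue<TaskCompletionSource> _waiters = new();

    public Task EnterAsync()
    {
        // Fast path: one interlocked decrement acquires the lock when it's uncontended.
        if (Interlocked.Decrement(ref _gate) == 0)
        {
            return Task.CompletedTask;
        }

        // Slow path: queue a waiter; the current holder completes it on exit, handing over ownership.
        var waiter = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
        _waiters.Enqueue(waiter);
        return waiter.Task;
    }

    public void Exit()
    {
        // Fast path: one interlocked increment releases the lock when no one is waiting.
        if (Interlocked.Increment(ref _gate) == 1)
        {
            return;
        }

        // Slow path: hand ownership directly to a queued waiter. The waiter may still be
        // between its decrement and its enqueue, so spin briefly until it shows up.
        TaskCompletionSource waiter;
        while (!_waiters.TryDequeue(out waiter))
        {
            Thread.Yield();
        }
        waiter.SetResult();
    }
}

In the shape described above, ReceiveAsync would await EnterAsync() before doing its work and call Exit() afterwards, and CloseAsync, being written as a ReceiveAsync loop, goes through the same lock. The benchmark used to measure the change follows.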
using System;
using System.IO;
using System.Linq;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class Program
{
    public static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    // A pair of managed websockets talking to each other over connected in-memory streams.
    private class Connection
    {
        public readonly WebSocket Client, Server;
        public readonly Memory<byte> ClientBuffer = new byte[256];
        public readonly Memory<byte> ServerBuffer = new byte[256];
        public readonly CancellationToken CancellationToken = default;

        public Connection()
        {
            // ConnectedStreams.CreateBidirectional is a dotnet/runtime test helper that
            // returns two streams wired to each other.
            (Stream Stream1, Stream Stream2) streams = ConnectedStreams.CreateBidirectional();
            Client = WebSocket.CreateFromStream(streams.Stream1, isServer: false, subProtocol: null, Timeout.InfiniteTimeSpan);
            Server = WebSocket.CreateFromStream(streams.Stream2, isServer: true, subProtocol: null, Timeout.InfiniteTimeSpan);
        }
    }

    private Connection[] _connections = Enumerable.Range(0, 256).Select(_ => new Connection()).ToArray();
    private const int Iters = 1_000;

    // 256 connections each ping-pong a 256-byte binary message Iters times.
    [Benchmark]
    public Task PingPong() =>
        Task.WhenAll(from c in _connections select Task.WhenAll(
            Task.Run(async () =>
            {
                for (int i = 0; i < Iters; i++)
                {
                    await c.Server.ReceiveAsync(c.ServerBuffer, c.CancellationToken);
                    await c.Server.SendAsync(c.ServerBuffer, WebSocketMessageType.Binary, endOfMessage: true, c.CancellationToken);
                }
            }),
            Task.Run(async () =>
            {
                for (int i = 0; i < Iters; i++)
                {
                    await c.Client.SendAsync(c.ClientBuffer, WebSocketMessageType.Binary, endOfMessage: true, c.CancellationToken);
                    await c.Client.ReceiveAsync(c.ClientBuffer, c.CancellationToken);
                }
            })));
}
@davidfowl, can you help validate this with ASP.NET functional tests and against relevant ASP.NET perf tests? That needs to be done before this can be merged.
Branch updated from 11562c5 to d57be3d.
@CarnaViire if you get a chance to take a look as well, that would be good.
Branch updated from d57be3d to 74a34b8.
@davidfowl, @adityamandaleeka, any update on validating this change for ASP.NET?
LGTM
Branch updated from 74a34b8 to 14990bc.
Once CI is green, I'll go ahead and merge this. In my own tests, this shows up as neutral to positive both locally in microbenchmarks and on asp-perf-lin and asp-citrine-lin in terms of throughput. We can revert it if it ends up having any negative impact once it makes it to dotnet/aspnetcore. @davidfowl, @adityamandaleeka, I'd still appreciate extra validation here, but at this point if I don't merge it's not going to make the release. The new websockets benchmark doesn't seem to really stress the system with or without this.
Merging and propagating a dependency flow PR is the easiest way to get validation.