add write queue to Http2Connection to optimize write handling #39166
Conversation
Tagging subscribers to this area: @dotnet/ncl
src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Connection.cs
```csharp
private async Task PerformWriteAsync<T>(int writeBytes, T state, Func<T, Memory<byte>, FlushTiming> lockedAction, CancellationToken cancellationToken = default)

PerformWriteAsync(0, 0, (_, __) => true, cancellationToken);
```
```csharp
private abstract class WriteQueueEntry : TaskCompletionSource
```
Doesn't need to be part of this PR, but given we're now allocating this on every write, it'd be really good to look at changing the base type to be one that implements IValueTaskSource and making this reusable.
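The suggestion above might look roughly like the following sketch: an entry built on `ManualResetValueTaskSourceCore<T>` that implements `IValueTaskSource`, so a completed entry can be `Reset()` and pooled rather than allocating a new `TaskCompletionSource` per write. The `ReusableWriteEntry` type and its members are hypothetical names for illustration, not code from the PR.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical sketch of a reusable write-queue entry. Instead of a
// TaskCompletionSource allocated per write, the same instance can be
// awaited, completed, Reset(), and returned to a pool.
public sealed class ReusableWriteEntry : IValueTaskSource
{
    // Mutable struct backing the value task; must not be readonly.
    private ManualResetValueTaskSourceCore<bool> _core;

    // The awaiter ties itself to the current Version so a stale
    // ValueTask from before a Reset() is rejected.
    public ValueTask WaitForWriteAsync() => new ValueTask(this, _core.Version);

    public void SetComplete() => _core.SetResult(true);
    public void SetFailed(Exception e) => _core.SetException(e);

    // After the result has been consumed, the entry can be reused.
    public void Reset() => _core.Reset();

    void IValueTaskSource.GetResult(short token) => _core.GetResult(token);
    ValueTaskSourceStatus IValueTaskSource.GetStatus(short token) => _core.GetStatus(token);
    void IValueTaskSource.OnCompleted(Action<object?> continuation, object? state,
        short token, ValueTaskSourceOnCompletedFlags flags) =>
        _core.OnCompleted(continuation, state, token, flags);
}
```

A caller would `await entry.WaitForWriteAsync()`, and the flush loop would call `SetComplete()` or `SetFailed(...)`; once the await finishes, `Reset()` readies the entry for the next write.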
Agreed, though it gets a little weird because we have a bunch of different write operations with different kinds of state.
There's a tradeoff to be made here. This is similar to what pipelines does. You don't wait on individual writes, but you can have backpressure based on how much outstanding buffer is unflushed/unconsumed. That removes the per-operation overhead and allocations.
Do you care about cancelling individual writes, or do you care about stopping the overall writing operation?
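The backpressure model being described can be sketched with a bounded `Channel<T>`: producers enqueue frames without awaiting individual completions and only wait when too much is outstanding, while a single reader drains and flushes. This is an illustrative sketch of the tradeoff, not the PR's implementation; the names and the 64-frame capacity are arbitrary.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded queue: writers block only when 64 frames are already pending,
// which is the backpressure instead of per-write completion signals.
var queue = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(64)
{
    SingleReader = true,                    // one flush loop drains the queue
    FullMode = BoundedChannelFullMode.Wait  // producers wait when it's full
});

async Task ProduceAsync()
{
    for (int i = 0; i < 10; i++)
        await queue.Writer.WriteAsync(new byte[16]); // fire-and-continue writes
    queue.Writer.Complete();
}

async Task<int> ConsumeAsync()
{
    int total = 0;
    // A real flush loop would frame and write these to the socket.
    await foreach (var frame in queue.Reader.ReadAllAsync())
        total += frame.Length;
    return total;
}

var consume = ConsumeAsync();
await ProduceAsync();
int bytes = await consume;
Console.WriteLine(bytes); // 160
```

The producer never learns when a specific frame hit the wire, which is exactly the question raised above about cancelling individual writes versus stopping the whole writing operation.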
Pushed changes to address everything except
I'm curious why the gRPC client gets such a large performance improvement compared to the "raw HttpClient" scenario. If anything, I would have expected a smaller benefit to the gRPC client, just because HttpClient+HTTP/2 makes up a smaller percentage of the work happening when it is used. Have you profiled where the improvement comes from?
My assumption is that the "raw" mode isn't actually equivalent to full GRPC client, in terms of HTTP2 usage. That said, I haven't looked at the code for either in detail. Are you aware of any differences here?
No.
gRPC client has a custom HttpContent and reads content from Stream. If you check out the readme at https://github.com/JamesNK/Http2Perf you can see there are 4 raw options. The closest one to the gRPC client is "r-stream-all".
I'll try running with "r-stream-all" when my machine is back in a normal state (hopefully soon).
Feels like this could use a
Edit: Just saw this #39166 (comment)
I see similar (but slightly better) results with "r-stream-all" as compared to "g" (the full GRPC client).
Ok! So this change offers bigger perf benefits to custom HttpContent implementations than to ByteArrayContent.
Edit: Interesting, the benchmark still has the older copy of PushUnaryContent with multiple writes. They were merged into one write to speed up the gRPC client. Is there still a performance hit from writing to the request stream multiple times?
There could be.
From my HttpClient novice perspective 😄
@geoffkizer would you be open to trying a branch that uses pipelines internally (even if we don't check it in) for a performance comparison? The only reason I ask is that the Pipe implementation is really close to the usage here.
@davidfowl I'd be happy to give it a try, but I'm not sure how you see pipelines fitting in here. Can you explain?
The mix of ArrayBuffer + Channel is basically what the Pipe implementation gives you. The logic introduced into Http2Connection here is what Kestrel does with a pipe: the Pipe is used as the queue of bytes being shuffled between the application logic (the Http2Stream) and the logic writing to the socket (the Http2Connection).
Yeah, but we are not just shuffling bytes here. The callback is responsible for a bunch of stuff other than just pushing bytes to the connection -- e.g. assigning stream IDs, framing, etc.
In the end it's all bytes; you'd just shift from doing it earlier to doing it later. You enqueue the write and then later frame the buffers on the way out. This has the added benefit of potentially batching multiple writes together, framed as a single HTTP/2 DATA frame on the wire. For reference, Kestrel's implementation: the place where the loop is kicked off, the loop itself, Http2FrameWriter (the component that does the flow control and adds HTTP framing to the bytes), and the callsite, which writes to the pipe directly and flushes. cc @halter73 to clarify if I got anything wrong
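A minimal sketch of the pipe shape being described, with a `MemoryStream` standing in for the socket: callers write body bytes into a `Pipe`, and a single loop on the other side reads them out, would add framing in a real implementation, and pushes them to the socket. This is an illustration of the pattern, not Kestrel's or the PR's actual code.

```csharp
using System;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

var pipe = new Pipe();
var socket = new MemoryStream(); // stand-in for the network stream

async Task WriteLoopAsync()
{
    while (true)
    {
        ReadResult result = await pipe.Reader.ReadAsync();
        foreach (ReadOnlyMemory<byte> segment in result.Buffer)
        {
            // A real implementation would apply HTTP/2 framing and flow
            // control here, potentially coalescing several enqueued
            // writes into one DATA frame.
            await socket.WriteAsync(segment);
        }
        pipe.Reader.AdvanceTo(result.Buffer.End); // mark bytes consumed
        if (result.IsCompleted) break;            // producer finished
    }
}

var loop = WriteLoopAsync();
await pipe.Writer.WriteAsync(new byte[] { 1, 2, 3 }); // enqueue + flush
await pipe.Writer.CompleteAsync();                    // no more writes
await loop;
Console.WriteLine(socket.Length); // 3
```

The producer side never touches the socket; backpressure comes from the pipe's pause/resume thresholds rather than from awaiting each write's completion.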
Are you talking specifically about the request body (not headers etc.)? Because if so, then I get it and I agree. I'm planning on investigating something like this shortly. There's some additional groundwork that needs to happen first, though.
Yes, although in Kestrel's case everything is reversed because it is the server: the pipe that is being read from contains body bytes written by the executing application.
Yes, we do something similar for reading from the connection today; what we don't do is any buffering when we write to the connection. Instead we just pass the user's buffer along and wait for the write to be processed. Part of what I'm planning to explore is changing this so we buffer the user writes, which potentially enables a few things: (1) not having to wait for the write queue (assuming buffer space is not exhausted); (2) consolidating user writes, including the end-stream flag; (3) simplifying some handling in the connection write logic.
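The buffering idea described above might look roughly like this hypothetical sketch: user writes are copied into a connection-owned buffer, the caller only has to wait when capacity is exhausted, and the buffered bytes later flush as one consolidated frame. `BufferedFrameWriter` and all its members are invented names for illustration, not the planned implementation.

```csharp
using System;

// Demo: two small user writes are absorbed without waiting and later
// flush as a single consolidated frame.
var writer = new BufferedFrameWriter();
writer.TryBufferWrite(new byte[] { 1, 2, 3 });
writer.TryBufferWrite(new byte[] { 4, 5, 6 });
byte[] frame = writer.FlushAsSingleFrame();
Console.WriteLine(frame.Length); // 6

class BufferedFrameWriter
{
    private readonly System.IO.MemoryStream _buffer = new();

    // Returns true if the write fit in the buffer, meaning the caller
    // does not need to wait on the write queue at all.
    public bool TryBufferWrite(ReadOnlySpan<byte> data, int capacity = 16 * 1024)
    {
        if (_buffer.Length + data.Length > capacity)
            return false; // buffer exhausted: caller must wait for a flush
        _buffer.Write(data);
        return true;
    }

    public int PendingBytes => (int)_buffer.Length;

    // Consolidates everything buffered so far into one payload; a real
    // implementation would add framing and could carry the end-stream flag.
    public byte[] FlushAsSingleFrame()
    {
        byte[] frame = _buffer.ToArray();
        _buffer.SetLength(0);
        return frame;
    }
}
```

This captures points (1) and (2) from the comment above: the fast path returns synchronously while buffer space remains, and several user writes (including the final one carrying the end-stream flag) can go out as a single write to the connection.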
This gives about a 33% improvement for the raw variation and a 120% improvement for the full GRPC client variation on the GRPC benchmark at https://github.com/JamesNK/Http2Perf.
@stephentoub @JamesNK @dotnet/ncl