-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up contended HTTP/2 frame writing #40407
Conversation
a402eec
to
5477e28
Compare
Nice! Doing stuff in that write lock on a busy connection really is a killer 😮 How does the TLS 70x1 gRPC benchmark compare now with the non-TLS 70x1? And HTTP/3? |
@@ -83,8 +85,34 @@ public Http2Connection(HttpConnectionContext context) | |||
// Capture the ExecutionContext before dispatching HTTP/2 middleware. Will be restored by streams when processing request | |||
_context.InitialExecutionContext = ExecutionContext.Capture(); | |||
|
|||
var inputPipeOptions = new PipeOptions(pool: context.MemoryPool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this introduce a new pipe... That's unfortunate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does introduce a new pipe. This part of the PR description describes how we could avoid that.
This does introduce an extra copy similar to what we already have for HTTP/2 input, but the benchmark results clearly show this is worthwhile in order to offload the TLS work to a thread that doesn't block other HTTP/2 streams. We could avoid this copy by updating
ConcurrentPipeWriter
to dispatch calls toFlushAsync
andWriteAsync
. I didn't do that for this initial iteration because we'd want to use a pooledIValueTaskSource
to support this. We'd also want to makeConcurrentPipeWriter
aware of theMaxResponseBufferSize
so it wouldn't always return incomplete ValueTasks for dispatched writes.
I wanted to keep this simple for now though so we can possibly service this. Long term we can do the custom PipeWriter
, or better yet get rid of the _writeLock
altogether using Channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel like a lot of time should be invested in what should hopefully be replaced by the best solution (channels).
h3 70x1
h3 was slower than h2 was before in this scenario though it does have a lower max latency. I almost didn't believe this one thinking it must be falling back to h2 or something, but these numbers are consistently low even when I try benchmarking h3 on the PR branch. h2c 70x1 (main)
h2c 70x1 (halter73/30235)
The theoretical memory use will be up to 64KB higher per HTTP/2 connection experiencing TCP backpressure by default. We already have a higher 1MB default limit for buffering the read side at this layer. And no extra memory is used when there is HTTP/2 flow control backpressure. The benchmark results do show increases in the working set and private memory, but not any more than what can be explained by the increased CPU usage.
We'd get rid of the output pipe copy. This is just to get the expensive TLS operations out of the lock. If we do the |
2b74c0d
to
4c2f19f
Compare
613128f
to
ff79d09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of change should stabilize in main for at least one preview release before we consider backporting it.
Co-authored-by: Aditya Mandaleeka <adityamandaleeka@users.noreply.github.com>
This PR changes Http2Connection/Http2FrameWriter so that it dispatches
SslStream.WriteAsync
to the thread pool and releases theHttp2FrameWriter._writeLock
immediately. This improves our "gRPC h2 70x1" scenario (70 streams over one HTTP/2 connection with TLS) by 500% as measured by RPS.The following YARP profile focused on
Http2OutputProducer.WriteDataToPipeAsync
shows how much time is currently spent spinning on the_writeLock
and waiting on TLS operations given many streams writing to a single HTTP/2 connection.This partially addresses #30235. It doesn't go as far as using Channels as suggested in the issue, so HPACK encoding and frame writing/copying still happens with the
_writeLock
. However, the more expensive TLS operations do get dispatched. This allows contending streams to more quickly write their own frames to an output buffer or await without spinning on a lock as frequently.This does introduce an extra copy similar to what we already have for HTTP/2 input, but the benchmark results clearly show this is worthwhile in order to offload the TLS work to a thread that doesn't block other HTTP/2 streams. We could avoid this copy by updating
ConcurrentPipeWriter
to dispatch calls toFlushAsync
andWriteAsync
. I didn't do that for this initial iteration because we'd want to use a pooledIValueTaskSource
to support this. We'd also want to makeConcurrentPipeWriter
aware of theMaxResponseBufferSize
so it wouldn't always return incomplete ValueTasks for dispatched writes.Even when the client can open multiple connections to reduce contention, this improves performance.
I also verified the non-TLS "h2c" performance does not regress.