
Improve HTTP/2 performance by using Channels #30235

Closed
davidfowl opened this issue Feb 17, 2021 · 6 comments · Fixed by #40925
Labels: area-networking, feature-kestrel, HTTP2, Perf

davidfowl (Member) commented Feb 17, 2021

Today Kestrel locks and copies buffers into the connection pipe when multiplexing output from multiple streams. We could improve this by using a channel to queue writes to the pipe instead of locking (see the linked example).

This sample shows the difference between a SemaphoreSlim and a Channel<byte[]>; the performance difference is almost 3x, with zero allocations: https://gist.github.com/davidfowl/8af7b6fa21df0fe1f150fb5cfeafa8f7.

There was still some lock contention inside Channels itself, but it is well optimized.

We should prototype and measure this change.
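
For illustration, a minimal sketch of the two patterns the gist compares. The WriteQueueSketch class and its member names are invented for this example and are not taken from the gist or from Kestrel:

using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public class WriteQueueSketch
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);
    private readonly Channel<byte[]> _channel = Channel.CreateUnbounded<byte[]>(
        new UnboundedChannelOptions { SingleReader = true });

    // Lock-based: every producer waits for exclusive access before writing.
    public async Task WriteWithSemaphoreAsync(byte[] buffer)
    {
        await _semaphore.WaitAsync();
        try
        {
            await WriteToConnectionAsync(buffer);
        }
        finally
        {
            _semaphore.Release();
        }
    }

    // Channel-based: producers only enqueue; a single consumer performs the writes.
    public bool TryEnqueueWrite(byte[] buffer) => _channel.Writer.TryWrite(buffer);

    // Single consumer loop that drains the queue and owns all connection writes.
    public async Task ConsumeWritesAsync()
    {
        await foreach (var buffer in _channel.Reader.ReadAllAsync())
        {
            await WriteToConnectionAsync(buffer);
        }
    }

    // Stand-in for the actual write to the connection pipe.
    private Task WriteToConnectionAsync(byte[] buffer) => Task.CompletedTask;
}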

cc @JamesNK @halter73 @Tratcher

davidfowl assigned and unassigned davidfowl Feb 17, 2021
davidfowl added the Perf label Feb 17, 2021
Tratcher changed the title from "Improve HTTP/2 performance by using" to "Improve HTTP/2 performance by using Channels" Feb 17, 2021
BrennanConroy added this to the Next sprint planning milestone Feb 17, 2021
ghost commented Feb 17, 2021

Thanks for contacting us.
We're moving this issue to the Next sprint planning milestone for future evaluation / consideration. We will evaluate the request when we are planning the work for the next milestone. To learn more about what to expect next and how this issue will be handled, you can read more about our triage process here.

JamesNK (Member) commented Mar 11, 2021

Copied from #30829

The HTTP/2 frame writer uses a lock to ensure that only one stream can write to the connection at a time. For example, Http2FrameWriter.WriteDataAsync.

When a connection has many streams with frequent writes, there is a lot of contention on the lock. Profiling shows high CPU usage from threads fighting over the lock.

A potential improvement would be to change the writer to use a producer/consumer queue built on Channel<T>. Streams add write operations to the queue, and a single consumer loop is responsible for writing frames to the connection.

Today:

public ValueTask<FlushResult> WriteDataAsync(byte[] data)
{
    // All streams contend for the same connection-level lock.
    lock (_writeLock)
    {
        _writer.Write(data);
        return _writer.FlushAsync();
    }
}

Future:

public ValueTask WriteDataAsync(byte[] data)
{
    // Enqueue the operation; the single consumer loop performs the actual write.
    var operation = SetupWriteData(data);
    _channelWriter.TryWrite(operation);
    return operation.WaitForCompletedAsync();
}

// Consumer loop that reads operations from the channel and is the only
// code that writes to the connection.
private async ValueTask WriterConsumer()
{
    while (true)
    {
        if (!await _channelReader.WaitToReadAsync())
        {
            // The channel was completed; the connection is shutting down.
            return;
        }

        if (_channelReader.TryRead(out var item))
        {
            switch (item.OperationType)
            {
                case OperationType.WriteData:
                    _writer.Write(item.Data);
                    await _writer.FlushAsync();
                    item.Complete();
                    break;
                case OperationType.WriteHeaders:
                    // etc..
                    break;
            }
        }
    }
}
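
To round out the sketch, this is roughly how the channel and the single consumer could be wired up in the same (hypothetical) class. The Http2WriteOperation type, the Start/CompleteAsync methods, and the field names are assumptions for illustration, not the actual Kestrel API:

// Sketch only: member names and the Http2WriteOperation type are assumptions.
private readonly Channel<Http2WriteOperation> _channel =
    Channel.CreateUnbounded<Http2WriteOperation>(new UnboundedChannelOptions
    {
        SingleReader = true,   // only WriterConsumer() reads
        SingleWriter = false   // many streams enqueue concurrently
    });

private ChannelWriter<Http2WriteOperation> _channelWriter => _channel.Writer;
private ChannelReader<Http2WriteOperation> _channelReader => _channel.Reader;

private Task _consumerTask;

// Started once per connection, e.g. when the frame writer is created.
public void Start()
{
    _consumerTask = WriterConsumer().AsTask();
}

// Completing the writer lets WriterConsumer() drain the remaining operations
// and exit when WaitToReadAsync() returns false.
public Task CompleteAsync()
{
    _channelWriter.Complete();
    return _consumerTask;
}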

halter73 (Member) commented

Note: The HTTP/2 FlowControl logic relies on being synchronized by the _writeLock.

public bool TryUpdateConnectionWindow(int bytes)
{
    lock (_writeLock)
    {
        return _connectionOutputFlowControl.TryUpdateWindow(bytes);
    }
}

public bool TryUpdateStreamWindow(StreamOutputFlowControl flowControl, int bytes)
{
    lock (_writeLock)
    {
        return flowControl.TryUpdateWindow(bytes);
    }
}

We can instead move this logic into WriterConsumer() for synchronization, but that means TryUpdateStreamWindow will need to be an operation that can be written to the channel. That also means what looks like a single write to WriteDataAsync() could take several iterations of the WriterConsumer() loop to complete as the writes wait for flow control availability.

I don't think this causes a real problem for this approach. It's just a slight complication. I really like using Channels for this.
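
As a rough illustration of that idea, the window updates could be enqueued as operations instead of taking the lock. The operation and member names below are assumptions, and the synchronous bool return value of the existing methods would have to be dropped or reported asynchronously, which is part of the complication mentioned above:

// Sketch only: window updates are enqueued like any other write operation and
// applied by the single consumer loop, so no lock is needed.
public void UpdateConnectionWindow(int bytes)
{
    _channelWriter.TryWrite(Http2WriteOperation.ForConnectionWindowUpdate(bytes));
}

public void UpdateStreamWindow(StreamOutputFlowControl flowControl, int bytes)
{
    _channelWriter.TryWrite(Http2WriteOperation.ForStreamWindowUpdate(flowControl, bytes));
}

// Inside the WriterConsumer() switch, alongside WriteData and WriteHeaders:
case OperationType.UpdateStreamWindow:
    item.FlowControl.TryUpdateWindow(item.WindowSize);
    // Data writes that were parked waiting for this stream's window can be
    // re-queued and completed on a later iteration of the consumer loop.
    break;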

ghost commented Jul 16, 2021

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to set better expectations regarding different types of issues, you can read our Triage Process.

JamesNK (Member) commented Aug 23, 2021

Evidence of this issue from gRPC benchmarks. They show:

  • HTTP/3 is fast.
  • H2C is slow.
  • H2 (HTTP/2 + TLS) is even worse. I'm guessing TLS causes more time to be spent inside the lock, further increasing thread contention. CPU usage is low on both the server and the client.

[benchmark chart: gRPC RPS by protocol]

(h2 RPS is the yellow dot at the bottom)

davidfowl (Member, Author) commented

Closing this as fixed.

ghost locked as resolved and limited conversation to collaborators May 16, 2022
amcasey added the area-networking label and removed the area-runtime label Jun 2, 2023