
Reduce the cost of flushing #265

Merged: 1 commit into apple:main on Dec 3, 2020

Conversation

@Lukasa (Contributor) commented Dec 3, 2020

Motivation:

The OutboundFlowControlBuffer is responsible for managing DATA and
HEADERS frames for all streams in order to ensure that we obey HTTP/2
flow control rules, even when users don't. This object is therefore on
the data path for the vast majority of frames, and is extremely
performance sensitive.

Unfortunately, the current algorithm has some nasty performance cliffs.
Right now it has the awkward property of making flushes take time
linear in the number of streams on the connection, not the number of
streams that actually had frames sent on them. The result is that
connections with large numbers of concurrent streams see extremely poor
performance when flushing, a problem that is exacerbated if other parts
of the HTTP/2 stack fail to coalesce those flushes.

We can clean this up in a few ways, but the easiest way is to just keep
track of which streams have done I/O since the last flush. While we're
there, we can make a few other optimisations to ensure we only flush
data on streams we are actually processing, to further reduce the time
wasted on unnecessary work.

Modifications:

  • Add a Set to track which streams have had writes since the last
    flush.
  • Streams whose flow control window drops to zero stop being writable.
  • Cheaper and more accurate detection of zero-length writes, which
    avoids the possibility of awkward stalls with zero-length DATA frames
    and END_STREAM.
  • Stop computing the number of flushed bytes; we no longer use that
    information.
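The first bullet is the heart of the change: marking streams on write so that a flush only visits streams that actually wrote. A minimal sketch of that idea follows; the names (`PendingFlushTracker`, `StreamID`) are illustrative, not the actual swift-nio-http2 types.

```swift
/// An illustrative stream identifier; the real HTTP2StreamID is richer.
struct StreamID: Hashable {
    let value: Int
}

/// Tracks which streams have had writes since the last flush, so that
/// flushing costs O(streams that wrote), not O(streams on the connection).
struct PendingFlushTracker {
    // Streams that have had writes since the last flush.
    private var streamsWithPendingWrites: Set<StreamID> = []

    // Called on every write: O(1), independent of connection size.
    mutating func didWrite(on stream: StreamID) {
        self.streamsWithPendingWrites.insert(stream)
    }

    // Called on flush: visits only the streams that actually wrote.
    mutating func flush(_ body: (StreamID) -> Void) {
        for stream in self.streamsWithPendingWrites {
            body(stream)
        }
        // Keep capacity: the same streams tend to write again soon.
        self.streamsWithPendingWrites.removeAll(keepingCapacity: true)
    }
}
```

The `keepingCapacity: true` reset avoids reallocating the set's storage on every flush cycle, which matters on a hot path like this one.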

Results:

An approximately 10% performance gain on the 100-concurrent-streams
server-only benchmark. The relative performance gain shifts with the
number of active streams, but it should be positive in all cases: no
regression is observed on the equivalent 1-concurrent-stream test.

@Lukasa Lukasa added 🔨 semver/patch No public API change. area/performance Improvements to performance. labels Dec 3, 2020
@Lukasa Lukasa force-pushed the cb-keep-on-optimising branch from 643e3e7 to 0bedded Compare December 3, 2020 09:27
@@ -198,9 +198,14 @@ class SimpleClientServerFramePayloadStreamTests: XCTestCase {
/// Establish a basic HTTP/2 connection.
func basicHTTP2Connection(clientSettings: HTTP2Settings = nioDefaultSettings,
serverSettings: HTTP2Settings = nioDefaultSettings,
maximumBufferedControlFrames: Int = 10000,
@Lukasa (Contributor, Author) commented:
The changes in this file make these tests run faster: I was sick of waiting for them.

##
## SPDX-License-Identifier: Apache-2.0
##
##===----------------------------------------------------------------------===##
@Lukasa (Contributor, Author) commented:

This is a helper script I've added to do broader-scale cachegrind comparison benchmarking.

@Lukasa Lukasa force-pushed the cb-keep-on-optimising branch 2 times, most recently from bc967c0 to d6d58fe Compare December 3, 2020 10:05
@glbrntt (Contributor) left a comment:

I think this looks good.


if oldWindowSize <= 0 && self.currentWindowSize > 0 {
// Window opened. We can now write.
return .changed(newValue: true)
@glbrntt (Contributor) commented:
Just to check I understand what's going on here: in the case that the new window size isn't large enough for all the data we have buffered, we still say we're writable, but this is in the sense of "we have data that can be written on the connection" rather than "we have enough space for the stream channel to produce more writes".

@Lukasa (Contributor, Author) replied:
Yes, this is correct. The writability notion here is entirely regarding the HTTP/2 flow control window, and has nothing to do with the stream channel itself. Nothing in this object handles stream channel logic in any way.
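The transition being discussed can be sketched as follows: writability flips only when the flow control window crosses zero, and "writable" means "has room for at least one byte", not "has room for everything buffered". The type and property names here are assumptions for illustration, not the exact swift-nio-http2 API.

```swift
/// Whether a window update changed the stream's writability.
enum WritabilityState: Equatable {
    case notChanged
    case changed(newValue: Bool)
}

/// A toy flow control window that reports writability transitions.
struct FlowControlWindow {
    private(set) var currentWindowSize: Int

    init(initialSize: Int) {
        self.currentWindowSize = initialSize
    }

    mutating func updateWindow(by increment: Int) -> WritabilityState {
        let oldWindowSize = self.currentWindowSize
        self.currentWindowSize += increment

        if oldWindowSize <= 0 && self.currentWindowSize > 0 {
            // Window opened: we may now write, even if the window is
            // smaller than the data we have buffered.
            return .changed(newValue: true)
        } else if oldWindowSize > 0 && self.currentWindowSize <= 0 {
            // Window exhausted: stop being writable, per the second
            // bullet in the Modifications list.
            return .changed(newValue: false)
        }
        // No zero crossing: writability is unchanged.
        return .notChanged
    }
}
```

Reporting only the crossings (rather than every size change) is what lets the buffer mark and unmark streams cheaply instead of rescanning them all.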

self.dataBuffer.failAllWrites(error: error)
}

mutating func nextWrite(maxSize: Int) -> (DataBuffer.BufferElement, WritabilityState) {
@glbrntt (Contributor) commented:
IIUC this should only be called if we're marked, worth a comment/assertion to that end?

@Lukasa (Contributor, Author) replied:
Yeah, seems like a good idea to me.
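The suggested guard could look something like the sketch below: assert that `nextWrite` is only called on streams marked as having pending writes. The `hasPendingWrites` flag and `StreamBuffer` type are hypothetical stand-ins, not the real implementation.

```swift
/// A toy per-stream buffer illustrating the suggested assertion.
struct StreamBuffer {
    private(set) var hasPendingWrites: Bool = false
    private var buffered: [String] = []   // stand-in for buffered frames

    mutating func bufferWrite(_ frame: String) {
        self.buffered.append(frame)
        self.hasPendingWrites = true      // mark the stream on write
    }

    // Callers must only request writes on marked streams; maxSize is
    // ignored in this sketch.
    mutating func nextWrite(maxSize: Int) -> String {
        assert(self.hasPendingWrites, "nextWrite called on an unmarked stream")
        let frame = self.buffered.removeFirst()
        self.hasPendingWrites = !self.buffered.isEmpty
        return frame
    }
}
```

In debug builds the `assert` catches any caller that violates the marking invariant; in release builds Swift compiles it away, so the hot path pays nothing for it.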

@Lukasa Lukasa force-pushed the cb-keep-on-optimising branch from d6d58fe to 27e8d6d Compare December 3, 2020 10:17
@Lukasa Lukasa merged commit 37919ba into apple:main Dec 3, 2020
@Lukasa Lukasa deleted the cb-keep-on-optimising branch December 3, 2020 11:00