
Reduce the cost of flushing #265

Merged: 1 commit into apple:main on Dec 3, 2020

Conversation

@Lukasa (Contributor) commented Dec 3, 2020

Motivation:

The OutboundFlowControlBuffer is responsible for managing DATA and
HEADERS frames for all streams in order to ensure that we obey HTTP/2
flow control rules, even when users don't. This object is therefore on
the data path for the vast majority of frames, and is extremely
performance sensitive.

Unfortunately, the current algorithm has some nasty performance cliffs.
Right now it has the awkward property of making flushes take time
linear in the number of streams on the connection, not the number of
streams that actually had frames sent on them. The result is that
connections with large numbers of concurrent streams see extremely poor
performance when flushing, a problem that is exacerbated if other parts
of the HTTP/2 stack fail to coalesce those flushes.

We can clean this up in a few ways, but the easiest way is to just keep
track of which streams have done I/O since the last flush. While we're
there, we can make a few other optimisations to ensure we only flush
data on streams we are actually processing, to further reduce the time
wasted on unnecessary work.

Modifications:

  • Add a Set to track which streams have had writes since the last
    flush.
  • Streams whose flow control window drops to zero stop being writable.
  • Cheaper and more accurate detection of zero-length writes, which
    avoids the possibility of awkward stalls with zero-length DATA frames
    and END_STREAM.
  • Stop computing the number of flushed bytes; we no longer use that
    information.
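The first bullet is the heart of the change: marking streams on write so that a flush only visits streams that actually wrote. A minimal sketch of that idea follows; the names (`PendingFlushTracker`, `StreamID`) are illustrative, not the actual swift-nio-http2 types.

```swift
/// An illustrative stream identifier; the real HTTP2StreamID is richer.
struct StreamID: Hashable {
    let value: Int
}

/// Tracks which streams have had writes since the last flush, so that
/// flushing costs O(streams that wrote), not O(streams on the connection).
struct PendingFlushTracker {
    // Streams that have had writes since the last flush.
    private var streamsWithPendingWrites: Set<StreamID> = []

    // Called on every write: O(1), independent of connection size.
    mutating func didWrite(on stream: StreamID) {
        self.streamsWithPendingWrites.insert(stream)
    }

    // Called on flush: visits only the streams that actually wrote.
    mutating func flush(_ body: (StreamID) -> Void) {
        for stream in self.streamsWithPendingWrites {
            body(stream)
        }
        // Keep capacity: the same streams tend to write again soon.
        self.streamsWithPendingWrites.removeAll(keepingCapacity: true)
    }
}
```

The `keepingCapacity: true` reset avoids reallocating the set's storage on every flush cycle, which matters on a hot path like this one.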

Results:

An approximately 10% performance gain on the 100-concurrent-streams
server-only benchmark. The relative performance gain shifts with the
number of active streams, but it should be positive in all cases: no
regression is observed on the equivalent 1-concurrent-stream test.

@Lukasa Lukasa added 🔨 semver/patch No public API change. area/performance Improvements to performance. labels Dec 3, 2020
@Lukasa Lukasa force-pushed the cb-keep-on-optimising branch from 643e3e7 to 0bedded Compare December 3, 2020 09:27
@@ -198,9 +198,14 @@ class SimpleClientServerFramePayloadStreamTests: XCTestCase {
/// Establish a basic HTTP/2 connection.
func basicHTTP2Connection(clientSettings: HTTP2Settings = nioDefaultSettings,
serverSettings: HTTP2Settings = nioDefaultSettings,
maximumBufferedControlFrames: Int = 10000,
@Lukasa (Contributor, Author) commented:
The changes in this file make these tests run faster: I was sick of waiting for them.

##
## SPDX-License-Identifier: Apache-2.0
##
##===----------------------------------------------------------------------===##
@Lukasa (Contributor, Author) commented:

This is a helper script I've added to do broader-scale cachegrind comparison benchmarking.

@Lukasa Lukasa force-pushed the cb-keep-on-optimising branch 2 times, most recently from bc967c0 to d6d58fe Compare December 3, 2020 10:05
@glbrntt (Contributor) left a comment:

I think this looks good.


if oldWindowSize <= 0 && self.currentWindowSize > 0 {
// Window opened. We can now write.
return .changed(newValue: true)
@glbrntt (Contributor) commented:
Just to check I understand what's going on here: in the case that the new window size isn't large enough for all the data we have buffered, we still say we're writable, but this is in the sense of "we have data that can be written on the connection" rather than "we have enough space for the stream channel to produce more writes".

@Lukasa (Contributor, Author) replied:
Yes, this is correct. The writability notion here is entirely regarding the HTTP/2 flow control window, and has nothing to do with the stream channel itself. Nothing in this object handles stream channel logic in any way.
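The transition being discussed can be sketched as follows: writability flips only when the flow control window crosses zero, and "writable" means "has room for at least one byte", not "has room for everything buffered". The type and property names here are assumptions for illustration, not the exact swift-nio-http2 API.

```swift
/// Whether a window update changed the stream's writability.
enum WritabilityState: Equatable {
    case notChanged
    case changed(newValue: Bool)
}

/// A toy flow control window that reports writability transitions.
struct FlowControlWindow {
    private(set) var currentWindowSize: Int

    init(initialSize: Int) {
        self.currentWindowSize = initialSize
    }

    mutating func updateWindow(by increment: Int) -> WritabilityState {
        let oldWindowSize = self.currentWindowSize
        self.currentWindowSize += increment

        if oldWindowSize <= 0 && self.currentWindowSize > 0 {
            // Window opened: we may now write, even if the window is
            // smaller than the data we have buffered.
            return .changed(newValue: true)
        } else if oldWindowSize > 0 && self.currentWindowSize <= 0 {
            // Window exhausted: stop being writable, per the second
            // bullet in the Modifications list.
            return .changed(newValue: false)
        }
        // No zero crossing: writability is unchanged.
        return .notChanged
    }
}
```

Reporting only the crossings (rather than every size change) is what lets the buffer mark and unmark streams cheaply instead of rescanning them all.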

self.dataBuffer.failAllWrites(error: error)
}

mutating func nextWrite(maxSize: Int) -> (DataBuffer.BufferElement, WritabilityState) {
@glbrntt (Contributor) commented:
IIUC this should only be called if we're marked, worth a comment/assertion to that end?

@Lukasa (Contributor, Author) replied:
Yeah, seems like a good idea to me.
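The suggested guard could look something like the sketch below: assert that `nextWrite` is only called on streams marked as having pending writes. The `hasPendingWrites` flag and `StreamBuffer` type are hypothetical stand-ins, not the real implementation.

```swift
/// A toy per-stream buffer illustrating the suggested assertion.
struct StreamBuffer {
    private(set) var hasPendingWrites: Bool = false
    private var buffered: [String] = []   // stand-in for buffered frames

    mutating func bufferWrite(_ frame: String) {
        self.buffered.append(frame)
        self.hasPendingWrites = true      // mark the stream on write
    }

    // Callers must only request writes on marked streams; maxSize is
    // ignored in this sketch.
    mutating func nextWrite(maxSize: Int) -> String {
        assert(self.hasPendingWrites, "nextWrite called on an unmarked stream")
        let frame = self.buffered.removeFirst()
        self.hasPendingWrites = !self.buffered.isEmpty
        return frame
    }
}
```

In debug builds the `assert` catches any caller that violates the marking invariant; in release builds Swift compiles it away, so the hot path pays nothing for it.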

@Lukasa Lukasa force-pushed the cb-keep-on-optimising branch from d6d58fe to 27e8d6d Compare December 3, 2020 10:17
@Lukasa Lukasa merged commit 37919ba into apple:main Dec 3, 2020
@Lukasa Lukasa deleted the cb-keep-on-optimising branch December 3, 2020 11:00