Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement lazy headers parsing #291

Merged
merged 1 commit into from
Apr 10, 2018
Merged

Conversation

normanmaurer
Copy link
Member

Motivation:

We are currently parsing each header eagly which means we need to convert from bytes to String frequently. The reality is that most of the times the user is not really interested in all the headers and so it is kind of wasteful to do so.

Modification:

Rewrite our internal storage of HTTPHeaders to use a ByteBuffer as internal storage and so only parse headers on demand.

Result:

Less overhead for parsing headers.

@normanmaurer normanmaurer requested review from Lukasa and weissi and removed request for Lukasa April 6, 2018 15:48
@@ -145,8 +145,217 @@ public struct HTTPResponseHead: Equatable {
}
}

fileprivate typealias HTTPHeadersStorage = [String: [(String, String)]] // [lowerCasedName: [(originalCaseName, value)]
/// The Index for a header name or value that points into the underlying `ByteBuffer`.
typealias HTTPHeaderIndex = (start: Int, length: Int)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi @Lukasa we could also use a struct here and store the parsed String into it once we parsed it one time. WDYT ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like structs more than tuples for stuff that we're using a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa ok let me do this and also cache the String.

typealias HTTPHeaderIndex = (start: Int, length: Int)

/// Internal storage used for `HTTPHeaders`.
private final class HTTPHeadersByteStorage : CustomStringConvertible {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi @Lukasa We could also just directly merge this with HTTPHeaders I think as ByteBuffer and Array will already ensure COW semantics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let me do this

into.write(buffer: &buf)
} else {
// slow-path....
// TODO: This can still be improved to write as many continous data as possible and just skip over stuff that was removed.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi @Lukasa FYI ^^

@@ -33,6 +33,42 @@ extension String {
}
}

private final class HTTPHelloWorldHandler: ChannelInboundHandler {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa @weissi I will remove this before we merge... This was just used for benchmarking

@normanmaurer
Copy link
Member Author

@weissi @Lukasa DONT MERGE YET ;)

@normanmaurer normanmaurer changed the title Implement lazy headers parsing [DONT-MERGE-YET] Implement lazy headers parsing Apr 6, 2018
@normanmaurer
Copy link
Member Author

Also @tanner0101

@@ -197,8 +414,7 @@ public struct HTTPHeaders: CustomStringConvertible {
// recommended.
/// - Parameter value: The header field value to add for the given name.
public mutating func add(name: String, value: String) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi @Lukasa also as a followup I think we should support StaticString

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would that support look like?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically its just faster to write a staticString into a ByteBuffer so I think it make sense to also support it.

@@ -145,8 +145,217 @@ public struct HTTPResponseHead: Equatable {
}
}

fileprivate typealias HTTPHeadersStorage = [String: [(String, String)]] // [lowerCasedName: [(originalCaseName, value)]
/// The Index for a header name or value that points into the underlying `ByteBuffer`.
typealias HTTPHeaderIndex = (start: Int, length: Int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like structs more than tuples for stuff that we're using a lot.

typealias HTTPHeaderIndex = (start: Int, length: Int)

/// Internal storage used for `HTTPHeaders`.
private final class HTTPHeadersByteStorage : CustomStringConvertible {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

/// Internal storage used for `HTTPHeaders`.
private final class HTTPHeadersByteStorage : CustomStringConvertible {
private var buffer: ByteBuffer
fileprivate var indicies: [HTTPHeaderIndex]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many 'i's here: you want "indices"

}

/// Returns the number of headers stored.
var count: Int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns twice the correct number.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

let utf8 = name.utf8
let stringLength = utf8.count
var idx = 0
while idx < self.indicies.count {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably more naturally expressed as for index in stride(from: 0, to: self.indices.count, by: 2)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL... thanks

self.storage[name.lowercased()] = nil
if self.storage.remove(name: name) {
// We removed something fomr the internal storage and so it is not continous anymore.
self.continous = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like something the internal storage would know about, not the top-level.

In fact, I think it's strictly computable: the storage is contiguous if each element's start index is two bytes after the previous element's end index. That lets you just count it, though naturally you can also spend a word and keep a boolean that keeps track of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa I think I would like to keep the Bool just so we not need to scan the whole array.

buffer.write(staticString: crlf)
}

/// Used for HTTP headers that cannot be joined with commas, e.g. set-cookie.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this correctly handle set-cookie now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our tests included something for it yes... I will double-check

}
}

public func makeIterator() -> Iterator {
return Iterator(wrapping: storage.makeIterator())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change this label?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea... I like storage more. That said I can revert.

@@ -171,7 +171,7 @@ public class HTTPServerUpgradeHandler: ChannelInboundHandler {
/// The core of the upgrade handling logic.
private func handleUpgrade(ctx: ChannelHandlerContext, request: HTTPRequestHead, requestedProtocols: [String]) -> Bool {
let connectionHeader = Set(request.headers.getCanonicalForm("connection").map { $0.lowercased() })
let allHeaderNames = Set(request.headers.map { $0.name.lowercased() })
let allHeaderNames: Set<String> = Set(request.headers.map { $0.name.lowercased() })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need an explicit type now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not... Somehow this happened. Let me revert

@@ -732,7 +732,7 @@ class HTTPUpgradeTestCase: XCTestCase {
let resultString = buffers.map { $0.getString(at: $0.readerIndex, length: $0.readableBytes)! }.joined(separator: "")
assertResponseIs(response: resultString,
expectedResponseLine: "HTTP/1.1 101 Switching Protocols",
expectedResponseHeaders: ["x-upgrade-complete: true", "upgrade: myproto", "connection: upgrade"])
expectedResponseHeaders: ["X-Upgrade-Complete: true", "upgrade: myproto", "connection: upgrade"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we preserve emitted case, or should we explicitly keep our old behaviour?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we keep our old behaviour it will be more "expensive" and I think there is nothing that enforce us to do so... @weissi WDYT ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's definitely nothing that forces us to do so, but I guarantee this will break a bunch of people's unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add an optional flag to preserve case, and default it to false. That lets users have a fast path if they want it while preserving the current behaviour.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanner0101 @weissi WDYT ? IMHO its fine as it is honestly as you can not depend on a case outcome (as any is impl specific).

Copy link
Contributor

@tanner0101 tanner0101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work!

return
}
array.forEach {
self.indices.removeSubrange($0...$0 + 1)
Copy link
Contributor

@tanner0101 tanner0101 Apr 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used an array of optional header indexes in my implementation so that removed values could just be set to nil without having to re-arrage the array.

removeSubrange:

Complexity: O(n), where n is the length of the collection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanner0101 in my first iteration I did exactly this but it had some performance impact (most likely because of all the extra branching that was needed while iterating though the headers.

With the current code:

✗ wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   124.51ms  182.45ms   1.99s    89.44%
    Req/Sec    78.90k    27.18k  210.27k    82.01%
  3065753 requests in 10.05s, 146.19MB read
  Socket errors: connect 0, read 179, write 0, timeout 104
Requests/sec: 304981.98
Transfer/sec:     14.54MB

With using optionals:

✗ wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   117.40ms  149.18ms   2.00s    90.74%
    Req/Sec    71.97k    23.04k  165.79k    71.54%
  2854242 requests in 10.06s, 136.10MB read
  Socket errors: connect 0, read 160, write 0, timeout 122
Requests/sec: 283653.24
Transfer/sec:     13.53MB

As you know its all about trade-offs but the question is how often people will remove headers vs trying to access some. Also note its worst case 0(n) as often it should be able to do much better as the sub-range you remove may be at the end of the array.

These numbers can be reproduced all the time.

Also just for the record this is the performance of current master (without this pr at all):

 ✗ wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   138.82ms  147.82ms   1.99s    85.21%
    Req/Sec    51.70k    21.46k  109.23k    65.57%
  2042977 requests in 10.09s, 97.42MB read
  Socket errors: connect 0, read 49, write 9, timeout 136
Requests/sec: 202409.96
Transfer/sec:      9.65MB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note with the latest commit (f533f4d) we are now doing:

wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   110.49ms  148.06ms   1.95s    93.29%
    Req/Sec    81.94k    24.12k  201.77k    76.19%
  3265185 requests in 10.05s, 155.70MB read
  Socket errors: connect 0, read 112, write 0, timeout 114
Requests/sec: 325043.36
Transfer/sec:     15.50MB

Less array access FTW ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wow, that's awesome! 👍

@@ -17,10 +17,9 @@ import CNIOHTTPParser

private struct HTTPParserState {
var dataAwaitingState: DataAwaitingState = .messageBegin
var currentHeaders: HTTPHeaders?
var currentHTTPHeaders: [HTTPHeaderIndex] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i would call this currentHeaderIndexes. HTTP is redundant and this doesn't actually contain headers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanner0101 I used currentHeaderIndices... thanks for the suggestion :)

@@ -145,8 +145,33 @@ public struct HTTPResponseHead: Equatable {
}
}

fileprivate typealias HTTPHeadersStorage = [String: [(String, String)]] // [lowerCasedName: [(originalCaseName, value)]
/// The Index for a header name or value that points into the underlying `ByteBuffer`.
struct HTTPHeaderIndex {
Copy link
Contributor

@tanner0101 tanner0101 Apr 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be less confusing to read this code (and probably less error prone to modify it) if there were another struct that contained two of these: one for the name and one for the value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanner0101 so you suggest having an extra struct which holds two of these structs and store the extra struct in the array (which means removing the % 2 logic) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanner0101 done in f533f4d.... also gave a nice speed up due less array accesses ;)

@normanmaurer
Copy link
Member Author

@swift-nio-bot test this please

}
return []
let result = self[name]
return result.map { $0.split(separator: ",").map { String($0.trimWhitespace()) } }.reduce([], +)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using reduce(into:_:) might save some copies here.

@normanmaurer
Copy link
Member Author

normanmaurer commented Apr 8, 2018

Also implemented lazy uri string creation in 52a6139 .

Before:

wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   114.68ms  178.60ms   1.98s    91.39%
    Req/Sec    86.45k    23.78k  195.93k    80.15%
  3435062 requests in 10.04s, 163.80MB read
  Socket errors: connect 0, read 157, write 0, timeout 101
Requests/sec: 342098.36
Transfer/sec:     16.31MB

After:

➜  ~ wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   112.44ms  179.94ms   1.98s    91.59%
    Req/Sec    91.31k    23.18k  222.83k    82.37%
  3619845 requests in 10.04s, 172.61MB read
  Socket errors: connect 0, read 163, write 0, timeout 93
Requests/sec: 360557.67
Transfer/sec:     17.19MB

@normanmaurer
Copy link
Member Author

I will revert 40571c1 before merging as this is only used atm for benchmarking

@normanmaurer
Copy link
Member Author

Also I am not sure if we should do 52a6139 or not... I mean for this kind of benchmarks its a win but I would expect that almost all servers will need to access the URI anyway. For the win see #291 (comment)

@tanner0101 @weissi @Lukasa WDYT ?

@normanmaurer
Copy link
Member Author

normanmaurer commented Apr 9, 2018

New numbers with 72d9aca.

 ✗ wrk -c 256 -d 10s -t 4 -s ~/Downloads/pipeline-many.lua -H "X-Host: SomeValue" -H "Host: swiftnio.io" -H "ThereAreEvenMoreHeaders: AndMoreValues" http://localhost:8888
Running 10s test @ http://localhost:8888
  4 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   109.16ms  188.15ms   1.99s    90.51%
    Req/Sec   100.97k    24.39k  245.34k    86.68%
  4009714 requests in 10.04s, 191.20MB read
  Socket errors: connect 0, read 152, write 0, timeout 80
Requests/sec: 399238.82
Transfer/sec:     19.04MB

@normanmaurer normanmaurer changed the title [DONT-MERGE-YET] Implement lazy headers parsing Implement lazy headers parsing Apr 9, 2018
@@ -63,6 +65,21 @@ public struct HTTPRequestHead: Equatable {
}
}

/// Internal representation of an URI
enum Uri {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called URI.

@@ -95,16 +91,23 @@ private func isChunkedPart(_ headers: HTTPHeaders) -> Bool {
///
/// Note that for HTTP/1.0 if there is no Content-Length then the response should be followed
/// by connection close. We require that the user send that connection close: we don't do it.
private func sanitizeTransportHeaders(hasBody: HTTPMethod.HasBody, headers: inout HTTPHeaders, version: HTTPVersion) {
///
/// Returns true if its chunked.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boo, let's use an explicit enum please. :)

/// - Parameter headers: The headers for this HTTP request.
init(version: HTTPVersion, method: HTTPMethod, uri: String, headers: HTTPHeaders) {
init(version: HTTPVersion, method: HTTPMethod, rawUri: Uri, headers: HTTPHeaders) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be rawURI.

@@ -24,8 +24,13 @@ public struct HTTPRequestHead: Equatable {
/// The HTTP method for this request.
public let method: HTTPMethod

// Internal representation of the URI.
private let rawUri: Uri
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be rawURI.

case string(String)
case byteBuffer(ByteBuffer)

var asString: String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Swift convention here is actually to provide an extension on String that gives it an appropriate constructor from this type.

let stringLength = utf8.count
var array: [String] = []
for header in headers {
if stringLength == header.name.length && self.buffer.equalCaseInsensitiveASCII(seq: utf8, at: header.name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, if we have a proper length check in equalCaseInsensitiveASCII then we can elide this one.

return value.count == header.value.length && self.buffer.getString(at: header.value.start, length: header.value.length) == value
}
return true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three methods here that have very similar bodies. We should probably rewrite those as either methods that take non-escaping closures or methods that return the header indices so that we can reduce the code duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa will review this ... that said while these look kind of similar they are not completely as one is starting from the back and the others not etc.

let utf8 = name.utf8
let stringLength = utf8.count
for header in headers {
if stringLength == header.name.length && self.buffer.equalCaseInsensitiveASCII(seq: utf8, at: header.name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this length check can be elided.


var value = self.buffer.getSlice(at: header.value.start, length: header.value.length)!
into.write(buffer: &value)
into.write(staticString: crlf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now names and values are always contiguous, right? If they are, we can halve this out by writing just one slice:

var header = self.buffer.getSlice(at: header.name.start, length: header.name.length + header.value.length + 4)
into.write(buffer: &header)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa I think this will not work in all cases as we can't be sure it uses : ... It could also just use : (which is legit as well).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we be sure? Whatever we're sending out was either written there by us, or it was provided by http_parser, but in either case it did parse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa wouldn't it be possible that http_parser handed something which is only separated by : ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but does that matter? The leading whitespace is explicitly optional. RFC 7230 says it right in the ABNF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each header field consists of a case-insensitive field name followed by a colon (":"), optional leading whitespace, the field value, and optional trailing whitespace.

(Emphasis mine)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa and thats exactly the problem ...

self.buffer.getSlice(at: header.name.start, length: header.name.length + header.value.length + 4)

This will only work with : . Without the whitespace it would be +3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sure, but that's ok, just omit that part of the logic and do this:

let fieldLength = (header.value.start + header.value.length) - header.name.start
var header = self.buffer.getSlice(at: header.name.start, length: fieldLength)
into.write(buffer: &header)
into.write(staticString: "\r\n")

That also has the effect of stripping any OWS at the end of the field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa good point.. fixing now

return []

for (b1, b2) in zip(utf8, utf8Other) {
guard b1 & 0xdf == b2 & 0xdf else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this should probably check that the bytes are in the ASCII range.

var currentUri: String?
var currentNameIndex: HTTPHeaderIndex?
var currentHeaders: [HTTPHeader] = []
var currentUri: URI?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're here changing this, can we call this currentURI now?

ctx.write(wrapOutboundOut(.byteBuffer(buffer)), promise: promise)
}

private func isChunkedPart(_ headers: HTTPHeaders) -> Bool {
return headers["transfer-encoding"].contains("chunked")
private enum Chunked {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this enum BodyFraming and have three cases, .chunked, contentLength, and neither.

@@ -63,6 +65,23 @@ public struct HTTPRequestHead: Equatable {
}
}

/// Internal representation of an URI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "a URI"

}

private extension String {
init(_ uri: URI) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: might be better to have this label be explicit?

}
return withVeryUnsafeBytes { buffer in
// This should never happens as we control when this is called. Adding and assert to ensure this.
assert(index.start <= self.capacity - index.length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I pretty strongly recommend upgrading this to a precondition, as the consequences of this failing is memory unsafety.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa usually we use asserts where something will happen because of an bug in NIO and precondition when the user can provide the input. Thats why I used assert but if you are strong on this I can change it (it's just not consistent with what we do in other places IMHO). @weissi WDYT ?

var array: [String] = []
for header in headers {
if self.buffer.equalCaseInsensitiveASCII(view: utf8, at: header.name) {
array.append(string(idx: header.value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the explicit self here? Took me a while to work out where this String constructor came from. 😉

str.append("(\"")
str.append(string(idx: h.name))
str.append("\", \"")
str.append(string(idx: h.value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these two calls to string have an explicit self?

for header in headers {
if self.buffer.equalCaseInsensitiveASCII(view: utf8, at: header.name) {
if let value = value {
return self.buffer.equalCase(string: value, at: header.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't right: it exits early. That means if you have a header block that consists of [('hello', 'world'), ('hello', 'fish')] and you ask headers.contains(name: 'hello', value: 'fish') this will return false instead of true. Mind adding a test that covers this case, and then fixing this function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa also beside this I used header.name while it should be header.value 🤦‍♂️

return result
}

return result.map { $0.split(separator: ",").map { String($0.trimWhitespace()) } }.reduce([], +)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be rebased onto master, which contains an improvement in this code to use flatMap instead of map.reduce.

if let result = storage[queryName] {
return result.map { tuple in tuple.1.split(separator: ",").map { String($0.trimWhitespace()) } }.reduce([], +)
for (b1, b2) in zip(utf8, utf8Other) {
guard b1 & 0xdf == b2 & 0xdf else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about enforcing ASCII here.

storage[keyLower] = (storage[keyLower] ?? []) + [(name, value)]
let nameStart = self.buffer.writerIndex

for c in name.utf8 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi @Lukasa is there a better way ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, you can use lazy.map { $0.isASCII }.reduce(true) { $0 && $1 }, but sadly that's only shorter, not necessarily faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... that said we can add an add method that takes StaticString for name and check isASCII directly on it to provide a fast path

Copy link
Member

@weissi weissi Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanmaurer name.utf8.contains(where: UInt8.isASCII) or the opposite

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi @Lukasa done... also added a StaticString version

ctx.write(wrapOutboundOut(.byteBuffer(buffer)), promise: promise)
}

private func isChunkedPart(_ headers: HTTPHeaders) -> Bool {
return headers["transfer-encoding"].contains("chunked")
private enum BodyFraming {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorrrrrry, but: mind adding a doc comment?

@@ -384,16 +512,18 @@ private extension Substring {

extension HTTPHeaders: Equatable {
public static func ==(lhs: HTTPHeaders, rhs: HTTPHeaders) -> Bool {
if lhs.storage.count != rhs.storage.count {
let lhsNames = Set(lhs.names.map { lhs.string(idx: $0).lowercased() })
let rhsNames = Set(rhs.names.map { rhs.string(idx: $0).lowercased() })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, but: this makes comparing headers slower in the majority of cases. Most header blocks are non-equal, and most do not contain the same number of headers. Any reason we don't want to keep count around as we did before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa nope let me add it.

if let result = storage[queryName] {
return result.map { tuple in tuple.1.split(separator: ",").map { String($0.trimWhitespace()) } }.reduce([], +)
for (b1, b2) in zip(self.utf8, utf8Other) {
guard b1 & 0xdf == b2 & 0xdf else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method still does not enforce ASCII-ness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to remove this method... its not used anymore.

public init(_ headers: [(String, String)] = []) {
/// - parameters
/// - headers: An initial set of headers to use to populate the header block.
/// - allocator: The allocator to use to allocate the underyling storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: underlying

return strArray.joined()
}

/// Constructor used by our decoder to construct headers without the need of convert bytes to string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/convert/converting/

@@ -384,16 +496,21 @@ private extension Substring {

extension HTTPHeaders: Equatable {
public static func ==(lhs: HTTPHeaders, rhs: HTTPHeaders) -> Bool {
if lhs.storage.count != rhs.storage.count {
guard lhs.names.count == rhs.headers.count else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these not comparing against the same property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

brainfart...

/// ASCII string. For future-proofing with HTTP/2 lowercase header names are strongly
// recommended.
/// - Parameter value: The header field value to add for the given name.
public mutating func add(name: StaticString, value: String) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi @Lukasa should we change name, to staticName ?

/// ASCII string. For future-proofing with HTTP/2 lowercase header names are strongly
// recommended.
/// - Parameter value: The header field value to add for the given name.
public mutating func add(name: StaticString, value: String) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work the way you want: I quickly asked @weissi and he reckons that the compiler won't ever select this method over its other overload. Can you try replacing the body of this method with fatalError and running the tests to see if you ever actually execute this code? If you don't, this probably needs a different set of labels to make the compiler actually choose it.

func validateAndWrite(buffer: inout ByteBuffer) -> Int {
switch self {
case .staticString(let string):
precondition(string.isASCII, "name must be ASCII")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is StaticString.isASCII coming from? Why can't we use it on string below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a few small notes here.

case .yes, .unlikely:
/* leave alone */
()
return headers["transfer-encoding"].first == "chunked" ? .chunked : .neither
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong. For HTTP/1.0, there is no chunked transfer encoding. This should look for content-length, not chunked TE. I appreciate it won't affect the outcome of the code, but we should aim to get this right in case we want to use this elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you suggest checking for content length only ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, chunked + HTTP/1.0 is not spec-valid.

return false
}
return withVeryUnsafeBytes { buffer in
// This should never happens as we control when this is called. Adding and assert to ensure this.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/and/an/

guard byte.isASCII && address.advanced(by: index.start + idx).pointee & 0xdf == byte & 0xdf else {
return false
}
idx += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop can be cleaned up by using for (idx, byte) in view.enumerated(), which can avoid using the explicit index.

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm happy. Just @weissi to review.

@Lukasa
Copy link
Contributor

Lukasa commented Apr 9, 2018

Also needs a rebase, natch.

@normanmaurer normanmaurer force-pushed the http_work branch 2 times, most recently from 624762c to d4007f0 Compare April 10, 2018 09:44
Copy link
Member

@weissi weissi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that generally looks really great! left a few comments

var currentHeaders: HTTPHeaders?
var currentUri: String?
var currentNameIndex: HTTPHeaderIndex?
var currentHeaders: [HTTPHeader] = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we maybe 'guess' a usual number of headers and put an array here that has reservedCapacity?

self.currentHeaders = nil
self.currentUri = nil
self.currentNameIndex = nil
self.currentHeaders = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would need to replicate that here or make it come from some constant

@@ -78,12 +78,15 @@ private func writeHead(wrapOutboundOut: (IOData) -> NIOAny, writeStartLine: (ino

var buffer = ctx.channel.allocator.buffer(capacity: 256)
writeStartLine(&buffer)
headers.write(buffer: &buffer)
headers.write(into: &buffer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming that's not public API, shouldn't we follow the usual ByteBuffer API and make it

buffer.write(headers: headers)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do a follow up

}

public var description: String {
var strArray: [String] = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we just shove all of that in a [String: String] dictionary and cause that .description?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea

array.forEach {
self.headers.remove(at: $0)
}
self.continuous = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra PR is better I think


fileprivate init(wrapping: HTTPHeadersStorage.Iterator) {
self.storageIterator = wrapping
fileprivate init(headerParts: Array<(String, String)>.Iterator) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we write [(String, String)] instead of Array<...>?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi [(String, String)].Iterator does not compile...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works for me:

  5> func foo(headerParts: Array<(String, String)>.Iterator) {}
  6> 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, wrong paste

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  6> print([(String, String)].Iterator.self) 
IndexingIterator<Array<(String, String)>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@normanmaurer OMG, apologies, it does work anywhere but in function signatures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weissi so all good here?

Copy link
Member

@weissi weissi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice one, great work1

Motivation:

We are currently parsing each header eagly which means we need to convert from bytes to String frequently. The reality is that most of the times the user is not really interested in all the headers and so it is kind of wasteful to do so.

Modification:

Rewrite our internal storage of HTTPHeaders to use a ByteBuffer as internal storage and so only parse headers on demand.

Result:

Less overhead for parsing headers.
@normanmaurer
Copy link
Member Author

@swift-nio-bot test this please

1 similar comment
@normanmaurer
Copy link
Member Author

@swift-nio-bot test this please

@normanmaurer normanmaurer merged commit 48cef4a into apple:master Apr 10, 2018
@normanmaurer normanmaurer deleted the http_work branch April 10, 2018 12:58
@normanmaurer
Copy link
Member Author

And merged! :)

@normanmaurer normanmaurer self-assigned this Apr 10, 2018
@normanmaurer normanmaurer added this to the 1.4.0 milestone Apr 10, 2018
@Lukasa Lukasa added the 🆕 semver/minor Adds new public API. label Apr 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🆕 semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants