detect: Make protocol detection more robust #744

Merged: olix0r merged 66 commits into main from ver/detect on Dec 15, 2020

Conversation

olix0r (Member) commented Nov 15, 2020:

The existing protocol detection implementation is somewhat naive: we
simply perform a single read on the socket and use that initial buffer
to try to determine the protocol. This means that if the initial read
only returns a few bytes, we won't actually have enough information to
do protocol detection. Furthermore, because we don't actually read until
a `\r\n`, we can incorrectly infer that a non-HTTP protocol is HTTP/1.1.

This change addresses this by replacing the `Prefix` middleware with a
`Detect` service and trait that does its own buffering as it tries to
read from the socket.

When detection times out, an error is thrown to the server. This change
requires modifying test data to include a trailing newline (since it's
not feasible to close these connections given the current test
infrastructure).

This change increases the default dispatch (and therefore detection)
timeouts to 5s.
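
Roughly, the buffering loop works like the sketch below (illustrative only: the function name, signature, and the HTTP heuristic are stand-ins rather than the crate's actual `Detect` API):

```rust
use bytes::BytesMut;
use tokio::io::{AsyncRead, AsyncReadExt};

// Sketch only: keep reading until a line ending shows up (enough to see an
// HTTP/1.x request line) or the buffer limit is hit, instead of trusting a
// single read. Returns None if the peer closes before we can decide.
async fn detect_http<I: AsyncRead + Unpin>(
    io: &mut I,
    buf: &mut BytesMut,
    max: usize,
) -> std::io::Result<Option<bool>> {
    loop {
        if buf.iter().any(|&b| b == b'\n') {
            // We have a full line, so a determination can be made. A real
            // implementation would parse the request line; this is only a
            // placeholder heuristic.
            return Ok(Some(buf.windows(9).any(|w| w == b" HTTP/1.1")));
        }
        if buf.len() >= max {
            // A full buffer with no line ending is not HTTP/1.x.
            return Ok(Some(false));
        }
        if io.read_buf(buf).await? == 0 {
            // EOF before detection could complete.
            return Ok(None);
        }
    }
}
```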

Caches are typically wrapped with buffers so that they can be shared
across connections. These buffers can be eliminated if the cache itself
becomes cloneable, and this is easily achieved with a
`std::sync::RwLock`: the cache first obtains a read lock to clone the
inner service and a write lock is only obtained when the service is not
in the cache.
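
The read-then-write locking pattern is roughly the following (a minimal sketch with invented names; the real cache also deals with eviction and uses the crate's own types):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Sketch: the cache itself is cheaply cloneable, so it no longer needs to
// sit behind a buffer in order to be shared across connections.
#[derive(Clone)]
struct Cache<K, S> {
    inner: Arc<RwLock<HashMap<K, S>>>,
}

impl<K: std::hash::Hash + Eq, S: Clone> Cache<K, S> {
    fn get_or_insert_with(&self, key: K, mk: impl FnOnce() -> S) -> S {
        // Fast path: a read lock is enough to clone an existing service.
        if let Some(svc) = self.inner.read().unwrap().get(&key) {
            return svc.clone();
        }
        // Slow path: the write lock is only taken when the service is
        // missing (another thread may have raced us, so use the entry API).
        let mut cache = self.inner.write().unwrap();
        cache.entry(key).or_insert_with(mk).clone()
    }
}
```

Because cloning the whole struct is just an `Arc` clone, sharing it across connections no longer requires a buffer task in front of it.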

It turns out that the buffers have an important side effect, though: they
hide the inner service type, which drastically reduces the
amount of memory needed to compile the linkerd2-proxy crate. To
replicate this, this change introduces a `stack::BoxNewService` which
can be used to hide the type signature of a `NewService`.
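
The type-erasure trick looks roughly like this (sketch only; the `NewService` trait shown here is a stand-in, and the crate's actual trait and `BoxNewService` may differ in detail):

```rust
// Stand-in for the crate's `NewService` trait (the real definition may
// differ); included so the sketch is self-contained.
trait NewService<T> {
    type Service;
    fn new_service(&mut self, target: T) -> Self::Service;
}

// Boxing erases the concrete stack type `N`, so deeply nested stack types
// never appear in the signatures of layers built on top of this one.
struct BoxNewService<T, S> {
    inner: Box<dyn NewService<T, Service = S> + Send>,
}

impl<T, S> BoxNewService<T, S> {
    fn new<N>(inner: N) -> Self
    where
        N: NewService<T, Service = S> + Send + 'static,
    {
        Self {
            inner: Box::new(inner),
        }
    }
}

impl<T, S> NewService<T> for BoxNewService<T, S> {
    type Service = S;

    fn new_service(&mut self, target: T) -> S {
        self.inner.new_service(target)
    }
}
```

Boxing at a few strategic points keeps the compiler from instantiating the full nested stack type at every layer, which is what the buffers were implicitly doing for compile-time memory before.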

Benchmarks indicate that this improves p50 latency by 2ms for HTTP/2
traffic at 100 concurrent requests.
olix0r marked this pull request as ready for review November 15, 2020 19:11
olix0r requested a review from a team November 15, 2020 19:11
olix0r marked this pull request as draft November 15, 2020 19:13
Base automatically changed from ver/cache-clone to main November 17, 2020 22:54
olix0r marked this pull request as ready for review December 5, 2020 03:07
linkerd/app/inbound/src/lib.rs (resolved review thread, outdated)
In preparation for an upcoming change that decouples HTTP detection
logic from server initialization, this change renames `DetectHttp` to
`NewServeHttp` and `AcceptHttp` to `ServeHttp`.
olix0r marked this pull request as draft December 5, 2020 03:21
olix0r changed the base branch from main to ver/http-server December 5, 2020 03:21
linkerd/app/outbound/src/ingress.rs (resolved review thread)
linkerd/proxy/http/src/detect.rs (resolved review thread, outdated)
linkerd/proxy/http/src/detect.rs (resolved review thread)
linkerd/proxy/http/src/detect.rs (resolved review thread, outdated)
linkerd/proxy/http/src/server.rs (resolved review thread)
linkerd/proxy/http/src/server.rs (resolved review thread)
linkerd/proxy/transport/src/detect.rs (resolved review thread, outdated)
This sets up additional layering (like metrics).
This gives busy services time to write to connection pool sockets,
rather than aggressively timing out when the initial read is slow.
Our TCP tests write a very short preamble message without a trailing
newline and then sit idle (rather than closing the socket). This
interferes with the new detection logic, so a line ending is added to
trigger protocol detection.
8K is overly conservative and it makes it more difficult for non-HTTP
clients to fill the buffer.
olix0r marked this pull request as ready for review December 15, 2020 17:31
Now that TCP stacks are getting more complex, it's worth differentiating
them with a protocol label. This change also renames 'source' stacks to
'server' stacks.
hawkw (Contributor) left a comment:

This looks good to me overall!

The way the tracing subscriber is set up in the tests is wrong and won't actually do anything, but once that's fixed, this seems good to me.

@@ -25,7 +25,7 @@ async fn nonblocking_identity_detection() {
     .await;
     let proxy = proxy::new().identity(id_svc);
 
-    let msg1 = "custom tcp hello";
+    let msg1 = "custom tcp hello\n";
Contributor:

nit/TIOLI: do you think it's worth also having some tests with \r\ns in them, just to make sure we're handling that correctly? the logic isn't too complex so i'm not too concerned about it, just a thought...

olix0r (Member, Author) replied:

the http detect tests have specific tests for this sort of thing. i think the integration tests should be as simple as possible to test whatever we expect them to test.

// Scan up to the first line ending to determine whether the
// request is HTTP/1.1. HTTP expects \r\n, so we just look for
// any \n to indicate that we can make a determination.
if buf[scan_idx..].contains(&b'\n') {
Contributor:

👍 this is much simpler than what you had here yesterday --- and (IIRC) `contains` should use a fast SIMD memchr impl when SIMD instructions are available.
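
(For reference, the equivalent explicit search via the `memchr` crate would look roughly like the snippet below; purely illustrative, not what the PR uses.)

```rust
// Purely illustrative: the same scan written against the `memchr` crate,
// equivalent in spirit to `buf[scan_idx..].contains(&b'\n')`.
fn has_line_ending(buf: &[u8], scan_idx: usize) -> bool {
    memchr::memchr(b'\n', &buf[scan_idx..]).is_some()
}
```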

linkerd/proxy/http/src/detect.rs (resolved review thread, outdated)
Comment on lines +144 to +158
tokio::select! {
res = &mut conn => {
debug!(?res, "The client is shutting down the connection");
res?
}
shutdown = drain.signal() => {
debug!("The process is shutting down the connection");
Pin::new(&mut conn).graceful_shutdown();
shutdown.release_after(conn).await?;
}
() = closed => {
debug!("The stack is tearing down the connection");
Pin::new(&mut conn).graceful_shutdown();
conn.await?;
}
Contributor:

it occurs to me that we probably could stick the select! into an async fn that takes a drain, closed future, and connection future, so that this isn't repeated...but i dunno if that's worth it here, and it was like that before...

olix0r (Member, Author) replied:

let's do this separately -- the type constraints for this function would end up being pretty weighty. And this doesn't trigger the rule of three.
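
For illustration only, such a helper might look roughly like the sketch below; the `GracefulShutdown` trait and the bounds are invented stand-ins, and they are exactly the "weighty" constraints referred to above:

```rust
use std::{future::Future, pin::Pin};

// Invented stand-in for the connection's shutdown hook; the real connection
// type exposes graceful_shutdown through Pin<&mut Self> like this.
trait GracefulShutdown {
    fn graceful_shutdown(self: Pin<&mut Self>);
}

// Sketch of the shared helper: the connection must itself be a future,
// be Unpin, and expose the shutdown hook.
async fn serve_until_closed<C, E>(
    mut conn: C,
    drain: drain::Watch,
    closed: impl Future<Output = ()>,
) -> Result<(), E>
where
    C: Future<Output = Result<(), E>> + GracefulShutdown + Unpin,
{
    tokio::select! {
        // The client closed the connection on its own.
        res = &mut conn => res,
        // The process is draining: shut down gracefully, then wait.
        shutdown = drain.signal() => {
            Pin::new(&mut conn).graceful_shutdown();
            shutdown.release_after(conn).await
        }
        // The stack tore down the connection.
        () = closed => {
            Pin::new(&mut conn).graceful_shutdown();
            conn.await
        }
    }
}
```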

Comment on lines +103 to +109
let t0 = time::Instant::now();
let protocol = detect.detect(&mut io, &mut buf).await?;
debug!(
?protocol,
elapsed = ?(time::Instant::now() - t0),
"Detected"
);
Contributor:

TIOLI: it seems like we could move the timing here into the timeout layer, since it always takes a t0 anyway to generate the error. that way, we could avoid a second Instant::now(), which may or may not be a syscall. not sure if it matters here since this is a once-per-connection code path rather than a per-request code path...
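
Roughly what that suggestion would look like (illustrative sketch with invented names; not the actual timeout layer):

```rust
use tokio::time::{self, Duration, Instant};

// Illustrative sketch: the timeout wrapper already records a start time in
// order to report on expiry, so the same Instant can supply the "elapsed"
// debug field without a second call to Instant::now() in the detect path.
async fn timed_detect<F, T>(limit: Duration, detect: F) -> std::io::Result<(T, Duration)>
where
    F: std::future::Future<Output = std::io::Result<T>>,
{
    let t0 = Instant::now();
    match time::timeout(limit, detect).await {
        Ok(Ok(protocol)) => Ok((protocol, t0.elapsed())),
        Ok(Err(e)) => Err(e),
        Err(_) => Err(std::io::Error::new(
            std::io::ErrorKind::TimedOut,
            format!("protocol detection timed out after {:?}", t0.elapsed()),
        )),
    }
}
```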

kleimkuhler (Contributor) left a comment:

Looks good!

olix0r merged commit 1b97e57 into main Dec 15, 2020
olix0r deleted the ver/detect branch December 15, 2020 20:42
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 15, 2020
This release features a change to the proxy's cache eviction strategy to
ensure that clients (and their load balancers) are reused by new
outbound connections. This can dramatically reduce memory consumption,
especially for busy HTTP/1.1 clients.

Also, the proxy's HTTP detection scheme has been made more robust.
Previously, the proxy would perform only a single read to determine
whether a TCP stream was HTTP, which could lead to false positives. Now,
the proxy reads until at least the first newline, which is what the HTTP
parser actually needs to make a proper determination. With this, the
default dispatch timeouts have been increased to 5s to accommodate
connection pools that may not issue an immediate request.

Furthermore, this release includes an upgrade to Tokio v0.3 and its
associated ecosystem.

---

* update buffers to use Tokio 0.3 MPSC channels (linkerd/linkerd2-proxy#759)
* Update the proxy to use Tokio 0.3  (linkerd/linkerd2-proxy#732)
* Rename DetectHttp to NewServeHttp (linkerd/linkerd2-proxy#760)
* http: more consistent names for body types (linkerd/linkerd2-proxy#761)
* io: simplify the `Io` trait (linkerd/linkerd2-proxy#762)
* trace: nicer traces in tests, clean up trace configuration (linkerd/linkerd2-proxy#766)
* Ensure that services are held as long they are being used (linkerd/linkerd2-proxy#767)
* outbound: add stack tests for http (linkerd/linkerd2-proxy#765)
* cache: Ensure that actively held services are not evicted (linkerd/linkerd2-proxy#768)
* cache: Only spawn a single task per cache entry (linkerd/linkerd2-proxy#770)
* test: make integration tests shut up (linkerd/linkerd2-proxy#771)
* metrics: Add support for microsecond counters (linkerd/linkerd2-proxy#772)
* Add a protocol label to stack metrics (linkerd/linkerd2-proxy#773)
* detect: Make protocol detection more robust (linkerd/linkerd2-proxy#744)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 15, 2020