detect: Make protocol detection more robust #744
Conversation
Caches are typically wrapped with buffers so that they can be shared across connections. These buffers can be eliminated if the cache itself becomes cloneable, and this is easily achieved with a `std::sync::RwLock`: the cache first obtains a read lock to clone the inner service, and a write lock is only obtained when the service is not in the cache. It turns out that the buffers have an important side effect, though: they hide the inner service type, which drastically reduces the amount of memory needed to compile the linkerd2-proxy crate. To replicate this, this change introduces a `stack::BoxNewService` which can be used to hide the type signature of a `NewService`. Benchmarks indicate that this improves p50 latency by 2ms for HTTP/2 traffic at 100 concurrent requests.
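A minimal sketch of the clone-on-read locking pattern described above, using only `std` types; the `Cache` struct and `get_or_insert_with` method are illustrative, not the proxy's actual API:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, RwLock};

// Illustrative clone-on-read cache: cloning the cache itself is cheap (an
// Arc clone), so it can be shared across connections without a buffer in
// front of it.
#[derive(Clone)]
pub struct Cache<K, S> {
    services: Arc<RwLock<HashMap<K, S>>>,
}

impl<K: Eq + Hash, S: Clone> Cache<K, S> {
    pub fn new() -> Self {
        Self {
            services: Arc::new(RwLock::new(HashMap::new())),
        }
    }

    pub fn get_or_insert_with(&self, key: K, mk: impl FnOnce() -> S) -> S {
        // Fast path: a read lock suffices to clone the inner service.
        if let Some(svc) = self.services.read().unwrap().get(&key) {
            return svc.clone();
        }
        // Slow path: only take the write lock when the service is missing.
        self.services
            .write()
            .unwrap()
            .entry(key)
            .or_insert_with(mk)
            .clone()
    }
}
```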
In preparation for an upcoming change that decouples HTTP detection logic from server initialization, this change renames `DetectHttp` to `NewServeHttp` and `AcceptHttp` to `ServeHttp`.
This sets up additional layering (like metrics).
This gives busy services time to write to connection pool sockets, rather than aggressively timing out when the initial read is slow.
Our TCP tests write a very short preamble message without a trailing newline and then sit idle (rather than closing the socket). This interferes with the new detection logic, so a line ending is added to trigger protocol detection.
An 8K buffer is overly conservative, and it makes it harder for non-HTTP clients to fill the buffer.
The existing protocol detection implementation is somewhat naive: we simply perform a single read on the socket and use that initial buffer to try to determine the protocol. This means that if the initial read returns only a few bytes, we won't actually have enough information to do protocol detection. Furthermore, because we don't read until a `\r\n`, we can incorrectly infer that a non-HTTP protocol is HTTP/1.1. This change fixes this by replacing the `Prefix` middleware with a `Detect` service and trait that does its own buffering as it tries to read from the socket. When detection times out, an error is surfaced to the server. This change requires modifying test data to include a trailing newline (since it's not feasible to close these connections given the current test infrastructure). It also increases the default dispatch (and therefore detection) timeouts to 5s.
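As a rough sketch of the buffering behavior described here (assuming `tokio` and `bytes`; the function name and signature are illustrative, not the actual `Detect` trait):

```rust
use bytes::BytesMut;
use tokio::io::{AsyncRead, AsyncReadExt};

// Keep reading until the buffer contains a line ending (enough to decide
// whether the stream looks like HTTP/1.1) or the peer stops sending.
async fn buffer_until_newline<I: AsyncRead + Unpin>(
    io: &mut I,
    buf: &mut BytesMut,
) -> std::io::Result<bool> {
    loop {
        if buf.contains(&b'\n') {
            return Ok(true);
        }
        // `read_buf` appends to the buffer rather than overwriting it, so
        // bytes from prior short reads are retained across iterations.
        if io.read_buf(buf).await? == 0 {
            // EOF before a newline: no determination can be made.
            return Ok(false);
        }
    }
}
```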
Now that TCP stacks are getting more complex, it's worth differentiating them with a protocol label. This change also renames 'source' stacks to 'server' stacks.
This looks good to me overall!
The way the tracing subscriber is set up in the tests is wrong and won't actually do anything, but once that's fixed, this seems good to me.
```diff
@@ -25,7 +25,7 @@ async fn nonblocking_identity_detection() {
         .await;
     let proxy = proxy::new().identity(id_svc);

-    let msg1 = "custom tcp hello";
+    let msg1 = "custom tcp hello\n";
```
nit/TIOLI: do you think it's worth also having some tests with `\r\n`s in them, just to make sure we're handling that correctly? the logic isn't too complex so i'm not too concerned about it, just a thought...
the http detect tests have specific tests for this sort of thing. i think the integration tests should be as simple as possible and only test whatever we expect them to test.
```rust
// Scan up to the first line ending to determine whether the
// request is HTTP/1.1. HTTP expects \r\n, so we just look for
// any \n to indicate that we can make a determination.
if buf[scan_idx..].contains(&b'\n') {
```
👍 this is much simpler than what you had here yesterday --- and (IIRC) `contains` should use a fast SIMD memchr impl when SIMD instructions are available.
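For reference, a sketch of the same scan written directly against the `memchr` crate (assumed as a dependency here); the std slice `contains` path typically lowers to a comparable memchr implementation:

```rust
// memchr::memchr performs an explicitly SIMD-accelerated byte search when
// the target CPU supports it.
fn has_line_ending(buf: &[u8], scan_idx: usize) -> bool {
    memchr::memchr(b'\n', &buf[scan_idx..]).is_some()
}
```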
```rust
tokio::select! {
    res = &mut conn => {
        debug!(?res, "The client is shutting down the connection");
        res?
    }
    shutdown = drain.signal() => {
        debug!("The process is shutting down the connection");
        Pin::new(&mut conn).graceful_shutdown();
        shutdown.release_after(conn).await?;
    }
    () = closed => {
        debug!("The stack is tearing down the connection");
        Pin::new(&mut conn).graceful_shutdown();
        conn.await?;
    }
}
```
it occurs to me that we probably could stick the `select!` into an `async fn` that takes a drain, closed future, and connection future, so that this isn't repeated... but i dunno if that's worth it here, and it was like that before...
let's do this separately -- the type constraints for this function would end up being pretty weighty. And this doesn't trigger the rule of three.
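For illustration, a hypothetical shape for that helper; the `GracefulShutdown` trait and all of the bounds below are invented for this sketch (and it elides the drain release-handle detail), which is roughly why the constraints get weighty:

```rust
use std::future::Future;
use std::pin::Pin;

// Invented for this sketch: the connection must be pollable as a future
// *and* support a graceful-shutdown handshake.
trait GracefulShutdown {
    fn graceful_shutdown(self: Pin<&mut Self>);
}

async fn serve_until_closed<C, E, D, F>(
    mut conn: C,
    drain_signal: D,
    closed: F,
) -> Result<(), E>
where
    C: Future<Output = Result<(), E>> + GracefulShutdown + Unpin,
    D: Future,
    F: Future<Output = ()>,
{
    tokio::select! {
        // The client closed the connection on its own.
        res = &mut conn => res,
        // The process is draining: shut down gracefully, then let the
        // connection finish.
        _ = drain_signal => {
            Pin::new(&mut conn).graceful_shutdown();
            conn.await
        }
        // The stack tore down the connection.
        () = closed => {
            Pin::new(&mut conn).graceful_shutdown();
            conn.await
        }
    }
}
```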
```rust
let t0 = time::Instant::now();
let protocol = detect.detect(&mut io, &mut buf).await?;
debug!(
    ?protocol,
    elapsed = ?(time::Instant::now() - t0),
    "Detected"
);
```
TIOLI: it seems like we could move the timing here into the timeout layer, since it always takes a t0 anyway to generate the error. that way, we could avoid a second `Instant::now()`, which may or may not be a syscall. not sure if it matters here since this is a once-per-connection code path rather than a per-request code path...
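A sketch of that suggestion (assuming `tokio::time::timeout`; the function and error type are stand-ins, not proxy types): the wrapper owns the single `Instant::now()` and reuses it for both the elapsed measurement and the timeout error:

```rust
use std::future::Future;
use tokio::time::{timeout, Duration, Instant};

#[derive(Debug)]
struct DetectTimeout {
    elapsed: Duration,
}

// The one Instant::now() taken for error reporting also serves the
// elapsed-time measurement, avoiding a second clock read.
async fn detect_with_timeout<F, T>(
    limit: Duration,
    detect: F,
) -> Result<(T, Duration), DetectTimeout>
where
    F: Future<Output = T>,
{
    let t0 = Instant::now();
    match timeout(limit, detect).await {
        Ok(protocol) => Ok((protocol, t0.elapsed())),
        Err(_) => Err(DetectTimeout { elapsed: t0.elapsed() }),
    }
}
```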
Looks good!
This release features a change to the proxy's cache eviction strategy to ensure that clients (and their load balancers) are reused by new outbound connections. This can dramatically reduce memory consumption, especially for busy HTTP/1.1 clients. Also, the proxy's HTTP detection scheme has been made more robust. Previously, the proxy would perform only a single read to determine whether a TCP stream was HTTP, which could lead to false positives. Now, the proxy reads until at least the first newline, which is what the HTTP parser actually needs to make a proper determination. With this, the default dispatch timeouts have been increased to 5s to accommodate connection pools that may not issue an immediate request. Furthermore, this release includes an upgrade to Tokio v0.3 and its associated ecosystem.

---

* update buffers to use Tokio 0.3 MPSC channels (linkerd/linkerd2-proxy#759)
* Update the proxy to use Tokio 0.3 (linkerd/linkerd2-proxy#732)
* Rename DetectHttp to NewServeHttp (linkerd/linkerd2-proxy#760)
* http: more consistent names for body types (linkerd/linkerd2-proxy#761)
* io: simplify the `Io` trait (linkerd/linkerd2-proxy#762)
* trace: nicer traces in tests, clean up trace configuration (linkerd/linkerd2-proxy#766)
* Ensure that services are held as long they are being used (linkerd/linkerd2-proxy#767)
* outbound: add stack tests for http (linkerd/linkerd2-proxy#765)
* cache: Ensure that actively held services are not evicted (linkerd/linkerd2-proxy#768)
* cache: Only spawn a single task per cache entry (linkerd/linkerd2-proxy#770)
* test: make integration tests shut up (linkerd/linkerd2-proxy#771)
* metrics: Add support for microsecond counters (linkerd/linkerd2-proxy#772)
* Add a protocol label to stack metrics (linkerd/linkerd2-proxy#773)
* detect: Make protocol detection more robust (linkerd/linkerd2-proxy#744)