detect: Make protocol detection more robust #744
Conversation
Caches are typically wrapped with buffers so that they can be shared across connections. These buffers can be eliminated if the cache itself becomes cloneable, and this is easily achieved with a `std::sync::RwLock`: the cache first obtains a read lock to clone the inner service, and a write lock is only obtained when the service is not in the cache. It turns out that the buffers have an important side effect, though: they hide the inner service type, which drastically reduces the amount of memory needed to compile the linkerd2-proxy crate. To replicate this, this change introduces a `stack::BoxNewService` which can be used to hide the type signature of a `NewService`. Benchmarks indicate that this improves p50 latency by 2ms for HTTP/2 traffic at 100 concurrent requests.
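A minimal sketch of the clone-on-read locking pattern described above, using only `std` types; the `Cache` struct and `get_or_insert_with` method are illustrative, not the proxy's actual API:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, RwLock};

// Illustrative clone-on-read cache: cloning the cache itself is cheap (an
// Arc clone), so it can be shared across connections without a buffer in
// front of it.
#[derive(Clone)]
pub struct Cache<K, S> {
    services: Arc<RwLock<HashMap<K, S>>>,
}

impl<K: Eq + Hash, S: Clone> Cache<K, S> {
    pub fn new() -> Self {
        Self {
            services: Arc::new(RwLock::new(HashMap::new())),
        }
    }

    pub fn get_or_insert_with(&self, key: K, mk: impl FnOnce() -> S) -> S {
        // Fast path: a read lock suffices to clone the inner service.
        if let Some(svc) = self.services.read().unwrap().get(&key) {
            return svc.clone();
        }
        // Slow path: only take the write lock when the service is missing.
        self.services
            .write()
            .unwrap()
            .entry(key)
            .or_insert_with(mk)
            .clone()
    }
}
```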
In preparation for an upcoming change that decouples HTTP detection logic from server initialization, this change renames `DetectHttp` to `NewServeHttp` and `AcceptHttp` to `ServeHttp`.
This sets up additional layering (like metrics).
This gives busy services time to write to connection pool sockets, rather than aggressively timing out when the initial read is slow.
Our TCP tests write a very short preamble message without a trailing newline and then sit idle (rather than closing the socket). This interferes with the new detection logic, so a line ending is added to trigger protocol detection.
An 8K buffer is overly conservative, and it makes it harder for non-HTTP clients to fill the buffer.
The existing protocol detection implementation is somewhat naive: we simply perform a single read on the socket and use that initial buffer to try to determine the protocol. This means that if the initial read returns only a few bytes, we won't actually have enough information to do protocol detection. Furthermore, because we don't read until a `\r\n`, we can incorrectly infer that a non-HTTP protocol is HTTP/1.1. This change fixes this by replacing the `Prefix` middleware with a `Detect` service and trait that does its own buffering as it tries to read from the socket. When detection times out, an error is surfaced to the server. This change requires modifying test data to include a trailing newline (since it's not feasible to close these connections given the current test infrastructure). It also increases the default dispatch (and therefore detection) timeouts to 5s.
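As a rough sketch of the buffering behavior described here (assuming `tokio` and `bytes`; the function name and signature are illustrative, not the actual `Detect` trait):

```rust
use bytes::BytesMut;
use tokio::io::{AsyncRead, AsyncReadExt};

// Keep reading until the buffer contains a line ending (enough to decide
// whether the stream looks like HTTP/1.1) or the peer stops sending.
async fn buffer_until_newline<I: AsyncRead + Unpin>(
    io: &mut I,
    buf: &mut BytesMut,
) -> std::io::Result<bool> {
    loop {
        if buf.contains(&b'\n') {
            return Ok(true);
        }
        // `read_buf` appends to the buffer rather than overwriting it, so
        // bytes from prior short reads are retained across iterations.
        if io.read_buf(buf).await? == 0 {
            // EOF before a newline: no determination can be made.
            return Ok(false);
        }
    }
}
```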
Now that TCP stacks are getting more complex, it's worth differentiating them with a protocol label. This change also renames 'source' stacks to 'server' stacks.
This looks good to me overall!
The way the tracing subscriber is set up in the tests is wrong and won't actually do anything, but once that's fixed, this seems good to me.
```diff
@@ -25,7 +25,7 @@ async fn nonblocking_identity_detection() {
         .await;
     let proxy = proxy::new().identity(id_svc);

-    let msg1 = "custom tcp hello";
+    let msg1 = "custom tcp hello\n";
```
nit/TIOLI: do you think it's worth also having some tests with `\r\n`s in them, just to make sure we're handling that correctly? the logic isn't too complex so i'm not too concerned about it, just a thought...
the http detect tests have specific tests for this sort of thing. i think the integration tests should be as simple as possible and only test whatever we expect them to test.
```rust
// Scan up to the first line ending to determine whether the
// request is HTTP/1.1. HTTP expects \r\n, so we just look for
// any \n to indicate that we can make a determination.
if buf[scan_idx..].contains(&b'\n') {
```
👍 this is much simpler than what you had here yesterday --- and (IIRC) `contains` should use a fast SIMD memchr impl when SIMD instructions are available.
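For reference, a sketch of the same scan written directly against the `memchr` crate (assumed as a dependency here); the std slice `contains` path typically lowers to a comparable memchr implementation:

```rust
// memchr::memchr performs an explicitly SIMD-accelerated byte search when
// the target CPU supports it.
fn has_line_ending(buf: &[u8], scan_idx: usize) -> bool {
    memchr::memchr(b'\n', &buf[scan_idx..]).is_some()
}
```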
```rust
tokio::select! {
    res = &mut conn => {
        debug!(?res, "The client is shutting down the connection");
        res?
    }
    shutdown = drain.signal() => {
        debug!("The process is shutting down the connection");
        Pin::new(&mut conn).graceful_shutdown();
        shutdown.release_after(conn).await?;
    }
    () = closed => {
        debug!("The stack is tearing down the connection");
        Pin::new(&mut conn).graceful_shutdown();
        conn.await?;
    }
}
```
it occurs to me that we probably could stick the `select!` into an `async fn` that takes a drain, closed future, and connection future, so that this isn't repeated... but i dunno if that's worth it here, and it was like that before...
let's do this separately -- the type constraints for this function would end up being pretty weighty. And this doesn't trigger the rule of three.
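For illustration, a hypothetical shape for that helper; the `GracefulShutdown` trait and all of the bounds below are invented for this sketch (and it elides the drain release-handle detail), which is roughly why the constraints get weighty:

```rust
use std::future::Future;
use std::pin::Pin;

// Invented for this sketch: the connection must be pollable as a future
// *and* support a graceful-shutdown handshake.
trait GracefulShutdown {
    fn graceful_shutdown(self: Pin<&mut Self>);
}

async fn serve_until_closed<C, E, D, F>(
    mut conn: C,
    drain_signal: D,
    closed: F,
) -> Result<(), E>
where
    C: Future<Output = Result<(), E>> + GracefulShutdown + Unpin,
    D: Future,
    F: Future<Output = ()>,
{
    tokio::select! {
        // The client closed the connection on its own.
        res = &mut conn => res,
        // The process is draining: shut down gracefully, then let the
        // connection finish.
        _ = drain_signal => {
            Pin::new(&mut conn).graceful_shutdown();
            conn.await
        }
        // The stack tore down the connection.
        () = closed => {
            Pin::new(&mut conn).graceful_shutdown();
            conn.await
        }
    }
}
```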
```rust
let t0 = time::Instant::now();
let protocol = detect.detect(&mut io, &mut buf).await?;
debug!(
    ?protocol,
    elapsed = ?(time::Instant::now() - t0),
    "Detected"
);
```
TIOLI: it seems like we could move the timing here into the timeout layer, since it always takes a t0 anyway to generate the error. that way, we could avoid a second `Instant::now()`, which may or may not be a syscall. not sure if it matters here since this is a once-per-connection code path rather than a per-request code path...
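A sketch of that suggestion (assuming `tokio::time::timeout`; the function and error type are stand-ins, not proxy types): the wrapper owns the single `Instant::now()` and reuses it for both the elapsed measurement and the timeout error:

```rust
use std::future::Future;
use tokio::time::{timeout, Duration, Instant};

#[derive(Debug)]
struct DetectTimeout {
    elapsed: Duration,
}

// The one Instant::now() taken for error reporting also serves the
// elapsed-time measurement, avoiding a second clock read.
async fn detect_with_timeout<F, T>(
    limit: Duration,
    detect: F,
) -> Result<(T, Duration), DetectTimeout>
where
    F: Future<Output = T>,
{
    let t0 = Instant::now();
    match timeout(limit, detect).await {
        Ok(protocol) => Ok((protocol, t0.elapsed())),
        Err(_) => Err(DetectTimeout { elapsed: t0.elapsed() }),
    }
}
```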
Looks good!
This release features a change to the proxy's cache eviction strategy to ensure that clients (and their load balancers) are reused by new outbound connections. This can dramatically reduce memory consumption, especially for busy HTTP/1.1 clients. Also, the proxy's HTTP detection scheme has been made more robust. Previously, the proxy would perform only a single read to determine whether a TCP stream was HTTP, which could lead to false positives. Now, the proxy reads until at least the first newline, which is what the HTTP parser actually needs to make a proper determination. With this, the default dispatch timeouts have been increased to 5s to accommodate connection pools that may not issue an immediate request. Furthermore, this release includes an upgrade to Tokio v0.3 and its associated ecosystem.

---

* update buffers to use Tokio 0.3 MPSC channels (linkerd/linkerd2-proxy#759)
* Update the proxy to use Tokio 0.3 (linkerd/linkerd2-proxy#732)
* Rename DetectHttp to NewServeHttp (linkerd/linkerd2-proxy#760)
* http: more consistent names for body types (linkerd/linkerd2-proxy#761)
* io: simplify the `Io` trait (linkerd/linkerd2-proxy#762)
* trace: nicer traces in tests, clean up trace configuration (linkerd/linkerd2-proxy#766)
* Ensure that services are held as long they are being used (linkerd/linkerd2-proxy#767)
* outbound: add stack tests for http (linkerd/linkerd2-proxy#765)
* cache: Ensure that actively held services are not evicted (linkerd/linkerd2-proxy#768)
* cache: Only spawn a single task per cache entry (linkerd/linkerd2-proxy#770)
* test: make integration tests shut up (linkerd/linkerd2-proxy#771)
* metrics: Add support for microsecond counters (linkerd/linkerd2-proxy#772)
* Add a protocol label to stack metrics (linkerd/linkerd2-proxy#773)
* detect: Make protocol detection more robust (linkerd/linkerd2-proxy#744)