Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bias HTTP detection towards availability #894

Merged
merged 2 commits into from
Feb 2, 2021
Merged

Bias HTTP detection towards availability #894

merged 2 commits into from
Feb 2, 2021

Conversation

olix0r
Copy link
Member

@olix0r olix0r commented Feb 2, 2021

1b97e57 changed HTTP detection to bias towards correctness. Instead of
performing a single read, the HTTP detection module reads data until a
newline is encountered so that a dispositive determination can be made.
Practically, this causes many non-HTTP protocols to fail with a
connection timeout because a newline is never transmitted.

This change reverts the HTTP detection logic to perform a single read
and to use that data to make a determination. A minimum of 14B is
required to handle a connection as HTTP.

@olix0r olix0r requested a review from a team February 2, 2021 01:34
1b97e57 changed HTTP detection to bias towards correctness. Instead of
performing a single read, the HTTP detection module reads data until a
newline is encountered so that a dispositive determination can be made.
Practically, this causes many non-HTTP protocols to fail with a
connection timeout because a newline is never transmitted.

This change reverts the HTTP detection logic to perform a single read
and to use that data to make a determination. A minimum of 14B is
required to handle a connection as HTTP.
Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change reverts the HTTP detection logic to perform a single read
and to use that data to make a determination. A minimum of 14B is
required to handle a connection as HTTP.

As we discussed offline, it seems like a potential compromise (which would require a somewhat bigger code change) would be to make multiple reads, but fall back to transparent forwarding when detection times out, rather than failing the connection. This would let us avoid false negatives in the case where a single read doesn't return enough data to make a protocol determination, but would still bias for availability. I'm not sure whether or not this is necessary though. It seems pretty unlikely that we'll need multiple reads to detect HTTP.

We may want to do this in a follow-up regardless, but this change seems good to me for now.

linkerd/proxy/http/src/detect.rs Outdated Show resolved Hide resolved
@olix0r olix0r merged commit c49bca1 into main Feb 2, 2021
@olix0r olix0r deleted the ver/http-detect branch February 2, 2021 19:01
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 3, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Feb 3, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Mar 23, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Apr 21, 2021
This release changes HTTP protocol detection to prevent timeout errors
in two ways:

1. HTTP detection no longer blocks until a newline is read. We've
   reverted to relying on a single read to make a determination.
2. Detection timeouts are no longer terminal. When a timeout is
   encountered, we continue forwarding the connection as an opaque TCP
   connection.

These changes may lead to false-negatives--we may fail to detect some
HTTP streams--but it should prevent many avoidable detection errors.

This release also makes improvements for multicluster gateways,
improving caching so that profile lookups are only performed
once per target service.

Diagnostic `stack_*` metrics have been moved so that they track
underlying services, ignoring fail-fast. This should help us get better
insights into services that are in failfast.

Finally, the opencensus exporter has been improved to ensure that trace
events are flushed if the trace buffer is not filled within a timeout.

---

* actions: Update actions to use full SHAs (linkerd/linkerd2-proxy#885)
* http: Parameterize normalize_ur::DefaultAuthority (linkerd/linkerd2-proxy#886)
* http: Parameterize the HTTP server (linkerd/linkerd2-proxy#887)
* opencensus: rewrite span exporter using async/await (linkerd/linkerd2-proxy#789)
* Update http::Insert to use `Param` (linkerd/linkerd2-proxy#889)
* Update crate dependencies (linkerd/linkerd2-proxy#892)
* stack: Make the router fallible (linkerd/linkerd2-proxy#888)
* Track stack metrics within failfast (linkerd/linkerd2-proxy#891)
* outbound: Avoid building balancers when no concrete name (linkerd/linkerd2-proxy#890)
* inbound: Cache HTTP gateways per destination (linkerd/linkerd2-proxy#893)
* Reorganize the gateway crate (linkerd/linkerd2-proxy#897)
* Bias HTTP detection towards availability (linkerd/linkerd2-proxy#894)
* inbound: Use ALPN to determine transport header (linkerd/linkerd2-proxy#895)
* detect: Return unknown protocol on detection timeout (linkerd/linkerd2-proxy#896)
* Extract protocol detection into the gateway crate (linkerd/linkerd2-proxy#898)

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants