Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotate socket-level errors with a scope #852

Merged
merged 10 commits into from
Jan 19, 2021
Merged

Annotate socket-level errors with a scope #852

merged 10 commits into from
Jan 19, 2021

Conversation

olix0r
Copy link
Member

@olix0r olix0r commented Jan 16, 2021

We frequently encounter socket errors like:

Connection closed error=Transport endpoint is not connected (os error 107)

While the handler is generally scoped with tracing context to tell us
information about the server-side connection, we lack information about
which socket actually failed. For every "logical" TCP stream, we
generally have two concrete connections: One on the server-side of the
proxy and another on the client-side.

This change introduces a new io::ScopedIo wrapper type that annotates
all I/O error messages with a client or server prefix. Then, the
proxy's clients and servers are wrapped with this type so that all
socket-level error messages are disambiguated.

`tls::accept` has a `Detectable` trait which allows TLS detection to use
`TcpStream::peek`; but this is inflexible if we want to wrap `TcpStream`
with any additional behavior and it limits our ability to write tests
for this module.

This change introduces a new `io::Peek` trait to model
`TcpStream::peek` and removes the `tls::accept::Detectable` trait.
@olix0r olix0r force-pushed the ver/io-error-info branch from d500470 to 7dfe63a Compare January 17, 2021 18:23
@olix0r olix0r changed the base branch from main to ver/io-peek January 17, 2021 18:23
@olix0r olix0r marked this pull request as draft January 17, 2021 18:23
We frequently encounter socket errors like:

    Connection closed error=Transport endpoint is not connected (os error 107)

While the handler is generally scoped with tracing context to tell us
information about the server-side connection, we lack information about
which socket actually failed. For every "logical" TCP stream, we
generally have two concrete connections: One on the server-side of the
proxy and another on the client-side.

This change introduces a new `io::ScopedIo` wrapper type that annotates
all I/O error messages with a `client` or `server` prefix. Then, the
proxy's clients and servers are wrapped with this type so that all
socket-level error messages are disambiguated.
@olix0r olix0r force-pushed the ver/io-error-info branch from 7dfe63a to c85454f Compare January 17, 2021 18:24
Base automatically changed from ver/io-peek to main January 19, 2021 21:27
@olix0r olix0r marked this pull request as ready for review January 19, 2021 21:37
Copy link
Contributor

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm modulo a couple nits

linkerd/io/src/scoped.rs Outdated Show resolved Hide resolved
linkerd/io/src/scoped.rs Outdated Show resolved Hide resolved
linkerd/io/src/scoped.rs Show resolved Hide resolved
@olix0r olix0r merged commit 1e98a5b into main Jan 19, 2021
@olix0r olix0r deleted the ver/io-error-info branch January 19, 2021 22:31
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jan 21, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jan 21, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Mar 23, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Apr 21, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants