-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct gRPC 'max-concurrency exhausted' error messages #847
Conversation
When the proxy hits a fail-fast error when proxying gRPC messages, it reports 'max-concurrency exhausted' in the `grpc-message` response header. This is because failfast was introduced to bound the time a service may be at-concurrency; but now failfast is used everywhere a request may be buffered to bound the time a request can stay in the proxy: if a buffered service fails to become ready within a timeout, the service goes into 'fail-fast' so that requests are failed eagerly until the service is ready. Recently, we've improved failfast errors to indicate which failfast is being triggered. This change replaces the _max-concurrency exhausted_ `grpc-message` with this more-descriptive error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌 thank you!
let code = Code::Unavailable; | ||
headers.insert(GRPC_STATUS, code_header(code)); | ||
headers.insert( | ||
GRPC_MESSAGE, | ||
HeaderValue::from_static("proxy max-concurrency exhausted"), | ||
HeaderValue::from_str(&e.to_string()).unwrap_or_else(|error| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the failfast scopes are static strings, i wonder if we could just add more of the message to them, so that they can be used with HeaderValue::from_static
...string overhead probably doesn't matter that much so this is probably unnecessary...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, probably more we could yakshave around this; but later.
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes.
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes.
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes. Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes. Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
When the proxy hits a fail-fast error when proxying gRPC messages, it
reports 'max-concurrency exhausted' in the
grpc-message
responseheader. This is because failfast was introduced to bound the time a
service may be at-concurrency; but now failfast is used everywhere a
request may be buffered to bound the time a request can stay in the
proxy: if a buffered service fails to become ready within a timeout, the
service goes into 'fail-fast' so that requests are failed eagerly until
the service is ready.
Recently, we've improved failfast errors to indicate which failfast is
being triggered. This change replaces the max-concurrency exhausted
grpc-message
with this more-descriptive error message.