-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metrics: Reorder metrics labels #856
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a superficial change that moves the `direction` label before the `authority` label on HTTP endpoint metrics.
olix0r
pushed a commit
that referenced
this pull request
Jan 19, 2021
This branch changes the metrics integration tests so that we no longer make assertions about metrics by searching the entire metrics scrape for a specific string. Instead, we now have a simple builder API for constructing a type that tries to parse the metrics scrape to find individual metrics with certain label combinations and/or values. Advantages of the new approach include: - it's not ordering dependent --- changing the label ordering doesn't change whether or not the match succeeds. for example, changes like #856 won't break the tests. - the presence of *additional* labels doesn't break the assertions (although we may want to add an "exact" matching mode in case we *do* want to assert that other labels are not present). - metrics can easily be matched with and without values - the failure reporting is *much* better. when the scrape doesn't contain the required metric, we can now report the metric we wanted in a nicely formatted way, and show only a list of the "similar" metrics that were found. with the previous code, panics included the entire metrics scrape, which was very hard to read and debug. This should make it much easier to modify the tests to add new metric labels or add existing ones, since we no longer depend on specific hard-coded strings, but are instead making assertions about the properties we actually want to find. Example of the new errors: ``` thread 'tests::telemetry::response_classification::inbound_http' panicked at 'did not find a `response_total` metric with labels: ["authority=\"tele.test.svc.cluster.local\"", "direction=\"inbound\"", "tls=\"disabled\"", "status_code=\"200 OK\"", "classification=\"success\""] found metrics: [ "response_total{authority=\"tele.test.svc.cluster.local\",direction=\"inbound\",tls=\"disabled\",status_code=\"200\",classification=\"success\"} 1", "response_total{authority=\"tele.test.svc.cluster.local\",direction=\"inbound\",tls=\"disabled\",status_code=\"304\",classification=\"success\"} 1", ] retried 5 times (73.314652ms)', linkerd/app/integration/src/metrics.rs:128:51 stack backtrace: ... snip ... ``` These *could* probably be improved even more, for example, by highlighting the specific label that was missing. However, this felt like a pretty significant improvement for a first pass. In a follow-up, I'll also reduce some of the code duplication between the tests that are repeated for inbound and outbound. While working on this change, I found that modifying the tests was much harder since almost all of them were duplicated on the inbound and outbound halves of the proxy. I'll merge that in a separate PR to reduce the size of this diff, though.
hawkw
approved these changes
Jan 19, 2021
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Jan 21, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes.
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Jan 21, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes.
jijeesh
pushed a commit
to jijeesh/linkerd2
that referenced
this pull request
Mar 23, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes. Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
jijeesh
pushed a commit
to jijeesh/linkerd2
that referenced
this pull request
Apr 21, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes. Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a superficial change that moves the
direction
label before theauthority
label on HTTP endpoint metrics.