-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracing: fix incorrect field format when logging in JSON #845
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
When configuring the `tracing` subscriber to emit JSON logs, we're currently configuring it to use a JSON _event_ formatter, but not a JSON _field_ formatter. This means that span and event fields are formatted with the default field format, `key=val`, which is not valid JSON. This results in parse errors when logging events in a span with fields, and we format those fields as an error message instead. See linkerd/linkerd2#5520. Typically, the simplest way to configure a JSON `fmt` layer is to use the `.json()` method. This sets both the event formatter *and* the field formatter to emit JSON. However, since we are sharing some event formatter configuration across the JSON and plaintext logging modes, the proxy doesn't use this method; instead, it sets the JSON mode on the event formatter which has already been configured with the shared configuration, and sets it as the layer's event formatter *without* also setting the field format, so we continue using the default field format. This branch updates the proxy's tracing initialization code to also set the field format to JSON when the JSON logging mode is enabled. I've also added some comments to the JSON formatter configuration so we don't mess this up in the future. Before this change, trying to run the tests with the JSON log format gets a panic (malformed fields cause the JSON formatter to panic in debug mode, but it skips those fields in release mode so that it doesn't crash the app). For example: ```console $ LINKERD2_PROXY_LOG=linkerd=debug LINKERD2_PROXY_LOG_FORMAT=json cargo test -p linkerd-app-integration -- transparency::inbound_http1 --test-threads=1 --nocapture Finished test [unoptimized] target(s) in 0.10s Running target/debug/deps/linkerd_app_integration-d798ce4657a07303 running 1 test test tests::transparency::inbound_http1 ... thread 'main' panicked at 'span 'test_server' had malformed fields! this is a bug. error: expected value at line 1 column 1 fields: FormattedFields { fields: "version=Http1 addr=127.0.0.1:58307 test=main" }', /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.2.15/src/fmt/format/json.rs:162:49 stack backtrace: # ... snip ... ``` Now, the tests correctly output well-formed JSON: ```console $ LINKERD2_PROXY_LOG=debug LINKERD2_PROXY_LOG_FORMAT=json cargo test -p linkerd-app-integration -- transparency::inbound_http1 --test-threads=1 --nocapture Finished test [unoptimized] target(s) in 0.11s Running target/debug/deps/linkerd_app_integration-d798ce4657a07303 running 1 test test tests::transparency::inbound_http1 ... {"timestamp":"Jan 14 09:20:34.499","level":"INFO","fields":{"message":"support server running"},"target":"linkerd_app_integration::server","spans":[{"addr":"127.0.0.1:55561","test":"main","version":"Http1","name":"test_server"}]} {"timestamp":"Jan 14 09:20:34.499","level":"INFO","fields":{"message":"listening!"},"target":"linkerd_app_integration::server","spans":[{"addr":"127.0.0.1:55561","test":"main","version":"Http1","name":"test_server"}]} Http1 server running; addr=127.0.0.1:55561 support destination service listening; addr=127.0.0.1:51487 {"timestamp":"Jan 14 09:20:34.501","level":"DEBUG","fields":{"message":"building app"},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"}]} {"timestamp":"Jan 14 09:20:34.502","level":"DEBUG","fields":{"message":"spawning daemon thread"},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"}]} proxy running; tap=, identity=None, inbound=127.0.0.1:33045 (SO_ORIGINAL_DST=10.1.2.3:55561), outbound=127.0.0.1:45721, metrics=127.0.0.1:45373 {"timestamp":"Jan 14 09:20:34.502","level":"INFO","fields":{"listen.addr":"127.0.0.1:45721","ingress_mode":false},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"},{"name":"outbound"}]} # ... snip ... ```
olix0r
approved these changes
Jan 14, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Jan 21, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes.
olix0r
added a commit
to linkerd/linkerd2
that referenced
this pull request
Jan 21, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes.
jijeesh
pushed a commit
to jijeesh/linkerd2
that referenced
this pull request
Mar 23, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes. Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
jijeesh
pushed a commit
to jijeesh/linkerd2
that referenced
this pull request
Apr 21, 2021
This release improves diagnostics about the proxy's failfast state: * Warnings are now emitted when the failfast state is entered; * The "max concurrency exhausted" gRPC message has been changed to more-clearly indicate a failfast state error; and * Failfast recovery has been made more robust, ensuring that a service can recover indepenently of new requests being received. Furthermore, metric labeling has been improved: * TCP server metrics are now annotated with the original `target_addr`; * The `tls` label is now set to true for inbound TLS connections that lack a client ID. This is mostly helpful to clarify inbound metrics on the `identity` controller; * Outbound `tls` metrics could be reported incorrectly when a proxy was configured to not use identity. This has been corrected. Finally, socket-level errors now include a _client_ or _server_ prefix to indicate which side of the proxy encountered the error. --- * stack: remove `map_response` (linkerd/linkerd2-proxy#835) * replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842) * tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845) * replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839) * integration: improve tracing in tests (linkerd/linkerd2-proxy#846) * service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844) * inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843) * Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847) * Update tonic to v0.4 (linkerd/linkerd2-proxy#849) * failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848) * Update the base docker image (linkerd/linkerd2-proxy#850) * stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851) * Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858) * tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859) * Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853) * metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857) * metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856) * Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854) * Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852) * test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860) * tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855) * Update to tower v0.4.4 (linkerd/linkerd2-proxy#864) * Update cargo dependencies (linkerd/linkerd2-proxy#865) * metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861) * outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862) --- The opaque-ports test has been updated to reflect proxy metrics changes. Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When configuring the
tracing
subscriber to emit JSON logs, we'recurrently configuring it to use a JSON event formatter, but not a
JSON field formatter. This means that span and event fields are
formatted with the default field format,
key=val
, which is not validJSON. This results in parse errors when logging events in a span with
fields, and we format those fields as an error message instead. See
linkerd/linkerd2#5520.
Typically, the simplest way to configure a JSON
fmt
layer is to usethe
.json()
method. This sets both the event formatter and the fieldformatter to emit JSON. However, since we are sharing some event
formatter configuration across the JSON and plaintext logging modes, the
proxy doesn't use this method; instead, it sets the JSON mode on the
event formatter which has already been configured with the shared
configuration, and sets it as the layer's event formatter without also
setting the field format, so we continue using the default field format.
This branch updates the proxy's tracing initialization code to also set
the field format to JSON when the JSON logging mode is enabled. I've
also added some comments to the JSON formatter configuration so we don't
mess this up in the future.
Before this change, trying to run the tests with the JSON log format
gets a panic (malformed fields cause the JSON formatter to panic in
debug mode, but it skips those fields in release mode so that it doesn't
crash the app). For example:
Now, the tests correctly output well-formed JSON: