Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: fix incorrect field format when logging in JSON #845

Merged
merged 1 commit into from
Jan 14, 2021

Conversation

hawkw
Copy link
Contributor

@hawkw hawkw commented Jan 14, 2021

When configuring the tracing subscriber to emit JSON logs, we're
currently configuring it to use a JSON event formatter, but not a
JSON field formatter. This means that span and event fields are
formatted with the default field format, key=val, which is not valid
JSON. This results in parse errors when logging events in a span with
fields, and we format those fields as an error message instead. See
linkerd/linkerd2#5520.

Typically, the simplest way to configure a JSON fmt layer is to use
the .json() method. This sets both the event formatter and the field
formatter to emit JSON. However, since we are sharing some event
formatter configuration across the JSON and plaintext logging modes, the
proxy doesn't use this method; instead, it sets the JSON mode on the
event formatter which has already been configured with the shared
configuration, and sets it as the layer's event formatter without also
setting the field format, so we continue using the default field format.

This branch updates the proxy's tracing initialization code to also set
the field format to JSON when the JSON logging mode is enabled. I've
also added some comments to the JSON formatter configuration so we don't
mess this up in the future.

Before this change, trying to run the tests with the JSON log format
gets a panic (malformed fields cause the JSON formatter to panic in
debug mode, but it skips those fields in release mode so that it doesn't
crash the app). For example:

$ LINKERD2_PROXY_LOG=linkerd=debug LINKERD2_PROXY_LOG_FORMAT=json cargo test -p linkerd-app-integration -- transparency::inbound_http1 --test-threads=1 --nocapture
    Finished test [unoptimized] target(s) in 0.10s
     Running target/debug/deps/linkerd_app_integration-d798ce4657a07303

running 1 test
test tests::transparency::inbound_http1 ... thread 'main' panicked at 'span 'test_server' had malformed fields! this is a bug.
  error: expected value at line 1 column 1
  fields: FormattedFields { fields: "version=Http1 addr=127.0.0.1:58307 test=main" }', /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.2.15/src/fmt/format/json.rs:162:49
stack backtrace:
    # ... snip ...

Now, the tests correctly output well-formed JSON:

$ LINKERD2_PROXY_LOG=debug LINKERD2_PROXY_LOG_FORMAT=json cargo test -p linkerd-app-integration -- transparency::inbound_http1 --test-threads=1 --nocapture
    Finished test [unoptimized] target(s) in 0.11s
     Running target/debug/deps/linkerd_app_integration-d798ce4657a07303

running 1 test
test tests::transparency::inbound_http1 ... {"timestamp":"Jan 14 09:20:34.499","level":"INFO","fields":{"message":"support server running"},"target":"linkerd_app_integration::server","spans":[{"addr":"127.0.0.1:55561","test":"main","version":"Http1","name":"test_server"}]}
{"timestamp":"Jan 14 09:20:34.499","level":"INFO","fields":{"message":"listening!"},"target":"linkerd_app_integration::server","spans":[{"addr":"127.0.0.1:55561","test":"main","version":"Http1","name":"test_server"}]}
Http1 server running; addr=127.0.0.1:55561
support destination service listening; addr=127.0.0.1:51487
{"timestamp":"Jan 14 09:20:34.501","level":"DEBUG","fields":{"message":"building app"},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"}]}
{"timestamp":"Jan 14 09:20:34.502","level":"DEBUG","fields":{"message":"spawning daemon thread"},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"}]}
proxy running; tap=, identity=None, inbound=127.0.0.1:33045 (SO_ORIGINAL_DST=10.1.2.3:55561), outbound=127.0.0.1:45721, metrics=127.0.0.1:45373
{"timestamp":"Jan 14 09:20:34.502","level":"INFO","fields":{"listen.addr":"127.0.0.1:45721","ingress_mode":false},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"},{"name":"outbound"}]}
    # ... snip ...

When configuring the `tracing` subscriber to emit JSON logs, we're
currently configuring it to use a JSON _event_ formatter, but not a
JSON _field_ formatter. This means that span and event fields are
formatted with the default field format, `key=val`, which is not valid
JSON. This results in parse errors when logging events in a span with
fields, and we format those fields as an error message instead. See
linkerd/linkerd2#5520.

Typically, the simplest way to configure a JSON `fmt` layer is to use
the `.json()` method. This sets both the event formatter *and* the field
formatter to emit JSON. However, since we are sharing some event
formatter configuration across the JSON and plaintext logging modes, the
proxy doesn't use this method; instead, it sets the JSON mode on the
event formatter which has already been configured with the shared
configuration, and sets it as the layer's event formatter *without* also
setting the field format, so we continue using the default field format.

This branch updates the proxy's tracing initialization code to also set
the field format to JSON when the JSON logging mode is enabled. I've
also added some comments to the JSON formatter configuration so we don't
mess this up in the future.

Before this change, trying to run the tests with the JSON log format
gets a panic (malformed fields cause the JSON formatter to panic in
debug mode, but it skips those fields in release mode so that it doesn't
crash the app). For example:

```console
$ LINKERD2_PROXY_LOG=linkerd=debug LINKERD2_PROXY_LOG_FORMAT=json cargo test -p linkerd-app-integration -- transparency::inbound_http1 --test-threads=1 --nocapture
    Finished test [unoptimized] target(s) in 0.10s
     Running target/debug/deps/linkerd_app_integration-d798ce4657a07303

running 1 test
test tests::transparency::inbound_http1 ... thread 'main' panicked at 'span 'test_server' had malformed fields! this is a bug.
  error: expected value at line 1 column 1
  fields: FormattedFields { fields: "version=Http1 addr=127.0.0.1:58307 test=main" }', /home/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-subscriber-0.2.15/src/fmt/format/json.rs:162:49
stack backtrace:
    # ... snip ...
```

Now, the tests correctly output well-formed JSON:

```console
$ LINKERD2_PROXY_LOG=debug LINKERD2_PROXY_LOG_FORMAT=json cargo test -p linkerd-app-integration -- transparency::inbound_http1 --test-threads=1 --nocapture
    Finished test [unoptimized] target(s) in 0.11s
     Running target/debug/deps/linkerd_app_integration-d798ce4657a07303

running 1 test
test tests::transparency::inbound_http1 ... {"timestamp":"Jan 14 09:20:34.499","level":"INFO","fields":{"message":"support server running"},"target":"linkerd_app_integration::server","spans":[{"addr":"127.0.0.1:55561","test":"main","version":"Http1","name":"test_server"}]}
{"timestamp":"Jan 14 09:20:34.499","level":"INFO","fields":{"message":"listening!"},"target":"linkerd_app_integration::server","spans":[{"addr":"127.0.0.1:55561","test":"main","version":"Http1","name":"test_server"}]}
Http1 server running; addr=127.0.0.1:55561
support destination service listening; addr=127.0.0.1:51487
{"timestamp":"Jan 14 09:20:34.501","level":"DEBUG","fields":{"message":"building app"},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"}]}
{"timestamp":"Jan 14 09:20:34.502","level":"DEBUG","fields":{"message":"spawning daemon thread"},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"}]}
proxy running; tap=, identity=None, inbound=127.0.0.1:33045 (SO_ORIGINAL_DST=10.1.2.3:55561), outbound=127.0.0.1:45721, metrics=127.0.0.1:45373
{"timestamp":"Jan 14 09:20:34.502","level":"INFO","fields":{"listen.addr":"127.0.0.1:45721","ingress_mode":false},"target":"linkerd_app","spans":[{"test":"main:proxy","name":"proxy"},{"name":"outbound"}]}
    # ... snip ...
```
@hawkw hawkw requested a review from a team January 14, 2021 17:29
@hawkw hawkw self-assigned this Jan 14, 2021
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@hawkw hawkw merged commit c39f51d into main Jan 14, 2021
@hawkw hawkw deleted the eliza/fix-json-logs branch January 14, 2021 18:08
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jan 21, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jan 21, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Mar 23, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Apr 21, 2021
This release improves diagnostics about the proxy's failfast state:

* Warnings are now emitted when the failfast state is entered;
* The "max concurrency exhausted" gRPC message has been changed to
  more-clearly indicate a failfast state error; and
* Failfast recovery has been made more robust, ensuring that a service
  can recover indepenently of new requests being received.

Furthermore, metric labeling has been improved:

* TCP server metrics are now annotated with the original `target_addr`;
* The `tls` label is now set to true for inbound TLS connections that
  lack a client ID. This is mostly helpful to clarify inbound metrics on
  the `identity` controller;
* Outbound `tls` metrics could be reported incorrectly when a proxy was
  configured to not use identity. This has been corrected.

Finally, socket-level errors now include a _client_ or _server_ prefix
to indicate which side of the proxy encountered the error.

---

* stack: remove `map_response` (linkerd/linkerd2-proxy#835)
* replace `RequestFilter` with Tower's upstream impl (linkerd/linkerd2-proxy#842)
* tracing: fix incorrect field format when logging in JSON (linkerd/linkerd2-proxy#845)
* replace `FutureService` with Tower's upstream impl (linkerd/linkerd2-proxy#839)
* integration: improve tracing in tests (linkerd/linkerd2-proxy#846)
* service-profiles: Prevent Duration coercion panics (linkerd/linkerd2-proxy#844)
* inbound: Separate HTTP server logic from protocol detection (linkerd/linkerd2-proxy#843)
* Correct gRPC 'max-concurrency exhausted' error messages (linkerd/linkerd2-proxy#847)
* Update tonic to v0.4 (linkerd/linkerd2-proxy#849)
* failfast: Improve diagnostic logging (linkerd/linkerd2-proxy#848)
* Update the base docker image (linkerd/linkerd2-proxy#850)
* stack: Implement Clone for ResultService (linkerd/linkerd2-proxy#851)
* Ensure services in failfast can become ready (linkerd/linkerd2-proxy#858)
* tests: replace string matching on metrics with parsing (linkerd/linkerd2-proxy#859)
* Decouple tls::accept from TcpStream (linkerd/linkerd2-proxy#853)
* metrics: Handle NoPeerIdFromRemote properly (linkerd/linkerd2-proxy#857)
* metrics: Reorder metrics labels (linkerd/linkerd2-proxy#856)
* Rename tls::accept to tls::server (linkerd/linkerd2-proxy#854)
* Annotate socket-level errors with a scope (linkerd/linkerd2-proxy#852)
* test: reduce repetition in metrics tests (linkerd/linkerd2-proxy#860)
* tls: Disambiguate client and server identities (linkerd/linkerd2-proxy#855)
* Update to tower v0.4.4 (linkerd/linkerd2-proxy#864)
* Update cargo dependencies (linkerd/linkerd2-proxy#865)
* metrics: add `target_addr` label for accepted transport metrics (linkerd/linkerd2-proxy#861)
* outbound: Strip endpoint identity when disabled (linkerd/linkerd2-proxy#862)

---

The opaque-ports test has been updated to reflect proxy metrics changes.

Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants