trace: Apache Common Log Format access logging #1319

hawkw · 2021-10-20T16:23:45Z

This branch builds on @tustvold's work in #601. The original PR
description from that branch:

Access logging is very important functionality for my team as we wish
to maintain feature-parity with our existing AWS ALB-based approach.
This functionality was requested
here and was marked
as help wanted, so thought I'd take a stab at implementing it.

Creating as a draft as still needs more testing and benchmarking, and
I'm new to tower and so might have made some rookie errors. However, I
wanted to create a draft as an opportunity to get some early feedback.

The basic design consists of an AccessLogLayer that instruments both
requests and responses that flow through it, in a similar manner to
how handle_time is already computed. I'm not a massive fan of this,
but it was the only way I could easily see to get accurate processing
time metrics. I've tried to avoid any memory allocation on the hot
path, although there are possibly more atomic increments than I would
like. The performance impact with the feature disabled, i.e.
LINKERD2_PROXY_ACCESS_LOG_FILE, not set should be minimal.

The results of this instrumentation are then sent over a mpsc channel
to an AccessLogCollector that writes them in a space-delimited format
to a file specified as an environment variable. It buffers in memory
and flushes on termination and on write if more than
FLUSH_TIMEOUT_SECS since the last flush. This makes the access logging
best effort much like AWS ALBs.

An example deployment scenario using this functionality might deploy a
fluent-bit sidecar to ship the logs, or write to /dev/stderr and use a
log shipper deployed as a DaemonSet.

The additional changes in this branch are:

Updated against the latest state of the main branch.
Changed the tracing configuration to use per-layer filtering, so that
the access log layer only sees access log spans, while the stdout
logging layer does not see the access log spans (although, it could
if we wanted it to...)
Changed the format for outputting the access log to the Apache Common
Log Format. Note that this format does not include all the data that
the access log spans currently collect; I excluded that data so that
the output is compatible with tools that ingest the Apache log format.
In a follow-up PR, we can add the ability to control what format the
access log is written in, and add an alternative format that includes
all the access log data that Linkerd's spans can collect (I suggest
newline-delimited JSON for this).

Of course, a huge thank you to @tustvold for all their work on this; I
only updated the branch with the latest changes and made some minor
improvements. :)

Co-authored-by: Raphael Taylor-Davies r.taylordavies@googlemail.com

Closes #601

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

The `Settings::build` method is fairly large and complex. This change splits into a few smaller units and reduces boilerplate--especially around reading the settings from the environment. This change is being made in service of #601 to help avoid making the build method even more complicated.

olix0r · 2021-11-03T18:28:38Z

linkerd/tracing/src/access_log.rs

+    registry::LookupSpan,
+};
+
+const ENV_ACCESS_LOG: &str = "LINKERD2_PROXY_ACCESS_LOG";


Stepping back from the code for a second: how is this actually supposed to be used in practice? How would a Linkerd user configure the proxy so that it had a volume to which access log files could be written?

I can understand emitting logs to stdout or stderr, but it's not clear to me how output to a file could actually be used.

Honestly, I have no idea; this is just what @tustvold's PR #601 did, and since that was opened by a potential user of this feature, I had kind of assumed they already knew how the logs were going to be collected, and that emitting the logs to a user configured file was the best way to integrate with whatever existing systems users are already using...

But, thinking through it...this would probably require adding some kind of volume mount to the proxy container, and because the proxy containers are added to pods by the proxy-injector after the user applies the configuration for a pod, this might be kind of hard to do? Perhaps it's worth getting additional user input about how people want to get access logs out of Linkerd...

I'm afraid my memory is a little bit hazy as to why I implemented it this way, but I think it was simply a case of replicating what Envoy does, combined with it being a simple way for the user to choose to be able to choose which of stdout or stderr to route logs to... There might have also been something about it being easier to write tests for, but I don't really recall I'm afraid.

For the reasons you articulate logging to a file in an environment making using of the admission webhook to inject the sidecars would be non-trivial, however, it could theoretically be used when doing injection manually. That being said I'm not aware of any environments doing this.

TLDR I don't think file output is needed per se, but I'm not sure it particularly hurts either?

combined with it being a simple way for the user to choose to be able to choose which of stdout or stderr to route logs to.

Ah, if the intention is that the user would just set the environment variable to /dev/stderr or /dev/stdout as appropriate, that should work. I guess I'd have to revise the follow-up PR #1350 to add a separate env variable for controlling the access log format, though --- if the user is going to set the env var to /dev/stderr, we obviously can't enable JSON access logging with a filename extension in that case! :)

jon-depop · 2022-01-04T10:46:58Z

New year bump 🎈

olix0r · 2022-01-04T16:18:36Z

@jon-depop While the code in this PR is in good shape, it is currently blocked on having a story on how users would interact with this feature. Specifically: where should these logs be written? It's not feasible for Linkerd to emit these logs to a file, as this would require volume mounts in all pods that emit access logs. We could alternatively intermingle these logs with the other proxy diagnostic logs, but I'm not sure how valuable that would actually be. If you are interested in helping to flesh this out, we should probably move that discussion into linkerd/linkerd2#1913

0x91 · 2022-01-05T15:36:28Z

@jon-depop While the code in this PR is in good shape, it is currently blocked on having a story on how users would interact with this feature. Specifically: where should these logs be written? It's not feasible for Linkerd to emit these logs to a file, as this would require volume mounts in all pods that emit access logs. We could alternatively intermingle these logs with the other proxy diagnostic logs, but I'm not sure how valuable that would actually be. If you are interested in helping to flesh this out, we should probably move that discussion into linkerd/linkerd2#1913

Intermingling them isn't perfect, but it would still yield useful information. I believe this is what the default nginx configuration in a container does too.

EDIT: it does seem onerous to require the scaffolding to write out an independent file. As a user I would prefer to split the output stream somewhere else if I was interested in the access logs by themselves, for example in a logging pipeline or a nice ripgrep command :)

jon-depop · 2022-01-06T10:44:39Z

As @0x91 said, we are currently intermingling the output as part of our POC (using our log processor to separate them into different streams). We would like to eventually have access logs to stderr and linkerd proxy diagnostics to stdout for more streamline log forwarding. We do not have any requirement for the logs to be written to a specific filepath.

Apologies for the delay in response @olix0r and many thanks for all the work so far @hawkw

jon-depop · 2022-01-14T16:14:23Z

thoughts @olix0r?

olix0r · 2022-01-14T18:02:58Z

@jon-depop I think that's a reasonable approach.

I'm not an expert on kubernetes' logging infrastructure, but as far as I know, it's not possible to decouple stdout and stderr (based on https://kubernetes.io/docs/concepts/cluster-administration/logging/ and kubernetes/kubernetes#28167); so I'm not sure that this distinction actually matters, practically. Am I wrong about this?

I only bring this up because unifying the output streams might simplify implementation (I'm not sure). If Kubernetes doesn't support demultiplexing the log streams, I'd opt for whatever is simplest code-wise.

I'll also note for the sake of completeness that this feature will need to be opt-in (via annotation). Per-request logging will definitely impact performance and logging costs, so we'll want to avoid this overhead by default.

jon-depop · 2022-01-17T11:59:44Z

From our current investigations, it seems we should be able to achieve this as the Kubelet supports splitting stdout and stderr log streams

We will be confirming this with a spike this week, will report as soon as we find out to allow you to make your preference in implementation.

I also 100% agree that this should be opt in, thanks for the heads up @olix0r.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

linkerd/tracing/src/access_log.rs

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

olix0r

I'm going to test this a little and then I think this is basically good to merge!

My only feedback asks to make the contract/overhead a bit clearer in comments.

olix0r · 2022-01-19T22:29:07Z

linkerd/http-access-log/src/lib.rs

+        let span = span!(target: TRACE_TARGET, Level::INFO, "http",
+            client.addr = %self.client_addr,
+            client.id = self.client_id.as_ref().map(|n| n.as_str()).unwrap_or("-"),
+            timestamp = %now(),
+            method = request.method().as_str(),
+            uri =  %request.uri(),
+            version = ?request.version(),
+            trace_id = trace_id(),
+            request_bytes = get_header(http::header::CONTENT_LENGTH),
+            status = field::Empty,
+            response_bytes = field::Empty,
+            total_ns = field::Empty,
+            processing_ns = field::Empty,
+            user_agent = get_header(http::header::USER_AGENT),
+            host = get_header(http::header::HOST),
+        );
+
+        if span.is_disabled() {


I think it's worth leaving a comment here making it clear that the span! body is only materialized when access logging is enabled by linkerd_tracing::access_log's initializer. I.e. this is expected to be free when tracing is not enabled.

Maybe also worth leaving a comment on the fn layer summarizing this, too.

hawkw · 2022-01-19T22:32:37Z

Great, I'll add some comments.

FWIW, you can't currently test the access logging functionality using the integration tests, as the linkerd-tracing::Settings::for_test constructor doesn't enable access logging. I can change that if we care!

olix0r · 2022-01-19T23:14:05Z

ran a load test (~20K RPS) with logging disabled and then enabled

about 5% more CPU when logging, and higher tail latencies.

olix0r · 2022-01-19T23:16:58Z

:; k logs -n ort deploy/server-000 -c linkerd-proxy  |head
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389555918Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388657397Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388702235Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388907033Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388918324Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388914521Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389215678Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389223877Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389228151Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389231928Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200

hawkw · 2022-01-20T00:11:12Z

hmm, tail latency bump is kinda bad...there's probably some things we can do about that.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

0x91 · 2022-01-20T23:42:53Z

thank you @olix0r and @hawkw for the work on this!

This release adds a new configuration variable, `LINKERD2_PROXY_ACCESS_LOG` that configures whether per-request access logs are emitted from inbound HTTP proxies. This variable may be set to `apache` for Apache Common Log output or `json` for JSON formatted logs. Access logs are emitted to stderr and the proxy's diagnostics logs are emitted to stdout. --- * build(deps): bump h2 from 0.3.9 to 0.3.10 (linkerd/linkerd2-proxy#1430) * Update Rust to v1.56.1 (linkerd/linkerd2-proxy#1431) * build(deps): bump indexmap from 1.7.0 to 1.8.0 (linkerd/linkerd2-proxy#1433) * Add a devcontainer config (linkerd/linkerd2-proxy#1432) * build(deps): bump serde_json from 1.0.74 to 1.0.75 (linkerd/linkerd2-proxy#1436) * build(deps): bump tracing-subscriber from 0.3.5 to 0.3.6 (linkerd/linkerd2-proxy#1435) * test: remove `env::set_var` call in test logging initialization (linkerd/linkerd2-proxy#1437) * trace: Apache Common Log Format access logging (linkerd/linkerd2-proxy#1319) * build(deps): bump libc from 0.2.112 to 0.2.113 (linkerd/linkerd2-proxy#1438) * build(deps): bump socket2 from 0.4.2 to 0.4.3 (linkerd/linkerd2-proxy#1439) * access-log: Add JSON support (linkerd/linkerd2-proxy#1350)

jon-depop · 2022-01-24T09:55:17Z

Many thanks @olix0r and @hawkw, also a really special thanks to @tustvold for starting this all in the first place! 🎉

tustvold and others added 30 commits January 28, 2021 14:06

Add access logging as requested in linkerd/linkerd2#1913

41fd82d

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

Merge branch 'main' into pr/tustvold/601

1b7a3cb

fixup pin-project dependency

acbe77e

fixup fuzzer cargo lock files

11d2076

review feedback

cd80bf4

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

misc style cleanup

d4c5337

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

enable/disable with env var

e9c5892

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

add client info to access log

6b5fb78

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

ensure the tracing filter enables access log spans

dce15dd

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

change access log spans to INFO

4d76632

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

review feedback

0990eb9

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

Merge remote-tracking branch 'upstream/main' into access-logging

126ffc2

load access log setting from environment

c46bf15

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

remove filtering from AccessLogWriter layer

b5de8d8

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

fix replay test compilation

d5fb34b

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

update fuzz lockfiles

93af5d9

Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>

Merge remote-tracking branch 'upstream/main' into access-logging

da3332f

mark inline; use CONTENT_LENGTH constant

777e0e5

clone golf

88bbd3b

clippay

892bc63

simplify env filter construction

f829918

malkovich

992647e

Merge branch 'ver/tracing-functions' into pr/tustvold/601

ebf7813

Move struct defintions to the top

c53ce37

Merge branch 'ver/tracing-functions' into pr/tustvold/601

72132a3

Reintroduce access logging with a TODO

4b56f3d

Merge branch 'main' into pr/tustvold/601

cf3440d

Split the access-log module and the log writer

af63421

Constify the special tracing target for access logs

b97ac8e

olix0r reviewed Nov 3, 2021

View reviewed changes

Merge branch 'main' into eliza/access-logging-2

b1ceafd

jon-depop mentioned this pull request Jan 6, 2022

Add apache style access logs linkerd/linkerd2#1913

Closed

hawkw added 2 commits January 18, 2022 14:33

Merge branch 'main' into eliza/access-logging-2

7bf594e

trace: write the access log to stderr

6985abe

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

olix0r reviewed Jan 19, 2022

View reviewed changes

linkerd/tracing/src/access_log.rs Outdated Show resolved Hide resolved

hawkw added 3 commits January 19, 2022 13:47

move env configuration out of access_log module

8f37ccc

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

rm dead code

74226c3

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

rm unneeded futures deps

52d6d0a

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

olix0r reviewed Jan 19, 2022

View reviewed changes

add some more comments

6c6a8a5

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

olix0r approved these changes Jan 20, 2022

View reviewed changes

olix0r merged commit f5e9cea into main Jan 20, 2022

olix0r deleted the eliza/access-logging-2 branch January 20, 2022 21:55

olix0r mentioned this pull request Jan 21, 2022

proxy: v2.171.0 linkerd/linkerd2#7661

Merged

hawkw mentioned this pull request Aug 23, 2022

add access logging docs linkerd/website#1438

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trace: Apache Common Log Format access logging #1319

trace: Apache Common Log Format access logging #1319

hawkw commented Oct 20, 2021

olix0r Nov 3, 2021

hawkw Nov 3, 2021

tustvold Nov 3, 2021

hawkw Nov 3, 2021

jon-depop commented Jan 4, 2022

olix0r commented Jan 4, 2022

0x91 commented Jan 5, 2022 •

edited

Loading

jon-depop commented Jan 6, 2022 •

edited

Loading

jon-depop commented Jan 14, 2022

olix0r commented Jan 14, 2022

jon-depop commented Jan 17, 2022

olix0r left a comment

olix0r Jan 19, 2022

hawkw commented Jan 19, 2022

olix0r commented Jan 19, 2022

olix0r commented Jan 19, 2022

hawkw commented Jan 20, 2022

0x91 commented Jan 20, 2022

jon-depop commented Jan 24, 2022

trace: Apache Common Log Format access logging #1319

trace: Apache Common Log Format access logging #1319

Conversation

hawkw commented Oct 20, 2021

olix0r Nov 3, 2021

Choose a reason for hiding this comment

hawkw Nov 3, 2021

Choose a reason for hiding this comment

tustvold Nov 3, 2021

Choose a reason for hiding this comment

hawkw Nov 3, 2021

Choose a reason for hiding this comment

jon-depop commented Jan 4, 2022

olix0r commented Jan 4, 2022

0x91 commented Jan 5, 2022 • edited Loading

jon-depop commented Jan 6, 2022 • edited Loading

jon-depop commented Jan 14, 2022

olix0r commented Jan 14, 2022

jon-depop commented Jan 17, 2022

olix0r left a comment

Choose a reason for hiding this comment

olix0r Jan 19, 2022

Choose a reason for hiding this comment

hawkw commented Jan 19, 2022

olix0r commented Jan 19, 2022

olix0r commented Jan 19, 2022

hawkw commented Jan 20, 2022

0x91 commented Jan 20, 2022

jon-depop commented Jan 24, 2022

0x91 commented Jan 5, 2022 •

edited

Loading

jon-depop commented Jan 6, 2022 •

edited

Loading