
trace: Apache Common Log Format access logging #1319

Merged · 64 commits into main · Jan 20, 2022

Conversation

@hawkw (Contributor) commented Oct 20, 2021

This branch builds on @tustvold's work in #601. The original PR
description from that branch:

Access logging is very important functionality for my team as we wish
to maintain feature-parity with our existing AWS ALB-based approach.
This functionality was requested
here and was marked
as help wanted, so thought I'd take a stab at implementing it.

Creating this as a draft as it still needs more testing and benchmarking, and
I'm new to tower, so I might have made some rookie errors. However, I
wanted to create a draft as an opportunity to get some early feedback.

The basic design consists of an AccessLogLayer that instruments both
requests and responses that flow through it, in a similar manner to
how handle_time is already computed. I'm not a massive fan of this,
but it was the only way I could easily see to get accurate processing
time metrics. I've tried to avoid any memory allocation on the hot
path, although there are possibly more atomic increments than I would
like. The performance impact with the feature disabled (i.e.
LINKERD2_PROXY_ACCESS_LOG_FILE not set) should be minimal.
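
For illustration, here is a minimal sketch of the instrumentation idea described above: time the inner call and hand the finished record to a background writer over a bounded channel, so the request path never blocks on log I/O. This is not the branch's actual AccessLogLayer; the `AccessLogRecord` fields, the `handle` stand-in, and the use of a std channel are assumptions for the example.

```rust
use std::sync::mpsc::SyncSender;
use std::time::Instant;

/// One completed request, as recorded by the instrumentation.
#[derive(Debug)]
pub struct AccessLogRecord {
    pub method: String,
    pub uri: String,
    pub status: u16,
    pub total_ns: u128,
}

/// Wraps a single call to the inner handler, timing it and emitting a record.
pub fn instrumented_call(method: &str, uri: &str, tx: &SyncSender<AccessLogRecord>) -> u16 {
    let start = Instant::now();
    let status = handle(method, uri); // the wrapped inner service call
    // Best-effort: if the collector's queue is full, drop the record
    // instead of blocking the request path.
    let _ = tx.try_send(AccessLogRecord {
        method: method.to_string(),
        uri: uri.to_string(),
        status,
        total_ns: start.elapsed().as_nanos(),
    });
    status
}

// Stand-in for the wrapped service.
fn handle(_method: &str, _uri: &str) -> u16 {
    200
}
```

Using `try_send` keeps logging best-effort: if the collector falls behind, records are dropped rather than stalling requests.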

The results of this instrumentation are then sent over an mpsc channel
to an AccessLogCollector that writes them in a space-delimited format
to a file specified via an environment variable. It buffers in memory
and flushes on termination, and on write if more than
FLUSH_TIMEOUT_SECS have elapsed since the last flush. This makes the
access logging best-effort, much like AWS ALBs.
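
A sketch of the collector side described above, reusing the `AccessLogRecord` type from the previous sketch: drain the channel on a dedicated task or thread, buffer writes, and flush whenever more than FLUSH_TIMEOUT_SECS have passed since the last flush (and again on shutdown). The constant value and the output layout are assumptions for illustration, not the code from the branch.

```rust
use std::io::Write;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

const FLUSH_TIMEOUT_SECS: u64 = 10; // assumed value, not taken from the branch

/// Drains records and writes them space-delimited, flushing periodically
/// and once more on shutdown.
pub fn run_collector<W: Write>(rx: Receiver<AccessLogRecord>, mut out: W) {
    let timeout = Duration::from_secs(FLUSH_TIMEOUT_SECS);
    let mut last_flush = Instant::now();
    loop {
        match rx.recv_timeout(timeout) {
            Ok(rec) => {
                // Best-effort: write errors are ignored rather than surfaced
                // back to the request path.
                let _ = writeln!(out, "{} {} {} {}", rec.method, rec.uri, rec.status, rec.total_ns);
                if last_flush.elapsed() >= timeout {
                    let _ = out.flush();
                    last_flush = Instant::now();
                }
            }
            // Nothing received for a while: flush whatever is buffered.
            Err(RecvTimeoutError::Timeout) => {
                let _ = out.flush();
                last_flush = Instant::now();
            }
            // All senders dropped (termination): final flush, then exit.
            Err(RecvTimeoutError::Disconnected) => {
                let _ = out.flush();
                return;
            }
        }
    }
}
```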

An example deployment scenario using this functionality might deploy a
fluent-bit sidecar to ship the logs, or write to /dev/stderr and use a
log shipper deployed as a DaemonSet.

The additional changes in this branch are:

  • Updated against the latest state of the main branch.
  • Changed the tracing configuration to use per-layer filtering, so that
    the access log layer only sees access log spans, while the stdout
    logging layer does not see them (although it could, if we wanted it
    to); a sketch of this arrangement follows the list.
  • Changed the format for outputting the access log to the Apache Common
    Log Format. Note that this format does not include all the data that
    the access log spans currently collect; I excluded that data so that
    the output is compatible with tools that ingest the Apache log format.
    In a follow-up PR, we can add the ability to control what format the
    access log is written in, and add an alternative format that includes
    all the access log data that Linkerd's spans can collect (I suggest
    newline-delimited JSON for this).
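
As mentioned in the first bullet, here is a minimal sketch of the per-layer filtering arrangement using tracing-subscriber 0.3's per-layer filters. The `"access_log"` target name, the choice of writers, and the `init_tracing` function are assumptions for illustration; the proxy's real initializer lives in linkerd-tracing.

```rust
use tracing_subscriber::{filter, fmt, prelude::*};

fn init_tracing() {
    // This layer only sees spans/events whose target is the access log...
    let access_log = fmt::layer()
        .with_writer(std::io::stderr)
        .with_filter(filter::filter_fn(|meta| meta.target() == "access_log"));

    // ...while the normal diagnostic layer sees everything else.
    let diagnostics = fmt::layer()
        .with_writer(std::io::stdout)
        .with_filter(filter::filter_fn(|meta| meta.target() != "access_log"));

    tracing_subscriber::registry()
        .with(access_log)
        .with(diagnostics)
        .init();
}
```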

Of course, a huge thank you to @tustvold for all their work on this; I
only updated the branch with the latest changes and made some minor
improvements. :)

Co-authored-by: Raphael Taylor-Davies r.taylordavies@googlemail.com

Closes #601

tustvold and others added 30 commits January 28, 2021 14:06

The `Settings::build` method is fairly large and complex. This change
splits it into a few smaller units and reduces boilerplate, especially
around reading the settings from the environment.

This change is being made in service of #601 to help avoid making the
build method even more complicated.
const ENV_ACCESS_LOG: &str = "LINKERD2_PROXY_ACCESS_LOG";
Member:

Stepping back from the code for a second: how is this actually supposed to be used in practice? How would a Linkerd user configure the proxy so that it had a volume to which access log files could be written?

I can understand emitting logs to stdout or stderr, but it's not clear to me how output to a file could actually be used.

Contributor (Author):

Honestly, I have no idea; this is just what @tustvold's PR #601 did, and since that was opened by a potential user of this feature, I had kind of assumed they already knew how the logs were going to be collected, and that emitting the logs to a user-configured file was the best way to integrate with whatever existing systems users are already using...

But, thinking through it...this would probably require adding some kind of volume mount to the proxy container, and because the proxy containers are added to pods by the proxy-injector after the user applies the configuration for a pod, this might be kind of hard to do? Perhaps it's worth getting additional user input about how people want to get access logs out of Linkerd...

Contributor:

I'm afraid my memory is a little hazy as to why I implemented it this way, but I think it was simply a case of replicating what Envoy does, combined with it being a simple way for the user to choose whether to route logs to stdout or stderr... There might have also been something about it being easier to write tests for, but I don't really recall, I'm afraid.

For the reasons you articulate, logging to a file in an environment making use of the admission webhook to inject the sidecars would be non-trivial; however, it could theoretically be used when doing injection manually. That being said, I'm not aware of any environments doing this.

TL;DR: I don't think file output is needed per se, but I'm not sure it particularly hurts either?

Contributor (Author):

combined with it being a simple way for the user to choose whether to route logs to stdout or stderr...

Ah, if the intention is that the user would just set the environment variable to /dev/stderr or /dev/stdout as appropriate, that should work. I guess I'd have to revise the follow-up PR #1350 to add a separate env variable for controlling the access log format, though: if the user is going to set the env var to /dev/stderr, we obviously can't select JSON access logging based on a filename extension in that case! :)

@jon-depop

New year bump 🎈

@olix0r (Member) commented Jan 4, 2022

@jon-depop While the code in this PR is in good shape, it is currently blocked on having a story on how users would interact with this feature. Specifically: where should these logs be written? It's not feasible for Linkerd to emit these logs to a file, as this would require volume mounts in all pods that emit access logs. We could alternatively intermingle these logs with the other proxy diagnostic logs, but I'm not sure how valuable that would actually be. If you are interested in helping to flesh this out, we should probably move that discussion into linkerd/linkerd2#1913

@0x91 commented Jan 5, 2022

@jon-depop While the code in this PR is in good shape, it is currently blocked on having a story on how users would interact with this feature. Specifically: where should these logs be written? It's not feasible for Linkerd to emit these logs to a file, as this would require volume mounts in all pods that emit access logs. We could alternatively intermingle these logs with the other proxy diagnostic logs, but I'm not sure how valuable that would actually be. If you are interested in helping to flesh this out, we should probably move that discussion into linkerd/linkerd2#1913

Intermingling them isn't perfect, but it would still yield useful information. I believe this is what the default nginx configuration in a container does too.

EDIT: it does seem onerous to require the scaffolding to write out an independent file. As a user I would prefer to split the output stream somewhere else if I was interested in the access logs by themselves, for example in a logging pipeline or a nice ripgrep command :)

@jon-depop commented Jan 6, 2022

As @0x91 said, we are currently intermingling the output as part of our POC (using our log processor to separate them into different streams). We would like to eventually have access logs go to stderr and Linkerd proxy diagnostics to stdout for more streamlined log forwarding. We do not have any requirement for the logs to be written to a specific filepath.

Apologies for the delay in response @olix0r and many thanks for all the work so far @hawkw

@jon-depop

thoughts @olix0r?

@olix0r (Member) commented Jan 14, 2022

@jon-depop I think that's a reasonable approach.

I'm not an expert on kubernetes' logging infrastructure, but as far as I know, it's not possible to decouple stdout and stderr (based on https://kubernetes.io/docs/concepts/cluster-administration/logging/ and kubernetes/kubernetes#28167); so I'm not sure that this distinction actually matters, practically. Am I wrong about this?

I only bring this up because unifying the output streams might simplify implementation (I'm not sure). If Kubernetes doesn't support demultiplexing the log streams, I'd opt for whatever is simplest code-wise.

I'll also note for the sake of completeness that this feature will need to be opt-in (via annotation). Per-request logging will definitely impact performance and logging costs, so we'll want to avoid this overhead by default.

@jon-depop

From our current investigations, it seems we should be able to achieve this, as the Kubelet supports splitting stdout and stderr log streams.

We will be confirming this with a spike this week and will report back as soon as we know, so that you can decide on your preferred implementation.

I also 100% agree that this should be opt-in; thanks for the heads up @olix0r.

hawkw added 3 commits January 19, 2022 13:47

@olix0r (Member) left a comment

I'm going to test this a little and then I think this is basically good to merge!

My only feedback is to make the contract/overhead a bit clearer in comments.

Comment on lines 108 to 125
let span = span!(target: TRACE_TARGET, Level::INFO, "http",
    client.addr = %self.client_addr,
    client.id = self.client_id.as_ref().map(|n| n.as_str()).unwrap_or("-"),
    timestamp = %now(),
    method = request.method().as_str(),
    uri = %request.uri(),
    version = ?request.version(),
    trace_id = trace_id(),
    request_bytes = get_header(http::header::CONTENT_LENGTH),
    status = field::Empty,
    response_bytes = field::Empty,
    total_ns = field::Empty,
    processing_ns = field::Empty,
    user_agent = get_header(http::header::USER_AGENT),
    host = get_header(http::header::HOST),
);

if span.is_disabled() {
Member:

I think it's worth leaving a comment here making it clear that the span! body is only materialized when access logging is enabled by linkerd_tracing::access_log's initializer. I.e. this is expected to be free when tracing is not enabled.

Maybe also worth leaving a comment on the fn layer summarizing this, too.

@hawkw (Contributor, Author) commented Jan 19, 2022

Great, I'll add some comments.

FWIW, you can't currently test the access logging functionality using the integration tests, as the linkerd-tracing::Settings::for_test constructor doesn't enable access logging. I can change that if we care!

@olix0r (Member) commented Jan 19, 2022

ran a load test (~20K RPS) with logging disabled and then enabled

[screenshot: load test CPU and latency comparison]

about 5% more CPU when logging, and higher tail latencies.

@olix0r (Member) commented Jan 19, 2022

:; k logs -n ort deploy/server-000 -c linkerd-proxy  |head
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389555918Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388657397Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388702235Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388907033Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388918324Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.388914521Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389215678Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389223877Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389228151Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
10.42.0.31:46616 load-http.ort.serviceaccount.identity.linkerd.cluster.local - [2022-01-19T23:15:55.389231928Z] "GET http://logical:8080/?latency_ms=0&size=0 HTTP/2.0" 200
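
For reference, each line above follows the shape: client address, client identity, "-", [timestamp], "METHOD uri VERSION", status. A minimal, illustrative sketch of rendering one such line; the struct and field names are assumptions, not the proxy's actual types:

```rust
struct CommonLogEntry<'a> {
    client_addr: &'a str,
    client_id: &'a str, // "-" when the client presented no identity
    timestamp: &'a str, // e.g. 2022-01-19T23:15:55.389555918Z
    method: &'a str,
    uri: &'a str,
    version: &'a str,
    status: u16,
}

fn format_line(e: &CommonLogEntry<'_>) -> String {
    format!(
        "{} {} - [{}] \"{} {} {}\" {}",
        e.client_addr, e.client_id, e.timestamp, e.method, e.uri, e.version, e.status
    )
}
```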

@hawkw (Contributor, Author) commented Jan 20, 2022

hmm, the tail latency bump is kinda bad... there are probably some things we can do about that.

@olix0r merged commit f5e9cea into main Jan 20, 2022
@olix0r deleted the eliza/access-logging-2 branch January 20, 2022 21:55
@0x91 commented Jan 20, 2022

thank you @olix0r and @hawkw for the work on this!

olix0r added a commit to linkerd/linkerd2 that referenced this pull request Jan 21, 2022
This release adds a new configuration variable,
`LINKERD2_PROXY_ACCESS_LOG` that configures whether per-request access
logs are emitted from inbound HTTP proxies. This variable may be set to
`apache` for Apache Common Log output or `json` for JSON formatted logs.
Access logs are emitted to stderr and the proxy's diagnostics logs are
emitted to stdout.

---

* build(deps): bump h2 from 0.3.9 to 0.3.10 (linkerd/linkerd2-proxy#1430)
* Update Rust to v1.56.1 (linkerd/linkerd2-proxy#1431)
* build(deps): bump indexmap from 1.7.0 to 1.8.0 (linkerd/linkerd2-proxy#1433)
* Add a devcontainer config (linkerd/linkerd2-proxy#1432)
* build(deps): bump serde_json from 1.0.74 to 1.0.75 (linkerd/linkerd2-proxy#1436)
* build(deps): bump tracing-subscriber from 0.3.5 to 0.3.6 (linkerd/linkerd2-proxy#1435)
* test: remove `env::set_var` call in test logging initialization  (linkerd/linkerd2-proxy#1437)
* trace: Apache Common Log Format access logging (linkerd/linkerd2-proxy#1319)
* build(deps): bump libc from 0.2.112 to 0.2.113 (linkerd/linkerd2-proxy#1438)
* build(deps): bump socket2 from 0.4.2 to 0.4.3 (linkerd/linkerd2-proxy#1439)
* access-log: Add JSON support (linkerd/linkerd2-proxy#1350)
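
As a rough illustration of the configuration surface described in the release note above, here is a sketch of reading the `LINKERD2_PROXY_ACCESS_LOG` variable: unset leaves access logging disabled, `apache` selects Apache Common Log output, and `json` selects JSON output. This is assumed example code, not the proxy's actual settings parser.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AccessLogFormat {
    Apache,
    Json,
}

impl FromStr for AccessLogFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "apache" => Ok(Self::Apache),
            "json" => Ok(Self::Json),
            other => Err(format!("unsupported access log format: {}", other)),
        }
    }
}

fn access_log_format_from_env() -> Option<AccessLogFormat> {
    // Unset means access logging stays disabled. A real implementation
    // would likely report invalid values instead of silently ignoring them.
    std::env::var("LINKERD2_PROXY_ACCESS_LOG").ok()?.parse().ok()
}
```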
@jon-depop

Many thanks @olix0r and @hawkw, also a really special thanks to @tustvold for starting this all in the first place! 🎉
