-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trace: Apache Common Log Format access logging #1319
Conversation
Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Signed-off-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
The `Settings::build` method is fairly large and complex. This change splits into a few smaller units and reduces boilerplate--especially around reading the settings from the environment. This change is being made in service of #601 to help avoid making the build method even more complicated.
linkerd/tracing/src/access_log.rs
Outdated
registry::LookupSpan, | ||
}; | ||
|
||
const ENV_ACCESS_LOG: &str = "LINKERD2_PROXY_ACCESS_LOG"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stepping back from the code for a second: how is this actually supposed to be used in practice? How would a Linkerd user configure the proxy so that it had a volume to which access log files could be written?
I can understand emitting logs to stdout or stderr, but it's not clear to me how output to a file could actually be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I have no idea; this is just what @tustvold's PR #601 did, and since that was opened by a potential user of this feature, I had kind of assumed they already knew how the logs were going to be collected, and that emitting the logs to a user configured file was the best way to integrate with whatever existing systems users are already using...
But, thinking through it...this would probably require adding some kind of volume mount to the proxy container, and because the proxy containers are added to pods by the proxy-injector after the user applies the configuration for a pod, this might be kind of hard to do? Perhaps it's worth getting additional user input about how people want to get access logs out of Linkerd...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid my memory is a little bit hazy as to why I implemented it this way, but I think it was simply a case of replicating what Envoy does, combined with it being a simple way for the user to choose to be able to choose which of stdout or stderr to route logs to... There might have also been something about it being easier to write tests for, but I don't really recall I'm afraid.
For the reasons you articulate logging to a file in an environment making using of the admission webhook to inject the sidecars would be non-trivial, however, it could theoretically be used when doing injection manually. That being said I'm not aware of any environments doing this.
TLDR I don't think file output is needed per se, but I'm not sure it particularly hurts either?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
combined with it being a simple way for the user to choose to be able to choose which of stdout or stderr to route logs to.
Ah, if the intention is that the user would just set the environment variable to /dev/stderr
or /dev/stdout
as appropriate, that should work. I guess I'd have to revise the follow-up PR #1350 to add a separate env variable for controlling the access log format, though --- if the user is going to set the env var to /dev/stderr
, we obviously can't enable JSON access logging with a filename extension in that case! :)
New year bump 🎈 |
@jon-depop While the code in this PR is in good shape, it is currently blocked on having a story on how users would interact with this feature. Specifically: where should these logs be written? It's not feasible for Linkerd to emit these logs to a file, as this would require volume mounts in all pods that emit access logs. We could alternatively intermingle these logs with the other proxy diagnostic logs, but I'm not sure how valuable that would actually be. If you are interested in helping to flesh this out, we should probably move that discussion into linkerd/linkerd2#1913 |
Intermingling them isn't perfect, but it would still yield useful information. I believe this is what the default nginx configuration in a container does too. EDIT: it does seem onerous to require the scaffolding to write out an independent file. As a user I would prefer to split the output stream somewhere else if I was interested in the access logs by themselves, for example in a logging pipeline or a nice ripgrep command :) |
As @0x91 said, we are currently intermingling the output as part of our POC (using our log processor to separate them into different streams). We would like to eventually have access logs to stderr and linkerd proxy diagnostics to stdout for more streamline log forwarding. We do not have any requirement for the logs to be written to a specific filepath. Apologies for the delay in response @olix0r and many thanks for all the work so far @hawkw |
thoughts @olix0r? |
@jon-depop I think that's a reasonable approach. I'm not an expert on kubernetes' logging infrastructure, but as far as I know, it's not possible to decouple stdout and stderr (based on https://kubernetes.io/docs/concepts/cluster-administration/logging/ and kubernetes/kubernetes#28167); so I'm not sure that this distinction actually matters, practically. Am I wrong about this? I only bring this up because unifying the output streams might simplify implementation (I'm not sure). If Kubernetes doesn't support demultiplexing the log streams, I'd opt for whatever is simplest code-wise. I'll also note for the sake of completeness that this feature will need to be opt-in (via annotation). Per-request logging will definitely impact performance and logging costs, so we'll want to avoid this overhead by default. |
From our current investigations, it seems we should be able to achieve this as the Kubelet supports splitting stdout and stderr log streams We will be confirming this with a spike this week, will report as soon as we find out to allow you to make your preference in implementation. I also 100% agree that this should be opt in, thanks for the heads up @olix0r. |
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to test this a little and then I think this is basically good to merge!
My only feedback asks to make the contract/overhead a bit clearer in comments.
linkerd/http-access-log/src/lib.rs
Outdated
let span = span!(target: TRACE_TARGET, Level::INFO, "http", | ||
client.addr = %self.client_addr, | ||
client.id = self.client_id.as_ref().map(|n| n.as_str()).unwrap_or("-"), | ||
timestamp = %now(), | ||
method = request.method().as_str(), | ||
uri = %request.uri(), | ||
version = ?request.version(), | ||
trace_id = trace_id(), | ||
request_bytes = get_header(http::header::CONTENT_LENGTH), | ||
status = field::Empty, | ||
response_bytes = field::Empty, | ||
total_ns = field::Empty, | ||
processing_ns = field::Empty, | ||
user_agent = get_header(http::header::USER_AGENT), | ||
host = get_header(http::header::HOST), | ||
); | ||
|
||
if span.is_disabled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth leaving a comment here making it clear that the span!
body is only materialized when access logging is enabled by linkerd_tracing::access_log
's initializer. I.e. this is expected to be free when tracing is not enabled.
Maybe also worth leaving a comment on the fn layer
summarizing this, too.
Great, I'll add some comments. FWIW, you can't currently test the access logging functionality using the integration tests, as the |
|
hmm, tail latency bump is kinda bad...there's probably some things we can do about that. |
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This release adds a new configuration variable, `LINKERD2_PROXY_ACCESS_LOG` that configures whether per-request access logs are emitted from inbound HTTP proxies. This variable may be set to `apache` for Apache Common Log output or `json` for JSON formatted logs. Access logs are emitted to stderr and the proxy's diagnostics logs are emitted to stdout. --- * build(deps): bump h2 from 0.3.9 to 0.3.10 (linkerd/linkerd2-proxy#1430) * Update Rust to v1.56.1 (linkerd/linkerd2-proxy#1431) * build(deps): bump indexmap from 1.7.0 to 1.8.0 (linkerd/linkerd2-proxy#1433) * Add a devcontainer config (linkerd/linkerd2-proxy#1432) * build(deps): bump serde_json from 1.0.74 to 1.0.75 (linkerd/linkerd2-proxy#1436) * build(deps): bump tracing-subscriber from 0.3.5 to 0.3.6 (linkerd/linkerd2-proxy#1435) * test: remove `env::set_var` call in test logging initialization (linkerd/linkerd2-proxy#1437) * trace: Apache Common Log Format access logging (linkerd/linkerd2-proxy#1319) * build(deps): bump libc from 0.2.112 to 0.2.113 (linkerd/linkerd2-proxy#1438) * build(deps): bump socket2 from 0.4.2 to 0.4.3 (linkerd/linkerd2-proxy#1439) * access-log: Add JSON support (linkerd/linkerd2-proxy#1350)
This branch builds on @tustvold's work in #601. The original PR
description from that branch:
The additional changes in this branch are:
main
branch.tracing
configuration to use per-layer filtering, so thatthe access log layer only sees access log spans, while the stdout
logging layer does not see the access log spans (although, it could
if we wanted it to...)
Log Format. Note that this format does not include all the data that
the access log spans currently collect; I excluded that data so that
the output is compatible with tools that ingest the Apache log format.
In a follow-up PR, we can add the ability to control what format the
access log is written in, and add an alternative format that includes
all the access log data that Linkerd's spans can collect (I suggest
newline-delimited JSON for this).
Of course, a huge thank you to @tustvold for all their work on this; I
only updated the branch with the latest changes and made some minor
improvements. :)
Co-authored-by: Raphael Taylor-Davies r.taylordavies@googlemail.com
Closes #601