Request log tracing #206
At the moment, we send trace ids and span ids with our requests. This is excellent, and provides some coarse-grained information. That said, it's very limited: you can only tell that some number of requests were triggered by the same underlying operation.

Here, when a trace is originated with a parent span, we keep track of that parent (called an 'originating' span id) and also send it with requests. This means that we can use request logs, as well as trace logs, to piece together accurate service-level tracing.

In other words, if I have a call graph which looks like:

```yml
service a/foo:
  check auth:
    call auth service:
      auth service/check auth:
  do work:
    call work service:
      work service/do operation:
        call auth service:
          auth service/check auth:
```

then I will see that in the trace logs (if the request is sampled), but I can derive

```yml
service a/foo:
  auth service/check auth:
  work service/do operation:
    auth service/check auth:
```

from the request logs on all requests. Currently this is possible when requests are single-threaded, but not when people call services in parallel.
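A minimal sketch of how request logs could be stitched into that service-level tree. It assumes, hypothetically, that each service's request log records the `X-B3-SpanId` it received as `spanId` and the `X-OrigSpanId` it received as `originatingSpanId`; `RequestLogEntry` and `RequestLogTree` are illustrative names, not part of this library. Each entry is hung under the logged request whose span id equals its originating span id, so parallel calls are linked without relying on timestamps or ordering:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical request-log entry: the span id a service received (X-B3-SpanId)
// and the originating span id its caller sent (X-OrigSpanId), if any.
record RequestLogEntry(String endpoint, String spanId, Optional<String> originatingSpanId) {}

final class RequestLogTree {
    /** Renders the service-level call tree as indented "endpoint:" lines. */
    static String render(List<RequestLogEntry> entries) {
        Set<String> loggedSpanIds = new HashSet<>();
        for (RequestLogEntry e : entries) {
            loggedSpanIds.add(e.spanId());
        }
        // Hang each entry under the logged request whose span id equals its originating span id.
        Map<String, List<RequestLogEntry>> children = new LinkedHashMap<>();
        List<RequestLogEntry> roots = new ArrayList<>();
        for (RequestLogEntry e : entries) {
            Optional<String> parent = e.originatingSpanId().filter(loggedSpanIds::contains);
            if (parent.isPresent()) {
                children.computeIfAbsent(parent.get(), k -> new ArrayList<>()).add(e);
            } else {
                roots.add(e); // no logged caller, so this is the top of a tree
            }
        }
        StringBuilder out = new StringBuilder();
        for (RequestLogEntry root : roots) {
            append(root, 0, children, out);
        }
        return out.toString();
    }

    private static void append(RequestLogEntry e, int depth,
            Map<String, List<RequestLogEntry>> children, StringBuilder out) {
        out.append("  ".repeat(depth)).append(e.endpoint()).append(":\n");
        for (RequestLogEntry child : children.getOrDefault(e.spanId(), List.of())) {
            append(child, depth + 1, children, out);
        }
    }

    public static void main(String[] args) {
        // service a/foo received span a1; both of its outgoing calls carry X-OrigSpanId: a1.
        // The work service received span s3, so its own outgoing call carries X-OrigSpanId: s3.
        System.out.print(render(List.of(
                new RequestLogEntry("service a/foo", "a1", Optional.empty()),
                new RequestLogEntry("auth service/check auth", "s2", Optional.of("a1")),
                new RequestLogEntry("work service/do operation", "s3", Optional.of("a1")),
                new RequestLogEntry("auth service/check auth", "s4", Optional.of("s3")))));
    }
}
```

With the four hypothetical entries in `main`, the output is the derived service-level tree: `service a/foo` with `auth service/check auth` and `work service/do operation` beneath it, and a second `auth service/check auth` under the work service.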
* information to reconstruct a request-level trace. For service-internal tracing, the typical trace logs
* (with sampling) are still required.
*/
String ORIGINATING_SPAN_ID = "X-OrigSpanId";
Could we also make sure this header name is mentioned in the README so it's vaguely googleable? This is an extra piece of information that we've made up that's not in Zipkin, so it would be good to make sure it's super obvious!
Capturing the naming discussion - "incoming span id" doesn't quite make sense for the first time this gets emitted as a header, because the first value sent was just made up inside the service.
done
Does this need to resemble a header? If we have the ability to push non-span parameters through all of the glue, it'd be nice if it were instead something like `_rootSpan` or `_origSpanId` to match the other "special" parameters that we have.
it is actually sent over the wire as a header, if that makes a difference?
Ah gotcha - I was thinking this was just tracked separately on each server.
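To make the header semantics from this thread concrete, here is a hedged sketch that uses the JDK's `java.net.http.HttpRequest` builder rather than the OkHttp interceptor this PR actually touches. It assumes the originating span id is the span id the service received, and that at the root of a trace (no incoming span) a fresh id is made up inside the service, as the naming discussion describes. `OrigSpanPropagation`, `randomId`, and the 16-hex-character id format are assumptions for illustration, not this library's API:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

final class OrigSpanPropagation {
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String ORIGINATING_SPAN_ID = "X-OrigSpanId";

    /** A random 64-bit id as 16 lowercase hex characters (assumed id format). */
    static String randomId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    /**
     * The span id this service received; at the root of a trace there is none,
     * so a fresh value is made up inside the service.
     */
    static String originatingSpanId(Optional<String> incomingSpanId) {
        return incomingSpanId.orElseGet(OrigSpanPropagation::randomId);
    }

    /** Builds an outbound request carrying the trace id, a new span id, and the originating span id. */
    static HttpRequest outgoing(URI uri, String traceId, String originatingSpanId) {
        return HttpRequest.newBuilder(uri)
                .header(TRACE_ID, traceId)
                .header(SPAN_ID, randomId()) // new span for this outbound call
                .header(ORIGINATING_SPAN_ID, originatingSpanId) // sent over the wire as a real header
                .GET()
                .build();
    }
}
```

In this sketch a service would compute `originatingSpanId` once per incoming request and reuse it for every outgoing call made while handling that request, including calls made in parallel, which is what lets the callee's request logs point back at the caller's request.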
@sfackler for SA just in case you wanted to do something similar in https://github.com/palantir/rust-zipkin?
tracing-api/src/main/java/com/palantir/tracing/api/TraceHttpHeaders.java
Nice! I'd been thinking about doing something like this as well.
tracing-okhttp3/src/main/java/com/palantir/tracing/okhttp3/OkhttpTraceInterceptor.java
No real opposition here, but have we also considered just logging all trace logs that are associated with outbound or incoming requests? That would be O(request logs) that we'd end up logging and ingesting, but would allow us to rely only on the trace log format for tracing analysis.
I guess you'd still need to plumb this through internally, but I think you can get away without pushing it across the wire in that case.
Considered that, but:
I think 1 and 2 are handled just by saying the
Maybe I'm not understanding what you're looking for, but isn't that lossy? Repurposing the parent span can't help but break other things, right?
All I'm saying is that any of the internal spans generated get removed and treated like they don't exist. So in a service that has a span stack like:
if the trace isn't supposed to be logged we could instead just log:
notice the
Because we're only logging the spans associated with remote calls, this ends up producing the same order of magnitude of logs as request logs do. This really only seems nicer to me because we don't need to tweak any information we're sending, and the existing tracing infra would continue to work with these logs (though we probably want to signal that the trace was downsampled).
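A minimal sketch of this alternative, under assumed names: `SpanRecord`, `RemoteSpanCollapser`, and a `remote` flag marking spans that wrap an inbound or outbound request are all hypothetical. Internal spans are dropped as if they never existed, and each kept span is re-parented to its nearest remote ancestor. That re-parenting is exactly the "repurposing the parent span" the lossiness question is about:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical span model; "remote" marks spans that wrap an inbound or outbound request.
record SpanRecord(String name, String spanId, Optional<String> parentSpanId, boolean remote) {}

final class RemoteSpanCollapser {
    /** Keeps only remote spans, re-parenting each to its nearest remote ancestor. */
    static List<SpanRecord> collapse(List<SpanRecord> spans) {
        Map<String, SpanRecord> byId = new HashMap<>();
        for (SpanRecord s : spans) {
            byId.put(s.spanId(), s);
        }
        List<SpanRecord> kept = new ArrayList<>();
        for (SpanRecord s : spans) {
            if (!s.remote()) {
                continue; // internal spans are dropped as if they never existed
            }
            kept.add(new SpanRecord(s.name(), s.spanId(), nearestRemoteAncestor(s, byId), true));
        }
        return kept;
    }

    private static Optional<String> nearestRemoteAncestor(SpanRecord span, Map<String, SpanRecord> byId) {
        Set<String> seen = new HashSet<>(); // guards against malformed parent cycles
        Optional<String> parentId = span.parentSpanId();
        while (parentId.isPresent() && seen.add(parentId.get())) {
            SpanRecord parent = byId.get(parentId.get());
            if (parent == null || parent.remote()) {
                // Unknown parent (e.g. the caller's span in another service) or a remote span: keep it.
                return parentId;
            }
            parentId = parent.parentSpanId(); // skip over the internal span
        }
        return Optional.empty();
    }
}
```

For the `service a/foo` example, this keeps the inbound span plus the two outbound calls, and both outbound calls end up parented directly to the inbound span; the `check auth` and `do work` spans disappear.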
My concerns with that approach are roughly:
Released 3.1.0
The approach Joe describes basically already works today, because Carter fixed tracing to not create new spans for unsampled traces. If request logs simply contained the parentSpanId, that would let you reconstruct traces fully without also logging traces and/or propagating more headers on every request.
The possible downside is extra network overhead from sending a new header over the wire. However, this is likely to be very small (28 bytes per request), and if it turns out to be a problem we can always stop sending the parent span id, which is almost surely useless as a request header anyway.
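The 28-byte figure matches the header name plus the span id, assuming span ids are 16 hex characters (64-bit ids, as in Zipkin's B3 headers). A quick check, counting name and value bytes only; an HTTP/1.1 header line also carries `: ` and a CRLF, while HTTP/2 header compression (HPACK) can shrink repeated headers further. `HeaderOverhead` and the sample span id are hypothetical:

```java
import java.nio.charset.StandardCharsets;

final class HeaderOverhead {
    /** Bytes for the header name and value alone (ASCII). */
    static int nameAndValueBytes(String name, String value) {
        return name.getBytes(StandardCharsets.US_ASCII).length
                + value.getBytes(StandardCharsets.US_ASCII).length;
    }

    /** Bytes for a full HTTP/1.1 header line: name, ": ", value, and CRLF. */
    static int http11LineBytes(String name, String value) {
        return nameAndValueBytes(name, value) + ": ".length() + "\r\n".length();
    }

    public static void main(String[] args) {
        // Hypothetical 16-hex-character span id; the real value is random per request.
        String spanId = "463ac35c9f6413ad";
        System.out.println(nameAndValueBytes("X-OrigSpanId", spanId)); // 28
        System.out.println(http11LineBytes("X-OrigSpanId", spanId));   // 32
    }
}
```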