Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust client/server span times to account for clock skew #3930

Closed
rkargMsft opened this issue Jul 31, 2024 · 5 comments
Closed

Adjust client/server span times to account for clock skew #3930

rkargMsft opened this issue Jul 31, 2024 · 5 comments
Labels
stale Used for stale issues / PRs

Comments

@rkargMsft
Copy link

Is your feature request related to a problem? Please describe.
Clock skew across machines can cause spans to appear at the "wrong" time in relation to the actual execution.
For example, a client span calling out to a server span handling the request would expect to have the server span call exist wholly within the calling client span.

Describe the solution you'd like
One way to address this that Jaeger UI uses is to adjust the server span to fit within the calling client span to preserve causality when viewing the trace. This is optional and can be turned off to see the trace with the exact timestamps reported (this would match how Tempo currently displays things). The choice of how to adjust server spans is a bit arbitrary but centering it within the calling client span as Jaeger does is a reasonable way to visually maintain the causality between spans.
I believe that Jaeger UI also displays a visual indication that this skew adjustment was applied.

Describe alternatives you've considered
Ideally having times that are synchronized across participating machines would result in "good enough" timestamps to see what's happening, but this isn't always possible based on hosting: Cloud Provider limitations, multiple regions, multiple providers, Windows system time resolution, etc.

Additional context
Jaeger Clock Skew Adjustment

Jaeger backend combines trace data from applications that are usually running on different hosts. The hardware clocks on the hosts often experience relative drift, known as the clock skew effect  . Clock skew can make it difficult to reason about traces, for example, when a server span may appear to start earlier than the client span, which should not be possible. The query service implements a clock skew adjustment algorithm ( code  ) to correct for clock drift, using the knowledge about causal relationships between spans. All adjusted spans have a warning displayed in the UI that provides the exact clock skew delta applied to its timestamps.

In the below example, the green is a client span making a call to the orange server span. Because of clock skew between these machines (they're Windows with a default system time resolution of ~15 ms.) the reported times make it look like the server side happened after the calling span returned a result. The red line under the client span shows the server span (and it's children) being moved over to align with the middle of the client span.

Note that while this example has the server span recorded with a time period later than the calling span, it is also possible for a server span to have times that occur before the times of the calling span.

image

@joe-elliott
Copy link
Member

This is quite the controversial feature in Jaeger. It used to be default but then was made optional and eventually default was swapped to show the times as reported.

jaegertracing/jaeger#1459

I'd prefer returning the times as reported exactly for the same reasons Jaeger eventually went that route, but if some folks want to see the times adjusted this does not bother me.

I'd prefer to implement this in Grafana/visualization layer. wdyt?

@rkargMsft
Copy link
Author

rkargMsft commented Jul 31, 2024

Having it as an opt-in option would be fine. Maybe allow the default to be set in config (with the default config setting the default to no-clock-skew-adjustment to maintain current behavior)? Additionally, it would be nice to allow it to be toggled on the UI for an individual trace.
I also believe that the visual indication that an adjustment was made is a very good idea.

Documenting some things for people following along:

OTel makes a distinction between client/server and producer/consumer SpanKinds.
https://opentelemetry.io/docs/specs/otel/trace/api/#spankind

client/server indicates that the child span is expected (though not strictly required, see below) to complete within the time of the parent span and that the parent span represents waiting for a result from the child. This indicates that clock skew adjustment would likely be appropriate to match the provided expectation.

producer/consumer indicates that the child work is completed without the parent explicitly waiting for a response at the completion of the child span so the parent may finish before the child does, indicating that clock skew adjustment shouldn't be considered.

internal is in the same system so clock skew should hopefully not be an issue 🤞

In all cases, parent span start time must not be after the child span start time or causality is broken.

The one problematic case I saw is when a parent (client) calls to a child (server) and the client times out, thus ending, while the child is still processing. It's possible that the parent waited 30 sec. and the client could have kept processing the request. There really isn't a single "correct" way to handle this. As long as the child span starts after the parent, it's ambiguous as to what actually happened. If the child is longer than the parent span, then we know it at least ended after the parent span started, and if the child start is after the parent start, then leave things as they are. But if the child span is shorter then it's ambiguous as to whether the child span ended after the parent, but I'd argue it's far less likely that the child ended after the parent (given the client/server information) so if skew adjustment were opted in to, this seems like a valid set of adjustments:

  • [all] if parent start is after child start, move child start to parent start time
  • [client/server] if child end is after parent end, then either
    • if child duration is shorter than parent duration, shift child to have child end match parent end
    • if child duration is equal or longer than parent duration, shift child to have child start match parent start
    • could get super fancy and look at the parent span to see if there is an error status and use that to assume that some kind of timeout happened and do nothing. Slippery slope? dunno

The above seem reasonable given that it's impossible to come up with a "correct" timeline.

@joe-elliott
Copy link
Member

It's my opinion that this should be handled in the frontend. Tempo would ideally always return exactly what is sent to it and various visualizations could make whatever choices they want when rendering the data.

What do you think about creating a new issue in grafana/grafana and linking this one?

@rkargMsft
Copy link
Author

Sorry, missed the notification about your comment. UI-side option makes sense.
I'll get that issue created.

Copy link
Contributor

github-actions bot commented Nov 6, 2024

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.

@github-actions github-actions bot added the stale Used for stale issues / PRs label Nov 6, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Used for stale issues / PRs
Projects
None yet
Development

No branches or pull requests

2 participants