Adjust client/server span times to account for clock skew #3930
Comments
This is quite a controversial feature in Jaeger. It used to be the default, was then made optional, and eventually the default was swapped to show the times as reported. I'd prefer returning the times exactly as reported, for the same reasons Jaeger eventually went that route, but if some folks want to see the times adjusted, that does not bother me. I'd prefer to implement this in the Grafana/visualization layer. wdyt?
Having it as an opt-in option would be fine. Maybe allow the default to be set in config (with the default config setting the default to no-clock-skew-adjustment to maintain current behavior)? Additionally, it would be nice to allow it to be toggled on the UI for an individual trace. Documenting some things for people following along: OTel makes a distinction between
In all cases, parent span start time must not be after the child span start time or causality is broken. The one problematic case I saw is when a parent (
The above seem reasonable given that it's impossible to come up with a "correct" timeline.
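For anyone following along, the invariant mentioned above (a parent span must not start after its child) can be checked with a few lines of Python. This is only a sketch: the dictionary keys `span_id`, `parent_id`, and `start_us` and the function name `causality_violations` are made up for this example and do not correspond to any Tempo or Jaeger API.

```python
def causality_violations(spans: list[dict]) -> list[str]:
    """Return the IDs of spans that start before their parent span starts."""
    by_id = {s["span_id"]: s for s in spans}
    violations = []
    for s in spans:
        parent = by_id.get(s.get("parent_id"))
        # Root spans (no parent in the trace) cannot violate the invariant.
        if parent is not None and parent["start_us"] > s["start_us"]:
            violations.append(s["span_id"])
    return violations


trace = [
    {"span_id": "a", "parent_id": None, "start_us": 1_000},
    {"span_id": "b", "parent_id": "a", "start_us": 990},    # starts before parent
    {"span_id": "c", "parent_id": "a", "start_us": 1_010},  # fine
]
print(causality_violations(trace))
```

A check like this only flags suspicious spans; it does not say which host's clock is wrong.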
It's my opinion that this should be handled in the frontend. Tempo would ideally always return exactly what is sent to it and various visualizations could make whatever choices they want when rendering the data. What do you think about creating a new issue in
Sorry, missed the notification about your comment. A UI-side option makes sense.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
Is your feature request related to a problem? Please describe.
Clock skew across machines can cause spans to appear at the "wrong" time in relation to the actual execution.
For example, when a `client` span calls out to a `server` span that handles the request, you would expect the `server` span to fall wholly within the calling `client` span.
Describe the solution you'd like
One way to address this, used by Jaeger UI, is to adjust the `server` span to fit within the calling `client` span, which preserves causality when viewing the trace. This is optional and can be turned off to see the trace with the exact timestamps reported (this would match how Tempo currently displays things). The choice of how to adjust `server` spans is somewhat arbitrary, but centering the span within the calling `client` span, as Jaeger does, is a reasonable way to visually maintain the causality between spans.
I believe that Jaeger UI also displays a visual indication that this skew adjustment was applied.
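As a rough illustration of the centering approach described above, here is a minimal Python sketch. It is not Jaeger's actual code: the `Span` class, its field names, and `centering_delta` are hypothetical names chosen for this example, and leaving a `server` span untouched when it is longer than its `client` span is an assumption made to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start_us: int     # reported start time, microseconds
    duration_us: int  # reported duration, microseconds

    @property
    def end_us(self) -> int:
        return self.start_us + self.duration_us


def centering_delta(client: Span, server: Span) -> int:
    """Return the offset to add to the server span's start time.

    Returns 0 when the server span already fits inside the client span,
    or when it is longer than the client span and cannot fit (assumption).
    """
    fits = client.start_us <= server.start_us and server.end_us <= client.end_us
    if fits or server.duration_us > client.duration_us:
        return 0
    # Place the server span in the middle of the client span.
    centered_start = client.start_us + (client.duration_us - server.duration_us) // 2
    return centered_start - server.start_us


client = Span("client", start_us=1_000, duration_us=100)
server = Span("server", start_us=1_120, duration_us=40)  # reported after the client ended
print(centering_delta(client, server))
```

A nonzero return value could also drive a visual indication in the UI, since it records that an adjustment was applied and by how much.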
Describe alternatives you've considered
Ideally having times that are synchronized across participating machines would result in "good enough" timestamps to see what's happening, but this isn't always possible based on hosting: Cloud Provider limitations, multiple regions, multiple providers, Windows system time resolution, etc.
Additional context
Jaeger Clock Skew Adjustment
In the example below, the green span is a `client` span making a call to the orange `server` span. Because of clock skew between these machines (they run Windows, with a default system time resolution of ~15 ms), the reported times make it look like the server side happened after the calling span returned a result. The red line under the `client` span shows the `server` span (and its children) being moved over to align with the middle of the `client` span.
Note that while this example has the `server` span recorded with a time period later than the calling span, it is also possible for a `server` span to have times that occur before the times of the calling span.
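To make both cases concrete, the sketch below moves a `server` span and its children by a single offset, which is negative when the server was reported too late and positive when it was reported too early. The `Span` class, the `skew_adjust_us` field, and `shift_subtree` are hypothetical names for illustration only, not Tempo or Jaeger APIs, and the offset is assumed to have been computed already (for example by centering within the `client` span).

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    span_id: str
    start_us: int     # reported start time, microseconds
    duration_us: int  # reported duration, microseconds
    children: list["Span"] = field(default_factory=list)
    skew_adjust_us: int = 0  # hypothetical: total offset applied for display


def shift_subtree(span: Span, delta_us: int) -> None:
    """Shift a span and all of its descendants by the same offset."""
    span.start_us += delta_us
    span.skew_adjust_us += delta_us  # remember the shift so a UI can flag it
    for child in span.children:
        shift_subtree(child, delta_us)


# Server reported later than the client span: shift left (negative offset).
late = Span("server", start_us=1_120, duration_us=40,
            children=[Span("db", start_us=1_125, duration_us=10)])
shift_subtree(late, -90)

# Server reported earlier than the client span: shift right (positive offset).
early = Span("server", start_us=900, duration_us=40)
shift_subtree(early, 130)

print(late.start_us, late.children[0].start_us, early.start_us)
```

Shifting the whole subtree keeps the relative timing of the `server` span and its children intact; only their position relative to the `client` span changes.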