Side effects of imprecise clock #192

mabn · 2017-05-31T23:45:36Z

I found an interesting problem that took some time to track down.

Basically, spans in the Jaeger UI were "centered" in a weird way. Initially I thought it was a UI issue, but after comparing the JSON returned by jaeger-query with what is stored in Cassandra, it turned out that jaeger-query completely messes up span start times. And by completely I mean that a span taking 1ms was shifted by 3ms. Here's what it looks like:

In this example "child 2" should happen as the last span inside the "parent" - should be stuck to the right edge of the parent. There's no delay because all those spans are reported by the same process.

After digging through the jaeger code I found ClockSkew and there are two problems here:

ClockSkew adjusts span start times if they overlap with their parent. It does so only if parent and child are reported by different hosts, but the host is taken from the "ip" tag of the process. If the tag is missing, the default is to assume that these are different hosts.
jaeger-client-java by default uses SystemClock which has millisecond precision, but timestamps reported to jaeger are in microseconds. To convert the time to micros it multiplies it by 1000:
```
public long currentTimeMicros() {  return System.currentTimeMillis() * 1000; }
```
One can think of it as rounding down exact start time to milliseconds.

Span duration on the other hand is calculated with more precise System.nanoTime.
As a result of this rounding it's possible that:
```
round(parent.start) + parent.duration < round(child.start) + child.duration
```
for children which end at the same time as the parent it happens very often.

So - child span end time exceeds parent's finish time by between 0 and 1000 micros which in combination with missing process "ip" tag triggers clock skew adjuster. And for very long parents and very short children the adjustment is drastic (around 50% of parent duration).
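To make the rounding effect concrete, here is a minimal sketch with made-up timestamps (the class and method names are hypothetical and not part of jaeger-client-java). It assumes the start time is truncated to whole milliseconds while the duration keeps microsecond precision, as described above, and shows a child that really ends together with its parent being reported as ending later.

```java
// Illustrative only: the timestamps are invented, not taken from a real trace.
final class TruncationDemo {

    /** Mimics the effect of currentTimeMillis() * 1000: sub-millisecond digits are lost. */
    static long truncatedStartMicros(long exactStartMicros) {
        return (exactStartMicros / 1000) * 1000;
    }

    /** Reported end time = truncated start + precisely measured duration. */
    static long reportedEndMicros(long exactStartMicros, long durationMicros) {
        return truncatedStartMicros(exactStartMicros) + durationMicros;
    }

    public static void main(String[] args) {
        // Parent really runs from 10_000_900 to 10_005_100 (micros).
        long parentEnd = reportedEndMicros(10_000_900L, 4_200L);
        // Child really runs from 10_005_000 to 10_005_100, ending with the parent.
        long childEnd = reportedEndMicros(10_005_000L, 100L);

        System.out.println("parent reported end: " + parentEnd);
        System.out.println("child reported end:  " + childEnd);
        System.out.println("child overshoots parent by " + (childEnd - parentEnd) + " micros");
    }
}
```

With these numbers the parent loses 900 micros to truncation and the child loses none, so the child appears to finish 900 micros after the parent even though both really end at the same instant.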

There are three options:

change the default in jaeger-query
better clock implementation in jaeger-client-java
add "ip" process tag by default

I guess that the default makes sense, so a better clock implementation would be a good improvement.


yurishkuro · 2017-06-01T00:00:03Z

Nice catch.

Are you using java client that reports jaeger-thrift model (recently released)? It should not have the problem with "ip" tag.

The behavior of clock skew adjustment aside, the real issue seems to be the difference in precision between startTime and duration. Unfortunately, I don't know of a good way to get an accurate startTime. The System.nanoTime() we're using for duration is not an offset from the epoch, but from some arbitrary point in time, so it is not suitable for startTime.

mabn · 2017-06-01T01:02:09Z

Yes, I use the thrift Sender - com.uber.jaeger.senders.UdpSender. But it doesn't add the "ip" tag. It probably should be added inside the Tracer constructor - the ip is obtained there, but not added to the tags.

As for the clock - I'm not sure. I'd consider keeping offset between currentTimeMillis and nanoTime and use nanoTime + offset, but the offset is not fixed - ntpd adjusts the clock, there might be leap seconds. So it should be updated from time to time... It might be tricky to get it right.

yurishkuro · 2017-06-01T01:21:12Z

> Yes, I use the thrift Sender

I mean v0.19.0 of the client - it sends jaeger.thrift instead of zipkin.thrift, where the tracer tags are sent once, but applied to each span on the backend.

mabn · 2017-06-01T02:23:25Z

Yes, this one.

yurishkuro · 2017-06-02T23:23:34Z

@mabn is it actually necessary to adjust the offset? The main requirements for the trace are:

startTime + duration == endTime
For spans started in the same process at true time t1 and t2 we have startTime1 - startTime2 == t1 - t2

It does not matter if both startTime and endTime are "off" by some delta, clocks across processes are skewed anyway, and the ClockSkewAdjuster is expected to fix those. As long as nanoTime() is monotonic, it doesn't even seem to matter if it's consistently faster or slower than the wall clock time, since nanoTime() is defined as the correct way for measuring elapsed time.

So would the following work?

// on tracer initialization, capture
initWallclock = currentTimeMillis()
initElapsed = nanoTime()

// when starting the span
startTimeNanos = nanoTime()
startTimeMicros = initWallclock * 1000 + (startTimeNanos - initElapsed) / 1000

// when finishing the span
durationMicros = (nanoTime() - startTimeNanos) / 1000
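For reference, the pseudocode above could be written in Java roughly as follows. This is only a sketch: `AnchoredClock` and its methods are hypothetical names, not the existing client API, and the time sources are injectable purely so that the arithmetic can be checked with fixed values.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: anchors nanoTime() to the wall clock once, at construction. */
final class AnchoredClock {
    private final long initWallclockMicros;
    private final long initElapsedNanos;
    private final LongSupplier nanoTicks;

    AnchoredClock() {
        this(System::currentTimeMillis, System::nanoTime);
    }

    AnchoredClock(LongSupplier wallclockMillis, LongSupplier nanoTicks) {
        this.nanoTicks = nanoTicks;
        // Captured once; every later timestamp is derived from these two values.
        this.initWallclockMicros = wallclockMillis.getAsLong() * 1000;
        this.initElapsedNanos = nanoTicks.getAsLong();
    }

    /** Raw monotonic tick, to be stored in the span when it starts. */
    long currentNanoTicks() {
        return nanoTicks.getAsLong();
    }

    /** Converts a stored tick into microseconds since the epoch. */
    long toEpochMicros(long nanoTick) {
        return initWallclockMicros + (nanoTick - initElapsedNanos) / 1000;
    }

    /** Duration between two ticks, in microseconds. */
    static long durationMicros(long startNanoTick, long endNanoTick) {
        return (endNanoTick - startNanoTick) / 1000;
    }
}
```

Because start times and durations now come from the same monotonic source, two spans in one process keep their true relative order. Each integer division truncates independently, so startTime + duration can still differ from the converted end tick by up to a microsecond.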

cc @sul4bh @pavolloffay

mabn · 2017-06-03T12:39:49Z

That approach comes to mind, but I'm not sure if it works. My concern is that the clock drifts and is adjusted from time to time, e.g. by ntpd. The rate of drift changes, e.g. it depends on the temperature, but in a cloud environment there are additional factors caused by virtualization. There are also edge cases like leap seconds.

The approach you mention basically records the offset between nanoTime() and currentTimeMillis() at Tracer initialization and uses it later, so it would be fine if the offset didn't change.

On linux nanoTime() uses clock_gettime with CLOCK_MONOTONIC. Additionally CLOCK_MONOTONIC is adjusted by NTP so it seems that the offset between nanoTime() and currentTimeMillis() should be more or less constant on linux with NTP configured to use adjtime (not sure about ntp_adjtime discipline though)

It would be good to test it, but for example POSIX allows CLOCK_MONOTONIC frequency to vary by up to 500ppm (8ms/day). If the offset is not constant it might lead to significant inaccuracies in start_time for long running processes - which is the normal case for services.

So it gets tricky and OS-specific.

I'd rather see one of following solutions:

Calculate the offset inside Tracer.extract() and store it inside Span or SpanContext. Pass it to child spans. Use nanoTime + offset to calculate start_time. This will ensure that inside a single trace within a single process the start times are accurate with nanosecond precision. The drawback is that the offset has to be passed around and is no longer encapsulated inside Clock.
Create a Clock implementation which updates the offset periodically, e.g. once every 30 seconds. If a process usually spends significantly less than 30 seconds on a single trace, it will be rare for a trace to be affected by an offset change. But it will happen, and the jumps will be between -1ms and +1ms (assuming that currentTimeMillis has 1ms accuracy, which is not the case on some systems, e.g. Windows XP). The jumps aren't a big issue because JVM pauses happen anyway. If it ensured that time cannot go backwards it would be a pretty good solution. Big plus: it fits the current Clock interface.
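A rough sketch of the second option, under stated assumptions: `ResyncingClock` is a hypothetical name, the 30-second default is just the figure mentioned above, and the offset is refreshed lazily on the next read rather than by a background thread. It is not the implementation from any Jaeger client.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical clock: nanoTime() plus an offset that is re-measured periodically. */
final class ResyncingClock {
    private final LongSupplier wallclockMillis;
    private final LongSupplier nanoTicks;
    private final long resyncIntervalNanos;

    private long offsetMicros;                        // wall clock minus nanoTime, in micros
    private long lastSyncNanos;
    private long lastReturnedMicros = Long.MIN_VALUE; // guard against going backwards

    ResyncingClock() {
        this(System::currentTimeMillis, System::nanoTime, TimeUnit.SECONDS.toNanos(30));
    }

    ResyncingClock(LongSupplier wallclockMillis, LongSupplier nanoTicks, long resyncIntervalNanos) {
        this.wallclockMillis = wallclockMillis;
        this.nanoTicks = nanoTicks;
        this.resyncIntervalNanos = resyncIntervalNanos;
        resync(nanoTicks.getAsLong());
    }

    private void resync(long nowNanos) {
        offsetMicros = wallclockMillis.getAsLong() * 1000 - nowNanos / 1000;
        lastSyncNanos = nowNanos;
    }

    synchronized long currentTimeMicros() {
        long nowNanos = nanoTicks.getAsLong();
        if (nowNanos - lastSyncNanos >= resyncIntervalNanos) {
            resync(nowNanos); // may shift the offset by roughly +/- 1 ms
        }
        long candidate = nowNanos / 1000 + offsetMicros;
        // Never report a time earlier than one already handed out.
        lastReturnedMicros = Math.max(lastReturnedMicros, candidate);
        return lastReturnedMicros;
    }
}
```

The `Math.max` clamp trades a briefly "frozen" clock after a backwards resync for the guarantee that timestamps never decrease; whether that trade-off is acceptable for short spans is exactly the open question in this thread.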

I also remember some discussion about implementing better time source in some opentracing-related project, but I wasn't able to find it.

yurishkuro · 2017-06-03T21:10:05Z

> Additionally CLOCK_MONOTONIC is adjusted by NTP

Yes, but it still remains monotonic; the adjustments are probably done by smearing the delta over some larger time interval, making the change unnoticeable to the application.

> It would be good to test it, but for example POSIX allows CLOCK_MONOTONIC frequency to vary by up to 500ppm (8ms/day). If the offset is not constant it might lead to significant inaccuracies in start_time for long running processes - which is the normal case for services.

Not sure how big of a problem this is. In some cases the system's HW timer can be so out of whack that measuring precise elapsed time for tracing is probably not the biggest problem. And for long running processes, 8ms error per day is nothing, and again we'd expect the system to have a proper NTP configuration to compensate for the frequency drift: "This clock's frequency might be adjusted in a PLL control loop once an external reference (NTP, GPS, etc.) has been available long enough to measure the ±500 ppm frequency error and instability of typical motherboard oscillators."

So my preference would be to start with this simplified approach. Of the two other approaches you listed, I think the first one is doable, but certainly more complicated, while the second may still result in unaccounted jumps and bad timing for short spans.

yurishkuro · 2017-06-03T21:10:42Z

btw, thanks for the references.

pavolloffay · 2017-06-12T08:43:52Z

Btw, Java 9 should provide nanosecond accuracy:

yurishkuro · 2017-06-13T19:19:19Z

Lightstep tracer is doing something funky with time: https://github.com/lightstep/lightstep-tracer-java/blob/master/common/src/main/java/com/lightstep/tracer/shared/ClockState.java#L86

mabn · 2017-07-15T12:38:33Z

I did this:
https://gist.github.com/mabn/9587658d78d730917559cd61a274ea4f
Seems to work so far. I haven't seen it logging warnings yet.

yurishkuro · 2017-07-15T18:51:17Z

@mabn we're thinking of going with your option 1 where the millis wallclock timestamp is captured in the SpanContext and all other timestamps are calculated as offsets using nanos() (e.g. jaegertracing/jaeger-client-node#122 (comment))

Your gist is a simpler version, which might be good enough. I wouldn't expect it to log the differences unless your machine's HW timer is seriously faster or slower than a true clock AND you have ntpd adjustments happening.

mabn · 2017-08-01T23:43:14Z

I'm pretty sure that a leap second will make my clock 1-second off until application restart. Same with any manual (non-ntpd) adjustments.

jpkrohling · 2018-04-20T09:29:19Z

Doesn't the code in the Gist require Java 8? Or is it about a different clock?

https://docs.oracle.com/javase/8/docs/api/java/time/Clock.html

I'd be curious to see if this behavior would also be present for recent JVMs (8+), as date/time manipulation is really better and more precise with the new APIs.

olivercf · 2018-05-03T18:15:29Z

Is this being worked on? I seem to be experiencing this rather severely:

(top blue span is the parent, all other blue spans are immediate children that happen consecutively in a loop)

mabn · 2018-05-07T09:27:03Z

@jpkrohling It's not JDK8 Clock, it's io.jaegertracing.utils.Clock one.

Yes, this behaviour is present in JDK8 and it will also be present in newer ones because io.jaegertracing.utils.SystemClock uses System.currentTimeMillis.

On JDK9+ it might be a good idea to simply use the system clock - it now has sufficient precision due to https://bugs.openjdk.java.net/browse/JDK-8068730 - Increase the precision of the implementation of java.time.Clock.systemUTC().
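A minimal sketch of what that could look like, assuming JDK 9 or newer. `InstantMicros` is a hypothetical name, and the resolution actually delivered by `Clock.systemUTC()` still depends on the operating system.

```java
import java.time.Clock;
import java.time.Instant;

/** Hypothetical helper: epoch microseconds derived from java.time. */
final class InstantMicros {

    /** Converts an Instant to microseconds since the epoch, dropping sub-microsecond digits. */
    static long toEpochMicros(Instant instant) {
        return Math.addExact(
                Math.multiplyExact(instant.getEpochSecond(), 1_000_000L),
                instant.getNano() / 1_000);
    }

    /** Current wall-clock time in microseconds since the epoch. */
    static long currentTimeMicros() {
        return toEpochMicros(Clock.systemUTC().instant());
    }
}
```

On JDK 8 the same code compiles, but the instant returned by the system clock only has millisecond resolution there, so it would bring no improvement over `System.currentTimeMillis() * 1000`.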

olivercf · 2018-05-08T10:04:00Z

Is there a workaround involving setting certain tags? I'm manually setting the process "ip" tag so everything has the same IP (it is all running in docker on the same machine), but still seem to be experiencing the problem.

yurishkuro · 2018-05-08T12:21:33Z

@olivercf this appears to be a separate, off-topic question. How is it related to the clocks? If it's not, please move to another issue.

olivercf · 2018-05-08T12:49:39Z

@yurishkuro The start times appear to be snapping/rounding to incorrect times as described in the initial message in this issue:

> So - child span end time exceeds parent's finish time by between 0 and 1000 micros which in combination with missing process "ip" tag triggers clock skew adjuster. And for very long parents and very short children the adjustment is drastic (around 50% of parent duration).

As you can see from my first post, some of the children are 0ms, and the parent is relatively long. All of the children should be contained within the parent in my example, as the parent is a for loop and the children are each iteration, however it seems all but the first child has been snapped to a particular start time, resulting in a completely incorrect graph. You can see some children's finish times are after the parent's. As they happen consecutively, you'd expect each child's start time to be the finish time of the last.

The first post mentions that if the "ip" tag is the same, the clock skew adjuster shouldn't kick in, but it looks like it does in my case. It seems that is relevant information for this issue, and I'm also wondering if there is a workaround for the meantime until it is fixed.

mabn · 2018-05-08T13:40:41Z

@olivercf what you are experiencing is because your spans are very short (sub 1ms) but Java system clock has 1ms precision so the start time is truncated and they all seem to start at the same time. The workaround I use is this clock implementation:
https://gist.github.com/mabn/9587658d78d730917559cd61a274ea4f
use it when creating a Tracer instance and it should solve your issue.

olivercf · 2018-05-08T15:15:15Z

@mabn Thanks, that worked.

sul4bh mentioned this issue Jun 2, 2017

Adds a nanosecond timer jaegertracing/jaeger-client-node#122

Closed

djeeg mentioned this issue Aug 10, 2017

[trace view] childOf time ordering incorrect gets center aligned jaegertracing/jaeger-ui#50

Closed

felippe-mendonca mentioned this issue Aug 4, 2018

Make clock skew adjustment transparent jaegertracing/jaeger#961

Closed

3 tasks

wuyupengwoaini mentioned this issue Feb 21, 2019

Client span is finished before server span opentracing-contrib/java-spring-web#78

Open

IAD mentioned this issue Feb 17, 2020

JUS-1278 jaeger tracing AccelByte/iam-go-sdk#15

Merged

sheinbergon mentioned this issue Jun 21, 2020

Real microseconds timestamp accuracy for JDK 9 and above #712

Merged

yurishkuro closed this as completed Jan 10, 2022
