Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace.py -t has negative time #931

Closed
brendangregg opened this issue Feb 2, 2017 · 5 comments · Fixed by #969
Closed

trace.py -t has negative time #931

brendangregg opened this issue Feb 2, 2017 · 5 comments · Fixed by #969

Comments

@brendangregg
Copy link
Member

# ./trace.py -t do_sys_open
TIME     PID    TID    COMM         FUNC             
-20.5042 1762   1762   snmpd        do_sys_open      
-20.5042 1762   1762   snmpd        do_sys_open      
-20.5042 1762   1762   snmpd        do_sys_open      
-20.5042 1762   1762   snmpd        do_sys_open      
-20.5041 1762   1762   snmpd        do_sys_open      
-20.5041 1762   1762   snmpd        do_sys_open      
-20.5041 1762   1762   snmpd        do_sys_open      
-20.5041 1762   1762   snmpd        do_sys_open    

Something up with the initial timestamp? Seems to always start at negative 20.5 seconds on this machine (Linux 4.9-rc5).

@goldshtn
Copy link
Collaborator

goldshtn commented Feb 3, 2017

Hmm. I'm seeing an even bigger offset on my box, +1118 seconds! So something is definitely wrong.

Looking at the code, I'm using clock_gettime from librt to get the monotonic time, and I was under the impression this is the same time source used by bpf_ktime_get_ns. If it isn't, this is pretty bad and needs to be fixed.

Perhaps @4ast would have an idea?

@4ast
Copy link
Member

4ast commented Feb 8, 2017

this doesn't look good indeed and i don't see anything in kernel/time/timekeeping.c that could have caused this behavior. Since you see it on 4.9-rc5, it's certainly not the commit 58bfea953255 ("timekeeping: Fix __ktime_get_fast_ns() regression"). Can you please reproduce it on 4.10 and report it to lkml.
bpf_ktime_get_ns is using ktime_get_mono_fast_ns which should be the same as clock_getime(mono)
Please cc mathieu.desnoyers@efficios.com, since he's probably hitting this issue as well.

@brendangregg
Copy link
Member Author

Switching trace.py to use CLOCK_MONOTONIC instead of CLOCK_MONOTONIC_RAW fixes it.

AFAIK, CLOCK_MONOTONIC is NTP adjusted. Is bpf_ktime_get_ns() NTP adjusted too?

I also wrote this, clocks.c, to confirm the delta between the clocks:

#include <time.h>
#include <stdio.h>

int main()
{
    struct timespec t1, t2;
    double d;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("CLOCK_MONOTONIC     %ld s %ld ns\n", t1.tv_sec, t1.tv_nsec);

    clock_gettime(CLOCK_MONOTONIC_RAW, &t2);
    printf("CLOCK_MONOTONIC_RAW %ld s %ld ns\n", t2.tv_sec, t2.tv_nsec);

    d = t2.tv_sec + (double)t2.tv_nsec / 1000000000;
    d -= t1.tv_sec + (double)t1.tv_nsec / 1000000000;
    printf("delta (RAW - non-RAW): %.6f", d);

    return 0;
}

system 1:

# ./clocks 
CLOCK_MONOTONIC     4745632 s 546869809 ns
CLOCK_MONOTONIC_RAW 4745660 s 257679602 ns
delta (RAW - non-RAW): 27.710810

# ./trace.py -t do_sys_open
TIME     PID    TID    COMM         FUNC             
-26.6151 1968   1968   redis-server do_sys_open      
-26.6145 1651   1651   redis-server do_sys_open      
-26.5149 1968   1968   redis-server do_sys_open      
-26.5143 1651   1651   redis-server do_sys_open      
-26.4147 1968   1968   redis-server do_sys_open      
-26.4141 1651   1651   redis-server do_sys_open      
[...]

system 2:

# ./clocks 
CLOCK_MONOTONIC     2411280 s 250871095 ns
CLOCK_MONOTONIC_RAW 2411166 s 518161161 ns
delta (RAW - non-RAW): -113.732710

# ./trace.py -t do_sys_open | head 
TIME     PID    TID    COMM         FUNC             
113.8781 9945   9945   redis-server do_sys_open      
113.8790 1494   1494   redis-server do_sys_open      
113.8787 9945   9945   redis-server do_sys_open      
113.8795 1494   1494   redis-server do_sys_open      
114.8793 9945   9945   redis-server do_sys_open      
114.8801 1494   1494   redis-server do_sys_open     
[...]

@goldshtn
Copy link
Collaborator

@brendangregg Are you planning to fix this in a PR, or should I do it?

@brendangregg
Copy link
Member Author

I was digging into why, but yes, if you can send in the PR go ahead since it should be fixed (CLOCK_MONOTONIC_RAW -> CLOCK_MONOTONIC).

So far I had:

  • bpf_ktime_get_ns -> ktime_get_mono_fast_ns() -> __ktime_get_fast_ns(tk_fast_mono) tkf->base tkr->base

  • CLOCK_MONOTONIC -> clock_monotonic() -> posix_ktime_get_ts() -> ktime_get_ts() -> ktime_get_ts64() -> tk_core.timekeeper tk->xtime_sec tx->tkr_mono

  • CLOCK_MONOTONIC_RAW -> posix_get_monotonic_raw() -> getrawmonotonica() -> getrawmonotonic64() -> tk_core.timekeeper, tk->raw_time + tk->tkr_raw

goldshtn added a commit to goldshtn/bcc that referenced this issue Feb 15, 2017
This allows all the tools (currently: trace and memleak) to use the
`monotonic_time` facility, with the fix to use `CLOCK_MONOTONIC`
instead of `CLOCK_MONOTONIC_RAW`. Resolves iovisor#931.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants