forked from brendangregg/perf-tools
-
Notifications
You must be signed in to change notification settings - Fork 1
/
funcslower_example.txt
109 lines (84 loc) · 4.2 KB
/
funcslower_example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
Demonstrations of funcslower, the Linux ftrace version.
Show me ext3_readpages() calls slower than 1000 microseconds (1 ms):
# ./funcslower ext3_readpages 1000
Tracing "ext3_readpages" slower than 1000 us... Ctrl-C to end.
0) ! 8147.120 us | } /* ext3_readpages */
0) ! 8135.067 us | } /* ext3_readpages */
0) ! 12202.93 us | } /* ext3_readpages */
0) ! 12201.84 us | } /* ext3_readpages */
0) ! 8142.667 us | } /* ext3_readpages */
0) ! 12194.14 us | } /* ext3_readpages */
^C
Ending tracing...
Neat. So this confirms that there are ext3_readpages() calls that are taking
over 8000 us (8 ms).
funcslower uses the ftrace function graph profiler to dynamically instrument
the given kernel function, time it in-kernel, and only emit events slower
than the given latency threshold in-kernel. Since this all operates in
kernel context, the overheads are relatively low (compared to post-processing
in user space).
Now include the process name and PID (-P) of the process who is on-CPU, and the
absolute timestamp (-t) of the event:
# ./funcslower -Pt ext3_readpages 1000
Tracing "ext3_readpages" slower than 1000 us... Ctrl-C to end.
2678112.003180 | 0) cksum-26695 | ! 8145.268 us | } /* ext3_readpages */
2678113.538763 | 0) cksum-26695 | ! 8139.086 us | } /* ext3_readpages */
2678113.704901 | 0) cksum-26695 | ! 8147.549 us | } /* ext3_readpages */
2678113.721102 | 0) cksum-26695 | ! 8142.530 us | } /* ext3_readpages */
2678113.810269 | 0) cksum-26695 | ! 12234.70 us | } /* ext3_readpages */
2678113.996625 | 0) cksum-26695 | ! 8146.129 us | } /* ext3_readpages */
2678114.012832 | 0) cksum-26695 | ! 8148.153 us | } /* ext3_readpages */
^C
Ending tracing...
Great! Now I can see the process name, which in this case is the responsible
process. The timestamps also let me determine the rate of these slow events.
Now measure time differently: excluding time spent sleeping, so that we only
see on-CPU time:
# ./funcslower -Pct ext3_readpages 1000
Tracing "ext3_readpages" slower than 1000 us... Ctrl-C to end.
^C
Ending tracing...
I believe the workload hasn't changed, so these ext3_readpages() calls are
still happening, however, their CPU time doesn't exceed 1 ms. Compared to the
earlier output, this tells me that the latency in this function is due to time
spent blocked off-CPU, and not on-CPU. This makes sense: this function is
ultimately being blocked on disk I/O.
Were the function duration times to be similar with and without -C, that would
tell us that the high latency is due to time spent on-CPU executing code.
This traces the sys_nanosleep() kernel function, and shows calls taking over
100 us:
# ./funcslower sys_nanosleep 100
Tracing "sys_nanosleep" slower than 100 us... Ctrl-C to end.
0) ! 2000147 us | } /* sys_nanosleep */
------------------------------------------
0) registe-27414 => vmstat-27419
------------------------------------------
0) ! 1000143 us | } /* sys_nanosleep */
0) ! 1000154 us | } /* sys_nanosleep */
------------------------------------------
0) vmstat-27419 => registe-27414
------------------------------------------
0) ! 2000183 us | } /* sys_nanosleep */
------------------------------------------
0) registe-27414 => vmstat-27419
------------------------------------------
0) ! 1000141 us | } /* sys_nanosleep */
^C
Ending tracing...
This is an example where I did not use -P, but ftrace has included process
information anyway. Look for the lines containing "=>", which indicate a process
switch on the given CPU.
Use -h to print the USAGE message:
# ./funcslower -h
USAGE: funcslower [-aChHPt] [-p PID] [-d secs] funcstring latency_us
-a # all info (same as -HPt)
-C # measure on-CPU time only
-d seconds # trace duration, and use buffers
-h # this usage message
-H # include column headers
-p PID # trace when this pid is on-CPU
-P # show process names & PIDs
-t # show timestamps
eg,
funcslower vfs_read 10000 # trace vfs_read() slower than 10 ms
See the man page and example file for more info.