-
Notifications
You must be signed in to change notification settings - Fork 3.9k
/
funcinterval_example.txt
263 lines (232 loc) · 15.3 KB
/
funcinterval_example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
Demonstrations of funcinterval, the Linux eBPF/bcc version.
eBPF/bcc is very suitable for platform performance tuning.
By funclatency, we can profile specific functions to know how latency
this function costs. However, sometimes performance drop is not about the
latency of function but the interval between function calls.
funcinterval is born for this purpose.
Another story, hardware performance tuning on the platform we will use
protocol analyzer to analyze performance, but most protocol analyzers lack
the distribution feature. Using a protocol analyzer you need a lot of time
to check every detail latency. By funcinterval, we can save a lot of time
by distribution feature.
For example:
# ./funcinterval xhci_ring_ep_doorbell -d 2 -u
Tracing 1 functions for "xhci_ring_ep_doorbell"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 134 | |
64 -> 127 : 2862 |******************** |
128 -> 255 : 5552 |****************************************|
256 -> 511 : 216 |* |
512 -> 1023 : 2 | |
Detaching...
This example output shows that the interval latency of xhci_ring_ep_doorbell
took between 64 and 255 microseconds. USB MAC will start its job after USB
doorbell register ringing, above information that can help hardware engineer to
analyze, the performance drop is because software rings the doorbell too
late or just slowly hardware DMA.
# ./funcinterval blk_start_request -i 2 -u
Tracing 1 functions for "blk_start_request"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 5 |* |
16 -> 31 : 0 | |
32 -> 63 : 1 | |
64 -> 127 : 2 | |
128 -> 255 : 1 | |
256 -> 511 : 0 | |
512 -> 1023 : 1 | |
1024 -> 2047 : 1 | |
2048 -> 4095 : 117 |****************************************|
4096 -> 8191 : 13 |**** |
8192 -> 16383 : 1 | |
If using biolatency tool that has no difference between two platforms.
Maybe the problem is related to the interval time instead of block
device I/O latency.
# ./funcinterval ion_ioctl -i 2 -m
Tracing 1 functions for "ion_ioctl"... Hit Ctrl-C to end.
msecs : count distribution
0 -> 1 : 215 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 4 | |
8 -> 15 : 5 | |
16 -> 31 : 29 |***** |
You can also check the ion_ioctl. By the above output, we know the activity
frequency of ion_ioctl() is high mostly(less than 1 ms), but has 29 times low
frequency.
# ./funcinterval t:block:block_bio_queue -d 30 -u
Tracing tracepoint for "block:block_bio_queue"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 2 | |
16 -> 31 : 262 | |
32 -> 63 : 9075 |******************* |
64 -> 127 : 18668 |****************************************|
128 -> 255 : 1492 |*** |
256 -> 511 : 2616 |***** |
512 -> 1023 : 7226 |*************** |
1024 -> 2047 : 8982 |******************* |
2048 -> 4095 : 2394 |***** |
4096 -> 8191 : 163 | |
8192 -> 16383 : 42 | |
16384 -> 32767 : 2 | |
32768 -> 65535 : 0 | |
65536 -> 131071 : 0 | |
131072 -> 262143 : 0 | |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 1 | |
Detaching...
# ./funcinterval t:block:block_rq_issue -d 30 -u
Tracing tracepoint for "block:block_rq_issue"... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 5 | |
32 -> 63 : 18 | |
64 -> 127 : 32 | |
128 -> 255 : 95 | |
256 -> 511 : 2194 |****** |
512 -> 1023 : 13830 |****************************************|
1024 -> 2047 : 9001 |************************** |
2048 -> 4095 : 1569 |**** |
4096 -> 8191 : 96 | |
8192 -> 16383 : 17 | |
Detaching...
funcinterval also supports tracepoint filter. The above two cases are under EMMC
throughput testing, by those results you know which layer has a slower interval
time. In our case, mmc-cmdqd is slower than block layer.
# ./funcinterval -p `pidof -s top` c:malloc -i 3
Tracing uprobe function for "malloc"... Hit Ctrl-C to end.
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 7 |************************* |
8192 -> 16383 : 11 |****************************************|
16384 -> 32767 : 4 |************** |
32768 -> 65535 : 1 |*** |
65536 -> 131071 : 1 |*** |
131072 -> 262143 : 1 |*** |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 1 |*** |
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 8 |******************************** |
8192 -> 16383 : 10 |****************************************|
16384 -> 32767 : 4 |**************** |
32768 -> 65535 : 1 |**** |
65536 -> 131071 : 1 |**** |
131072 -> 262143 : 1 |**** |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 1 |**** |
Time the interval of libc's malloc for top utility every 3 seconds.
# ./funcinterval /usr/local/bin/python:main
Tracing uprobe function for "main"... Hit Ctrl-C to end.
^C
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 0 | |
16384 -> 32767 : 0 | |
32768 -> 65535 : 0 | |
65536 -> 131071 : 0 | |
131072 -> 262143 : 0 | |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 0 | |
8388608 -> 16777215 : 0 | |
16777216 -> 33554431 : 0 | |
33554432 -> 67108863 : 0 | |
67108864 -> 134217727 : 0 | |
134217728 -> 268435455 : 0 | |
268435456 -> 536870911 : 1 |****************************************|
536870912 -> 1073741823 : 1 |****************************************|
1073741824 -> 2147483647 : 1 |****************************************|
2147483648 -> 4294967295 : 1 |****************************************|
Detaching...
Time the interal of python's main function.
USAGE message:
# ./funcinterval -h
usage: funcinterval [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T] [-u] [-m]
[-v]
pattern
Time interval and print latency as a histogram
positional arguments:
pattern Function name for tracing
optional arguments:
-h, --help show this help message and exit
-p PID, --pid PID trace this PID only
-i INTERVAL, --interval INTERVAL
summary interval, in seconds
-d DURATION, --duration DURATION
total duration of trace, in seconds
-T, --timestamp include timestamp on output
-u, --microseconds microsecond histogram
-m, --milliseconds millisecond histogram
-v, --verbose print the BPF program (for debugging purposes)
examples:
# time the interval of do_sys_open()
./funcinterval do_sys_open
# time the interval of xhci_ring_ep_doorbell(), in microseconds
./funcinterval -u xhci_ring_ep_doorbell
# time the interval of do_nanosleep(), in milliseconds
./funcinterval -m do_nanosleep
# output every 5 seconds, with timestamps
./funcinterval -mTi 5 vfs_read
# time process 181 only
./funcinterval -p 181 vfs_read
# time the interval of mm_vmscan_direct_reclaim_begin tracepoint
./funcinterval t:vmscan:mm_vmscan_direct_reclaim_begin
# time the interval of c:malloc used by top every 3 seconds
./funcinterval -p `pidof -s top` -i 3 c:malloc
# time /usr/local/bin/python main function
./funcinterval /usr/local/bin/python:main