forked from brendangregg/perf-tools
-
Notifications
You must be signed in to change notification settings - Fork 0
/
execsnoop_example.txt
153 lines (129 loc) · 6.7 KB
/
execsnoop_example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
Demonstrations of execsnoop, the Linux ftrace version.
Here's execsnoop showing what's really executed by "man ls":
# ./execsnoop
Tracing exec()s. Ctrl-C to end.
PID PPID ARGS
22898 22004 man ls
22905 22898 preconv -e UTF-8
22908 22898 pager -s
22907 22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
22906 22898 tbl
22911 22910 locale charmap
22912 22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
22913 22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
22914 22912 grotty
Many commands. This is particularly useful for understanding application
startup.
Another use for execsnoop is identifying short-lived processes. Eg, with the -t
option to see timestamps:
# ./execsnoop -t
Tracing exec()s. Ctrl-C to end.
TIMEs PID PPID ARGS
7419756.154031 8185 8181 mawk -W interactive -v o=1 -v opt_name=0 -v name= [...]
7419756.154131 8186 8184 cat -v trace_pipe
7419756.245264 8188 1698 ./run
7419756.245691 8189 1696 ./run
7419756.246212 8187 1689 ./run
7419756.278993 8190 1693 ./run
7419756.278996 8191 1692 ./run
7419756.288430 8192 1695 ./run
7419756.290115 8193 1691 ./run
7419756.292406 8194 1699 ./run
7419756.293986 8195 1690 ./run
7419756.294149 8196 1686 ./run
7419756.296527 8197 1687 ./run
7419756.296973 8198 1697 ./run
7419756.298356 8200 1685 ./run
7419756.298683 8199 1688 ./run
7419757.269883 8201 1696 ./run
[...]
So we're running many "run" commands every second. The PPID is included, so I
can debug this further (they are "supervise" processes).
Short-lived processes can consume CPU and not be visible from top(1), and can
be the source of hidden performance issues.
Here's another example: I noticed CPU usage was high in top(1), but couldn't
see the responsible process:
$ top
top - 00:04:32 up 78 days, 15:41, 3 users, load average: 0.85, 0.29, 0.14
Tasks: 123 total, 1 running, 121 sleeping, 0 stopped, 1 zombie
Cpu(s): 15.7%us, 34.9%sy, 0.0%ni, 49.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st
Mem: 7629464k total, 7537216k used, 92248k free, 1376492k buffers
Swap: 0k total, 0k used, 0k free, 5432356k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7225 bgregg-t 20 0 29480 6196 2128 S 3 0.1 0:02.64 ec2rotatelogs
1 root 20 0 24320 2256 1340 S 0 0.0 0:01.23 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0 0.0 1:19.61 ksoftirqd/0
4 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/0:0
5 root 20 0 0 0 0 S 0 0.0 0:00.01 kworker/u:0
6 root RT 0 0 0 0 S 0 0.0 0:16.00 migration/0
7 root RT 0 0 0 0 S 0 0.0 0:17.29 watchdog/0
8 root RT 0 0 0 0 S 0 0.0 0:15.85 migration/1
9 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/1:0
[...]
See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this
is a two CPU server, so that's equivalent to one full CPU), but this CPU usage
isn't visible from the process listing.
vmstat agreed, showing the same average CPU usage statistics:
# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 0 92816 1376476 5432188 0 0 0 3 2 1 0 1 99 0
1 0 0 92676 1376484 5432264 0 0 0 24 6573 6130 12 38 49 0
1 0 0 91964 1376484 5432272 0 0 0 0 6529 6097 16 35 49 0
1 0 0 92692 1376484 5432272 0 0 0 0 6192 5775 17 35 49 0
1 0 0 92692 1376484 5432272 0 0 0 0 6554 6121 14 36 50 0
1 0 0 91940 1376484 5432272 0 0 0 12 6546 6101 13 38 49 0
1 0 0 92560 1376484 5432272 0 0 0 0 6201 5769 15 35 49 0
1 0 0 92676 1376484 5432272 0 0 0 0 6524 6123 17 34 49 0
1 0 0 91932 1376484 5432272 0 0 0 0 6546 6107 10 40 49 0
1 0 0 92832 1376484 5432272 0 0 0 0 6057 5710 13 38 49 0
1 0 0 92248 1376484 5432272 0 0 84 28 6592 6183 16 36 48 1
1 0 0 91504 1376492 5432348 0 0 0 12 6540 6098 18 33 49 1
[...]
So this could be caused by short-lived processes, who vanish before they are
seen by top(1). Do I have my execsnoop handy? Yes:
# ~/perf-tools/bin/execsnoop
Tracing exec()s. Ctrl-C to end.
PID PPID ARGS
10239 10229 gawk -v o=0 -v opt_name=0 -v name= -v opt_duration=0 [...]
10240 10238 cat -v trace_pipe
10242 7225 sh [?]
10243 10242 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.201201.3122.txt
10245 7225 sh [?]
10246 10245 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.202201.3122.txt
10248 7225 sh [?]
10249 10248 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.203201.3122.txt
10251 7225 sh [?]
10252 10251 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.204201.3122.txt
10254 7225 sh [?]
10255 10254 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.205201.3122.txt
10257 7225 sh [?]
10258 10257 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.210201.3122.txt
10260 7225 sh [?]
10261 10260 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.211201.3122.txt
10263 7225 sh [?]
10264 10263 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.212201.3122.txt
10266 7225 sh [?]
10267 10266 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.213201.3122.txt
[...]
The output scrolled quickly, showing that many shell and lsof processes were
being launched. If you check the PID and PPID columns carefully, you can see that
these are ultimately all from PID 7225. We saw that earlier in the top output:
ec2rotatelogs, at 3% CPU. I now know the culprit.
I should have used "-t" to show the timestamps with this example.
Run -h to print the USAGE message:
# ./execsnoop -h
USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
-d seconds # trace duration, and use buffers
-a argc # max args to show (default 8)
-r # include re-execs
-t # include time (seconds)
-h # this usage message
name # process name to match (REs allowed)
eg,
execsnoop # watch exec()s live (unbuffered)
execsnoop -d 1 # trace 1 sec (buffered)
execsnoop grep # trace process names containing grep
execsnoop 'log$' # filenames ending in "log"
See the man page and example file for more info.