From Neil Tan <ntan@e2open.com>:
I am not very conversant with Perl scripting and would like a
feature added to the Orca main index page. I have Orca charts
running for 600-800 Solaris and Linux machines. One constant
problem is machines falling out of monitoring due to Jumpstarts,
ssh upgrades, etc., a common problem in a large environment with
several system administrators. A feature I think would be a good
addition is to change the color of the host's name on the main
index page to red if that Orca client's data is overdue.
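A minimal sketch of the overdue check, assuming a hypothetical
helper inside the index-generation code; the threshold, the file
layout, and the helper name are assumptions, not existing Orca
internals:

    # Hypothetical sketch: show a host in red on the index page when its
    # newest source file has not been updated within $overdue_secs.
    use strict;
    use warnings;

    sub host_index_label {
      my ($host, $latest_file, $overdue_secs) = @_;
      my $mtime = (stat($latest_file))[9] || 0;   # last update of newest file
      my $age   = time() - $mtime;
      return $age > $overdue_secs
        ? qq(<font color="red">$host</font>)      # overdue: show in red
        : $host;                                  # current: plain name
    }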
Have Orca's pages look much nicer. Check out some of the
competition, such as Cacti:
http://www.raxnet.net/products/cacti/
http://www.bigspring.k12.pa.us/cacti/graph_view.php?action=tree&tree_id=31&leaf_id=408&select_first=true
From Hans-Werner Jouy:
Is it possible to use different config files with an "include
xxx.conf" directive in the main configuration? This makes sense
when you use different collectors (i.e. for different
architectures); a change in the collecting scripts then
corresponds with the same include files. This way different
maintainers can modify the configuration independently of each
other.
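A minimal sketch of how such an include directive might be
handled while reading the main configuration file; the directive
name and recursion behavior are assumptions, not an existing Orca
feature:

    # Hypothetical sketch: expand "include foo.conf" lines while reading
    # a configuration file, recursing into included files.
    use strict;
    use warnings;

    sub read_config_lines {
      my ($file) = @_;
      my @lines;
      open my $fh, '<', $file or die "cannot open $file: $!";
      while (my $line = <$fh>) {
        if ($line =~ /^\s*include\s+(\S+)/) {
          push @lines, read_config_lines($1);   # pull in the included file
        } else {
          push @lines, $line;
        }
      }
      close $fh;
      return @lines;
    }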
From Hans-Werner Jouy:
Orca is a bit picky about changing columns for a single machine,
but sometimes I don't get all the columns, like nfs when no
(auto-) mount is active. I have to fill in all the missing info
with zeros inside the collecting script, but it should read
"momentarily no data".
Fix lib/Orca/OldState.pm to handle missing data files that it
points to.
Remove the percol listing in find_files in orca_services.cfg.
Update orca_services to 1.7.2 or 2.0.
Code review procallator.
* Bug: If there is no match made in a find_files, then handle this case.
See the message in my Sent box at 11/8/2000 4:00 PM with the
subject "Unable to view orca graphs".
This is a pretty comprehensive to-do list for Orca and the related data
gathering tools. Any comments and additions to this list are welcome.
To motivate the discussion of this to-do list, let me give some
background on our setup. The GeoCities site has over 100 hosts.
I've been running orcallator.se on some hosts since September
1998; those hosts have stored over 300 source text files.
Currently I have 34472 files using 7.3 gigabytes of storage. I
have 9 different orcallator.cfg files for different classes of
machines.
* Orca: Fix the "did exist and is now gone" email messages for source
files that are fine.
* Orca: Have an install option just for orcallator.se and not the
whole Orca package.
* Orca: Add a flag to put Orca in the background or daemonize it.
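A minimal sketch of the usual Unix daemonization steps in Perl,
to be gated behind a hypothetical command line flag (the flag
name and placement are assumptions):

    # Hypothetical sketch: standard fork/setsid daemonization.
    use strict;
    use warnings;
    use POSIX qw(setsid);

    sub daemonize {
      chdir '/'                     or die "cannot chdir to /: $!";
      open STDIN,  '<', '/dev/null' or die "cannot read /dev/null: $!";
      open STDOUT, '>', '/dev/null' or die "cannot write /dev/null: $!";
      defined(my $pid = fork())     or die "cannot fork: $!";
      exit 0 if $pid;               # parent exits, child keeps running
      setsid()                      or die "cannot start a new session: $!";
      open STDERR, '>&STDOUT'       or die "cannot dup stdout: $!";
    }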
* Orca: Bug fix:
Fix the bug where, if a legend is listed in a plot {}, the
generated plot contains more colored legend boxes than there are
data sets actually plotted.
* Orca: Load arbitrarily formatted source text files.
(If this is implemented, then the problem of Orca complaining
about source text files that do not contain fields orcallator.cfg
is looking for will disappear.)
Orca can only handle source text files that have data from one
source, such as a host, in a single file. It cannot currently
handle source data like this:
# Web_server time hits/s bytes/s
www1 933189273 1304 10324
www2 933189273 2545 40322
I plan on having Orca allow something like this in the configuration
file for each group. To read the data above, something like this
would be defined. Here the ChangeSubGroup and StoreMeasurement
would be macros or subroutines defined elsewhere for Orca users
to use.
group web_servers {
  upon_open
    # Do nothing special.
  upon_new_data
    while (<$fh>) {
      next if /^#/;
      my @line = split;
      next unless @line == 4;
      my ($www, $time, $hits_s, $bytes_s) = @line;
      ChangeSubGroup($www);
      StoreMeasurement('Hits Per Second',  $time, $hits_s);
      StoreMeasurement('Bytes Per Second', $time, $bytes_s);
    }
  upon_close
    # Do nothing special.
}
For the standard orcallator.se output, something like this would
be used:
group orcallator {
  find_files /usr/local/var/orca/orcallator/(.*)/orcallator-\d+-\d+\d+
  upon_open
    # Look for the first # line describing the data.
    while (<$fh>) {
      last if /^#/;
    }
    @ColumnNames = split;
    # Do some searching for the column with the Unix time in it.
    my $time_column = ....;
    splice(@ColumnNames, $time_column, 1);
  upon_new_data
    # Load the new data.
    while (my $line = <$fh>) {
      my @line = split ' ', $line;
      next unless @line == @ColumnNames + 1;
      my ($time) = splice(@line, $time_column, 1);
      StoreMeasurements($time, \@ColumnNames, \@line);
    }
}
The code for each upon_* would be Perl code designed explicitly
for the type of source text. This would allow the arbitrary
reading of text files, leaving the specifics to the user of
Orca.
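To make the intent of ChangeSubGroup, StoreMeasurement, and
StoreMeasurements concrete, here is a rough sketch of what such
helpers might look like; their real interface does not exist yet,
so everything below is an assumption:

    # Hypothetical sketch of the helpers the upon_* code could call.
    use strict;
    use warnings;

    my $current_subgroup = '';
    my %measurements;   # subgroup => measurement name => [ [time, value], ... ]

    # Switch the subgroup (e.g. the host) that following measurements belong to.
    sub ChangeSubGroup {
      my ($subgroup) = @_;
      $current_subgroup = $subgroup;
    }

    # Record one measurement for the current subgroup at a given Unix time.
    sub StoreMeasurement {
      my ($name, $time, $value) = @_;
      push @{ $measurements{$current_subgroup}{$name} }, [ $time, $value ];
    }

    # Record a whole row of measurements, one per column name.
    sub StoreMeasurements {
      my ($time, $names, $values) = @_;
      StoreMeasurement($names->[$_], $time, $values->[$_]) for 0 .. $#$names;
    }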
This work would also include caching away the type of measurements
that each source data file provides. Currently Orca reads the
header of each source text file for the names of the columns.
With the number of source text files I have, this takes a long
time.
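A minimal sketch of such a header cache, assuming Storable and a
cache keyed on each file's path and modification time; the cache
location and header format are assumptions:

    # Hypothetical sketch: cache the column names of each source file so
    # Orca does not have to reread every header on every pass.
    use strict;
    use warnings;
    use Storable qw(retrieve nstore);

    my $cache_file = '/usr/local/var/orca/column_cache';
    my $cache = -e $cache_file ? retrieve($cache_file) : {};

    sub column_names_for {
      my ($file) = @_;
      my $mtime = (stat($file))[9];
      my $entry = $cache->{$file};
      return @{ $entry->{columns} } if $entry && $entry->{mtime} == $mtime;

      open my $fh, '<', $file or die "cannot open $file: $!";
      my @columns;
      while (<$fh>) {
        next unless /^#/;            # header line starts with #
        (undef, @columns) = split;   # drop the leading # token
        last;
      }
      close $fh;
      $cache->{$file} = { mtime => $mtime, columns => \@columns };
      nstore($cache, $cache_file);   # persist the cache for the next run
      return @columns;
    }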
* OS independent data gathering tools:
Many people have been asking for Orca for operating systems other
than Solaris, since orcallator.se only runs on Solaris hosts.
I've given this a little thought, and one good solution is to use
other publicly available tools that gather host information. The
one that came to mind is top (ftp://ftp.groupsys.com/pub/top).
Looking at the configure script for top, it runs on the following
OSes:
386bsd For a 386BSD system
aix32 POWER and POWER2 running AIX 3.2.5.0
aix41 PowerPC running AIX 4.1.2.0
aux3 a Mac running A/UX version 3.x
bsd386 For a BSD/386 system
bsd43 any generic 4.3BSD system
bsd44 For a 4.4BSD system
bsd44a For a pre-release 4.4BSD system
bsdos2 For a BSD/OS 2.X system (based on the 4.4BSD Lite system)
convex any C2XX running Convex OS 11.X.
dcosx For Pyramid DC/OSX
decosf1 DEC Alpha AXP running OSF/1 or Digital Unix 4.0.
dgux for DG AViiON with DG/UX 5.4+
dynix any Sequent Running Dynix 3.0.x
dynix32 any Sequent Running Dynix 3.2.x
freebsd20 For a FreeBSD-2.0 (4.4BSD) system
ftx For FTX based System V Release 4
hpux10 any hp9000 running hpux version 10.x
hpux7 any hp9000 running hpux version 7 or earlier
hpux8 any hp9000 running hpux version 8 (may work with 9)
hpux9 any hp9000 running hpux version 9
irix5 any uniprocessor, 32 bit SGI machine running IRIX 5.3
irix62 any uniprocessor, SGI machine running IRIX 6.2
linux Linux 1.2.x, 1.3.x, using the /proc filesystem
mtxinu any VAX Running Mt. Xinu MORE/bsd
ncr3000 For NCR 3000 series systems Release 2.00.02 and above -
netbsd08 For a NetBSD system
netbsd10 For a NetBSD-1.0 (4.4BSD) system
netbsd132 For a NetBSD-1.3.2 (4.4BSD) system
next32 any m68k or intel NEXTSTEP v3.x system
next40 any hppa or sparc NEXTSTEP v3.3 system
osmp41a any Solbourne running OS/MP 4.1A
sco SCO UNIX
sco5 SCO UNIX OpenServer5
sunos4 any Sun running SunOS version 4.x
sunos4mp any multi-processor Sun running SunOS versions 4.1.2 or later
sunos5 Any Sun running SunOS 5.x (Solaris 2.x)
svr4 Intel based System V Release 4
svr42 For Intel based System V Release 4.2 (DESTINY)
ultrix4 any DEC running ULTRIX V4.2 or later
umax Encore Multimax running any release of UMAX 4.3
utek Tektronix 43xx running UTek 4.1
If somebody were to write a tie into top's source code that would
generate output every X minutes to a text file or even RRD files,
then Orca could put it together into a nice package.
In line with this, there is a freely available disk usage program
(ftp://ftp.pz.pirmasens.de/pub/unix/utilities/) that could be used
to generate disk usage plots loaded by Orca.
* orcallator.se: Dump directly to RRD files.
A separate RRD file would be created for each measurement.
I do not want all the data stored in a single RRD, since people
commonly add or remove hardware from the system, which will cause
more or less data to be stored. Also, this currently would not
work with RRDtool, since you cannot dynamically add more RRAs
to an RRD file. Saving each measurement in a separate RRD file
removes this issue.
Pros:
1) Disk space savings. For an old host using over 70
megabytes of storage for text output, the RRD files
consume 3.5 megabytes of storage.
2) Orca processing time. Orca spends a large amount of
kernel and CPU time in finding files and getting
the column headers from these files. By storing
the data in RRD files, this is no longer a problem.
Also, Orca itself would not need to move the data
from text to RRD form, speeding up the process of
generating the final plots.
Cons:
1) Potential slowdown in updating the data files.
It is easier to write a single text line using fprintf
than using rrd_update. What is the impact on a single
orcallator.se process?
2) RRDtool format changes and upgrading the data files.
Text files do not change, but if RRDtool does change,
then the data files will need to be upgraded somehow.
3) Loss of data over time. Due to the consolidation
function of RRD, older data will not be as precise
in case it needs to be examined.
4) You cannot grep or simply parse the text files for
particular data for ad-hoc studies.
5) The RRD creation parameters would be set by
orcallator.se and not by Orca, making modifications
harder.
Question: Do the pros outweigh the cons?
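As a rough illustration of the one-RRD-per-measurement layout:
orcallator.se itself is written in SE, so this Perl sketch using
the RRDs module is only a stand-in, and the paths, step, and RRA
parameters are assumptions:

    # Hypothetical sketch: one RRD file per measurement, created on
    # first use (assumes the per-host directory already exists).
    use strict;
    use warnings;
    use RRDs;

    sub update_measurement {
      my ($host, $name, $time, $value) = @_;
      my $rrd = "/usr/local/var/orca/rrd/$host/$name.rrd";
      unless (-e $rrd) {
        RRDs::create($rrd,
          '--step', 300,                 # one sample every 5 minutes
          '--start', $time - 1,
          'DS:value:GAUGE:600:U:U',
          'RRA:AVERAGE:0.5:1:600',       # ~2 days of 5-minute samples
          'RRA:AVERAGE:0.5:6:700',       # ~1 month of 30-minute averages
          'RRA:AVERAGE:0.5:24:775',      # ~3 months of 2-hour averages
          'RRA:AVERAGE:0.5:288:797');    # ~2 years of daily averages
        warn 'RRD create error: ' . RRDs::error() if RRDs::error();
      }
      RRDs::update($rrd, "$time:$value");
      warn 'RRD update error: ' . RRDs::error() if RRDs::error();
    }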
* Orca: Potentially use Cricket's ConfigTree configuration module.
Given more complex Orca installations where many different Orca
configuration files are used, maintaining them will start to be
complicated. For example, at Yahoo!/GeoCities I have 9 different
configurations to split up the hosts for our site; I found that
the number of hosts and data files requires this split for
reasonable generation of the resulting HTML and PNG files.
It looks like using ConfigTree would allow Orca to use the same
inheritance that Cricket uses. I don't know enough about the
Cricket config tree setup to know if it would work well with Orca.
Work to do: Review the ConfigTree code.
* Orca: Allow different group sources in the same plot.
Currently Orca only allows data sources from one group. Expand
the code to list the group in each data line. Initially, however,
only data from one group would be allowed in one data statement.
* Orca: Put the last update time for each host in an HTML file somewhere.
This could be done simply by updating a file that gets included
by the real HTML file. This way the main HTML files do not
have to get rewritten all the time. On large installations,
writing the HTML files takes a long time.
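A minimal sketch of writing such an include fragment, assuming
server-side includes and a hypothetical fragment path; none of
this is existing Orca behavior:

    # Hypothetical sketch: write a small HTML fragment listing last-update
    # times, to be pulled into the static index page by a server-side include.
    use strict;
    use warnings;

    sub write_last_updated_fragment {
      my ($fragment_file, %last_update) = @_;   # host => Unix time of newest data
      open my $fh, '>', "$fragment_file.new"
        or die "cannot write $fragment_file.new: $!";
      print $fh "<ul>\n";
      for my $host (sort keys %last_update) {
        my $when = scalar localtime $last_update{$host};
        print $fh "  <li>$host: last updated $when</li>\n";
      }
      print $fh "</ul>\n";
      close $fh;
      rename "$fragment_file.new", $fragment_file
        or die "cannot rename $fragment_file.new: $!";   # atomic replace
    }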
* Orca: Turn off HTML creation via command line option.
Add a command line option to turn off HTML file creation.
* Orca: Update the HTML files if new data is found.
Currently Orca will only update the HTML files if new source
files are found, but not if new data in existing files is found.
Change this.
* orcallator.se: Put HTTP proxy and caching statistics into orcallator.cfg.
Since orcallator.se measures HTTP proxy and caching statistics,
update orcallator.cfg.in to display these data sets.
* orcallator.se: Temperature measurements
Since /usr/platform/sun4u/sbin/prtdiag -v measures the ambient
and CPU temperature, get orcallator.se to measure this data.
* Orca:
Do what it takes to remove the duplicate Ethernet port listings in
orcallator.cfg.in. They seem redundant, but are not entirely,
since different interfaces have different maximum data transfer
rates.
* Orca:
Add some error checking code for the maximum number of available
colors so undefined errors do not arise.
* Other:
Mention the AIX tool nmon.
Some notes from Paul Company <paul.company@plpt.com>:
I'd create one graph for each and autoscale. That way you have system
processes, httpd processes and a combination. All your bases are covered.
Presenting data in a useful, meaningful way is an art form and is
very difficult. What makes it an art form is that the definitions
of useful and meaningful are subjective. General rules of thumb:
+ Know your audience.
+ Know what you're measuring and why (definitions/goals).
+ Know what the measurements mean,
i.e., know what is good/bad/acceptable (limits/thresholds).
+ Know how you're measuring,
i.e., how reliable is your measurement?
Graphs are just one way of presenting data.
I definitely don't claim to be an expert of any kind in this area,
but here are my preferences.
Most resources have a max and min limit (this defines your range). I like
to see the max value at the top of the y-axis and the min at the bottom.
Most resources have a usage pattern (this defines your scale). I like
to see autoscaling graphs which dynamically modify the top & bottom to
show the points of activity. This is useful when the range is huge and
the activity is localized to a small band within the huge range.
For example, we have a Fractional T1 (384Kbps with 50% CIR) at my
company and I use mrtg to monitor it. Because the range is small
I use a fixed min & max, and a fixed linear scale of 1.
(1) x-axis is time
    y-axis is Kbps w/ min=0
                      max=384Kbps
    scale=1Kbps (linear)
Assume a web site gets anywhere from 0 hits/second to 10 million
hits per second. I would want multiple graphs depending on the
usage pattern:
(1) x-axis is time
    y-axis is hits/s w/ min=0
                        max=10Million
    scale=1Hit/s (linear)
(2) x-axis is time
    y-axis is hits/s w/ min=0
                        max=10Million
    scale=1-1000Hit/s (logarithmic)
(3) x-axis is time
    y-axis is hits/s w/ min={min value of lowest usage, usually zero}
                        max={max value of highest usage}
    scale=auto (multiple graphs with appropriate range/scale)
Bottom line is it would be nice if one could modify the various
attributes of the graphs (range, scale, labels, title, ...)
interactively. I realize the Orca graphs are generated as fixed
PNG files. Just dreaming.
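For what it is worth, RRDtool's graphing options already cover the
fixed and logarithmic cases; here is a rough sketch via the RRDs
Perl module, with placeholder file names, limits, and labels:

    # Hypothetical sketch: the same data graphed with a fixed linear
    # scale and with a logarithmic scale, using rrdtool graph options.
    use strict;
    use warnings;
    use RRDs;

    my $rrd = '/usr/local/var/orca/rrd/www1/hits.rrd';   # placeholder path

    # Fixed range, linear scale.
    RRDs::graph('hits-linear.png',
      '--start', 'end-1d',
      '--lower-limit', 0, '--upper-limit', 1000, '--rigid',
      '--vertical-label', 'hits/s',
      "DEF:hits=$rrd:value:AVERAGE",
      'LINE1:hits#0000ff:hits per second');
    warn 'RRD graph error: ' . RRDs::error() if RRDs::error();

    # Same data on a logarithmic scale, useful when activity spans decades.
    RRDs::graph('hits-log.png',
      '--start', 'end-1d',
      '--logarithmic',
      '--vertical-label', 'hits/s',
      "DEF:hits=$rrd:value:AVERAGE",
      'LINE1:hits#0000ff:hits per second');
    warn 'RRD graph error: ' . RRDs::error() if RRDs::error();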
Here are the things I'd like modified in orca, for my (audience)
specific use.
+ Definitions for all Data Sets which includes what is good and bad.
For example, is a 15-20% collision rate bad?
What is a Disk Busy Measure and is 95 a bad number?
Also, suggestions for what to do if things are bad.
+ Finer resolution (more graphs) on each Data Set.
For example, a graph for each disk, when you click on that graph
you get a graph for each:
Bytes read
Bytes written
KBytes read
KBytes written
Average # of transactions waiting
Average # of transactions serviced
Reads issued
Writes issued
Average service time (milliseconds)
% of time spent waiting for service
% of time spent busy
+ The ability to set thresholds (like virtual_adrian.se, zoom.se
or pure_test.se) and have those thresholds graphed as color
changes (red, amber, green, etc.); see the sketch at the end of
these notes.
+ System & Httpd Processes
Have 3 separate graphs as mentioned above.
+ CPU Usage
Have multiprocessor support - separate graphs for each processor,
in addition to the combined graph.
+ Packets Per Second
I'd like to know the definition of a packet and maybe the
average, min, and max size. And maybe the packet types (IP, TCP,
UDP, ICMP).
+ Errors Per Second
Possibly a graph per error type.
+ Nocanput Rate
I'd like to know the definition of nocanput.
What is the unit on the y-axis? What does 3m mean?
+ Collisions
Deferred Packet Rate
TCP Bits Per Second
TCP Segments Per Second
TCP Retransmission & Duplicate Received Percentage
TCP New Connection Rate
TCP Number Open Connections
TCP Reset Rate
TCP Attempt Fail Rate
TCP Listen Drop Rate
Should all be per interface AND cumulative.
+ Page Residence Time
What's a good/bad/acceptable number?
How do you read this graph and relate it to pageouts
or swap thrashing?
Maybe we should plot the sr field of vmstat (p. 329).
+ Page Usage
Unless you know how memory subsystems work,
this graph is hard to read. The only obvious
thing this graph tells you is if the Free List
is too small. The Free List can be fine, but you
can still have performance problems!
The Other & System Total labels are useless.
Detecting a swap problem (thrashing) would be more useful!
+ Pages Locked & IO
This maps directly to the kernel page usage above.
It is redundant and therefore useless.
+ Bits Per Second: <interface>
Doesn't work for all OS versions and/or NICs.
+ Web Server Hit Rate, Web Server File Size, Web Server Data
Transfer Rate,
Web Server HTTP Error Rate
None of them seem to work!
Probably my fault; I'll take a closer look.
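As promised under the thresholds item above, a rough sketch of
graphing threshold colors with RRDtool CDEF expressions; the
thresholds and file name are placeholders, and this is not an
existing Orca feature:

    # Hypothetical sketch: split one data source into green/amber/red
    # bands at fixed thresholds so crossings show up as color changes.
    use strict;
    use warnings;
    use RRDs;

    my $rrd   = '/usr/local/var/orca/rrd/www1/disk_busy.rrd';   # placeholder
    my $amber = 60;   # assumed warning threshold (percent busy)
    my $red   = 90;   # assumed critical threshold

    RRDs::graph('disk-busy-thresholds.png',
      '--start', 'end-1d',
      '--vertical-label', '% busy',
      "DEF:busy=$rrd:value:AVERAGE",
      # Each CDEF keeps the value only while it is inside its band.
      "CDEF:ok=busy,$amber,LT,busy,UNKN,IF",
      "CDEF:warn=busy,$amber,GE,busy,$red,LT,*,busy,UNKN,IF",
      "CDEF:crit=busy,$red,GE,busy,UNKN,IF",
      'AREA:ok#00cc00:below warning',
      'AREA:warn#ffcc00:warning',
      'AREA:crit#cc0000:critical');
    warn 'RRD graph error: ' . RRDs::error() if RRDs::error();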