This is a FAQ (Frequently Asked Questions) for Orca and the tools that
gather data for it.
Please email submissions to the FAQ to orca-users@orcaware.com.
# $HeadURL$
# $LastChangedDate$
# $LastChangedBy$
# $LastChangedRevision$
General
-------
1.1) What is the m, k, u, or some other character following the
numbers in the Y axis scale?
1.2) Why does my Y axis scale have such large numbers?
1.3) Why are there random characters at the end of my HTML and GIF
or PNG image names, e.g.
o_host3_disk_runp_c0t6d0...disk_runp_c-4QyP2ziXlrwXj8eG_n_A.html?
1.4) What should I use, NFS or rsync, to get my data from my clients
to the Orca server? Should I push my data to the server from
the clients or have my server pull my data?
1.5) How should I set up ssh access securely without entering a
password every time a process needs to contact a remote system?
Warning Messages
----------------
2.1) Number of columns in line '1,2,3.....' of
../orcallator/.../percol-2000-09-26 does not match column
description.
2.2) Warning: file '../orcallator/.../temp-percol-2001-02-22' was
current and now is not.
2.3) Warning: cannot create Orca::HTMLFile object: cannot open
'/home/orca_html/o_host1-monthly.html.htm' for writing: Too
many open files.
2.4) Warning: file '.../orcallator/host1/orcallator-2001-11-06-000'
did exist and is now gone.
Solaris/Orcallator.se
---------------------
3.1) What do I do about the error "/opt/RICHPse/bin/se: Unsupported
platform: sparcv9 SunOS 5.8"?
3.2) Why does orcallator.se die, in particular when I log out of the
system that I started it on?
3.3) Why is the page scan rate always zero even though sar says
it is not?
3.4) Why are all the NFS server statistics zero?
3.5) Why are my Ethernet bits/second measurements all zero?
3.6) Why do I get the error "Fatal: member: txunderruns vanished!:
Near line 201"?
3.7) Why do I get the error "Fatal: member: txunderruns0 vanished!:
Near line 255"?
3.8) Why do I get the error "Fatal: member: framming vanished!: Near
line 160"?
3.9) Why do I get the error "Fatal: member: drop vanished!: Near
line 285"?
3.10) Why do I get the error "Fatal: member: XXXXX vanished!: Near
line YYY"?
3.11) What do I do when I get the message "Fatal: subscript: 2 out of
range for: GLOBAL_net[2]: Near line 178"?
3.12) Why don't I get plots for my Veritas filesystems?
3.13) Why don't I get plots for my RSM filesystems?
3.14) Why don't I get any Interface Bits Per Second data for my qe
board?
3.15) Orcallator.se core dumps.
3.16) Why should I keep my compressed percol-* or orcallator-* files?
3.17) Why do my Orca plots no longer contain any data after I change
anything related to orcallator, such as the subsystems to
measure, or when something changes on the system, such as mount
points, ethernet devices, etc?
General
-------
1.1) What is the m, k, u, or some other character following the
numbers in the Y axis scale?
This is the SI prefix symbol that the number is scaled by. Here
is a table of symbols and scaling factors.
a   10^-18   Atto
f   10^-15   Femto
p   10^-12   Pico
n   10^-9    Nano
u   10^-6    Micro
m   10^-3    Milli
k   10^3     Kilo
M   10^6     Mega
G   10^9     Giga
T   10^12    Tera
P   10^15    Peta
E   10^18    Exa
So if you see "250 m" in the Y axis, this means 0.25.
The page
http://www.physlink.com/reference_dprefixes.cfm
is also a good reference for this information.
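As a rough illustration of the table above, the following Python
sketch decodes a Y axis label into a plain number. The function
name decode_axis_label and the assumption that the label is a
number followed by a space and one prefix letter are made up for
this example; this is not code taken from Orca or RRDtool.

```python
# Illustrative only: decode a Y axis label such as "250 m".
SI_FACTORS = {
    "a": 1e-18, "f": 1e-15, "p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3,
    "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12, "P": 1e15, "E": 1e18,
}


def decode_axis_label(label):
    """Return the numeric value of a label like '250 m' or '1.5 k'."""
    parts = label.split()
    value = float(parts[0])
    if len(parts) > 1:
        # Scale by the factor that belongs to the prefix letter.
        value *= SI_FACTORS[parts[1]]
    return value


print(decode_axis_label("250 m"))
print(decode_axis_label("1.5 k"))
```

A label without a prefix letter, such as "42", is returned
unchanged as 42.0.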
1.2) Why does my Y axis scale have such large numbers?
See question 1.1 above. Most likely there is a scaling letter
after your number that means the true value is several orders
of magnitude smaller than you think it is.
1.3) Why are there random characters at the end of my HTML and GIF
or PNG image names, e.g.
o_host3_disk_runp_c0t6d0...disk_runp_c-4QyP2ziXlrwXj8eG_n_A.html?
Orca builds HTML and image filenames from the names of all of the
input data sources. For plots that combine many different data
sources, the filename can exceed the maximum filename length for
the operating system and Orca would not be able to create the
file.
The solution is to limit the filename length to less than 255
characters. Simply trimming the filename down to less than 255
characters would not be enough, because the trimmed filename may
no longer be unique and two distinct HTML files and/or images
could end up being written to the same file. Instead, Orca
calculates the MD5 hash of the full length filename, trims the
filename down and inserts the MD5 into the shortened filename,
which keeps the shortened filenames unique for all practical
purposes.
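The idea can be sketched in a few lines of Python. This is only
an illustration of the trim-and-hash technique, not Orca's own
code: the helper name shorten_filename, the '-' separator, the
position of the hash and the use of a hexadecimal digest are all
assumptions (the example filename above shows that Orca encodes
the hash differently).

```python
import hashlib

MAX_NAME_LEN = 255  # assumed filename length limit, see the text above


def shorten_filename(name, suffix=".html", max_len=MAX_NAME_LEN):
    """Hypothetical helper: keep short names, else trim and add an MD5 tag."""
    full = name + suffix
    if len(full) <= max_len:
        return full
    # Hash the full, untrimmed name so that two long names that share
    # the same leading characters still get different short names.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()  # 32 hex chars
    keep = max_len - len(suffix) - len(digest) - 1  # 1 for the '-'
    return name[:keep] + "-" + digest + suffix


long_a = "o_host3_" + "x" * 300 + "_a"
long_b = "o_host3_" + "x" * 300 + "_b"
print(len(shorten_filename(long_a)))
print(shorten_filename(long_a) != shorten_filename(long_b))
```

The two long names differ only in their last character, which
plain trimming would throw away. Because the hash is computed
before trimming, the shortened names still differ.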
1.4) What should I use, NFS or rsync, to get my data from my clients
to the Orca server? Should I push my data to the server from
the clients or have my server pull my data?
[Answer written by Sean O'Neill <sean@seanoneill.info>.]
Yeah, NFS is a total pain for more reasons than just security.
rsync is the way to go. By default, it uses ssh as its
transport application vs. rsh. Don't use rsh for the obvious
reasons.
But you need to really think about what this means in regards
to security. First, your security group is probably going to
be nervous about anything that allows for unattended
password-less access between servers. But you also need to
figure out if you want to PUSH or PULL your Orca data.
If you have an Orca server that ssh's into the remote systems
and rsync's down the data (i.e. PULL), this one machine would
have ssh access to LOTS of other systems and would probably
make any security group very nervous about that machine.
If you have the remote systems rsync their data to the Orca
server (i.e. PUSH), then you have lots of other machines with
ssh access to ONE system. This generally makes security a
/little/ less nervous.
Some folks on the list have multiple Orca servers because of
the system resources required by Orca. It's a CPU/memory hog at
times. Also, pushing data into a box is generally an
asynchronous activity (from the Orca server's point of view) so
it will take in as many transfers as the box will support.
If the Orca server is PULLING data, you need some script to
keep track of what systems to pull data from, have logic to
make it less serial to get the data down faster, etc. In other
words, it's more of a headache IMHO.
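For the PUSH approach, each client could run something like the
following from cron. This Python sketch only builds the rsync
command line. The host name orca.example.com and both
directories are placeholders invented for the example; -a and -e
are standard rsync options.

```python
def rsync_push_command(local_dir, server, remote_dir):
    """Build the argv for pushing a client's data to the Orca server."""
    return [
        "rsync",
        "-a",         # archive mode: recurse, keep times and permissions
        "-e", "ssh",  # use ssh as the transport
        local_dir.rstrip("/") + "/",  # trailing slash: copy the contents
        "%s:%s" % (server, remote_dir),
    ]


# Hypothetical paths and host name, for illustration only.
cmd = rsync_push_command("/var/orca/orcallator/host1",
                         "orca.example.com",
                         "/data/orcallator/host1")
print(" ".join(cmd))
```

The list can be handed to subprocess.call() once key based ssh
authentication is set up for the account that runs the job (see
question 1.5).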
1.5) How should I set up ssh access securely without entering a
password every time a process needs to contact a remote system?
To get ssh working, use key authentication. One easy way to
use key authentication is to use the keychain tool at
http://www.gentoo.org/proj/en/keychain.xml
The first keychain article introduces the concepts behind
RSA/DSA key authentication and shows you how to set up
primitive (with passphrase) RSA/DSA authentication:
http://www-106.ibm.com/developerworks/library/l-keyc.html
The second article shows you how to use keychain to set up
secure, password-less ssh access in an extremely convenient
way. keychain also provides a clean, secure way for cron jobs
to take advantage of RSA/DSA keys without having to use
insecure unencrypted private keys.
http://www-106.ibm.com/developerworks/linux/library/l-keyc2/
A third keychain article shows you how to use ssh-agent's
authentication forwarding mechanism.
http://www-106.ibm.com/developerworks/linux/library/l-keyc3/
Even with these methods, when a system reboots, a person will
need to manually log into the system, su into the account, run
keychain and enter the passphrase to unlock the RSA/DSA keys.
Warning Messages
----------------
2.1) Number of columns in line '1,2,3.....' of
../orcallator/...../percol-2000-09-26 does not match column
description.
When Orca sees a line in an input data file that does not have
the same number of columns as defined at the top of the file
when column_description is 'first_line' or does not match the
column_description, then Orca will complain and ignore the
line. Additionally, Orca will not record the data from this
line in the RRD files and no data will be plotted.
With orcallator.se, this happens in versions 1.28b6 or older
when hardware, mount points, network interfaces, etc. are added
to or removed from the system and orcallator.se then outputs a
different number of columns.
To work around this problem, upgrade to version 1.32 or later
of orcallator.se and orcallator.cfg at
http://www.orcaware.com/orca/pub/
which now create a new output data file any time the number of
columns or the name of a column changes, so that Orca will not
complain and all of your data will be plotted.
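To find the offending lines in a data file yourself, a check
along the following lines may help. It is a sketch that assumes
whitespace separated columns and a first line that holds the
column names (the 'first_line' case). The function name
split_valid_rows and the sample column names are made up for the
example, and this is not how Orca itself is implemented.

```python
def split_valid_rows(lines):
    """Separate data lines by whether they match the header width."""
    header = lines[0].split()
    good, bad = [], []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) == len(header):
            good.append(fields)
        else:
            bad.append(fields)
    return header, good, bad


# Made-up sample: the third data line has gained an extra column.
sample = [
    "timestamp locltime uptime",
    "969951600 00:00:00 1234",
    "969951900 00:05:00 1534",
    "969952200 00:10:00 1834 99",
]
header, good, bad = split_valid_rows(sample)
print(len(header), len(good), len(bad))
```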
2.2) Warning: file '../orcallator/.../temp-percol-2001-02-22' was
current and now is not.
First, Orca considers a file to be current if the file's last
modified time is within 'late_interval' seconds of the current
time. In other words, Orca checks if a process is modifying
the file to keep it current. The 'late_interval' value is
determined by the configuration file or set to the 'interval'
value if the configuration file does not set 'late_interval'.
Orca stat()s the file when it first looks for files using
'find_files' and determines whether the file is current. Any
time Orca reads the file after that, it stat()s the file again
and determines whether it is still current. If a stat() showed
the file to be current and a later stat() shows that it is not,
then the message is printed.
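The check described above can be sketched as follows. The
function names and the use of "less than or equal" for the
comparison are assumptions made for this illustration, and Orca
itself is not written in Python.

```python
import os
import tempfile
import time


def is_current(path, late_interval, now=None):
    """A file is current if modified within late_interval seconds."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime <= late_interval


def became_stale(was_current, path, late_interval, now=None):
    """True when a previously current file is no longer current."""
    return was_current and not is_current(path, late_interval, now)


# Demonstrate with a temporary file and a simulated clock.
with tempfile.NamedTemporaryFile() as tmp:
    mtime = os.stat(tmp.name).st_mtime
    fresh = is_current(tmp.name, 300, now=mtime + 10)
    stale = became_stale(True, tmp.name, 300, now=mtime + 1000)
print(fresh, stale)
```

With a late_interval of 300 seconds, the file counts as current
10 seconds after its last modification and as no longer current
1000 seconds after it, which is the situation that triggers the
warning.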
The appearance of this message means that the process that has
been updating the file has stopped updating it and this may be
worth looking into.
This message is also seen when the data gathering program,
e.g. orcallator.se, opens a new log file at the end of a day
and the old log file is no longer updated. Orca tries to
manage this situation when a file is no longer updated at the
end of a day.
If the actual measurement interval is not consistent with the
"interval" in Orca's configuration file, e.g. 300 seconds, then
Orca's "interval" should be modified to match the actual
measurement interval. If you need to do this, then delete the
RRD files because they will keep the old "interval", and reload
the input data into new RRD files.
Increasing the "late_interval" may also remove this warning.
2.3) Warning: cannot create Orca::HTMLFile object: cannot open
'/home/orca_html/o_host1-monthly.html.htm' for writing: Too
many open files.
This happens when Orca runs out of open file descriptors. Orca
opens many file descriptors to do its work and it does not
close them unless it needs to.
The first thing to check is the maximum number of file
descriptors each process can have. On some systems, the login
shell scripts lower the maximum number of open file descriptors
a process may have.
To check this in a C shell variant (csh, tcsh), type
limit descriptors
or, in a Bourne shell variant (sh, bash), type
ulimit -n
On all operating systems Orca should be able to use 256 file
descriptors. On some, such as Linux, Orca can open 1024 files
at once. If the number you are getting is less than 256, then
raise this limit. Some operating systems let you raise the
limit, such as Solaris, while others do not, such as Linux. To
try to raise the limit, do
limit descriptors 1024
or
ulimit -n 1024
If these commands do not work, ask your system administrator
how to do this.
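The same limits can also be read from a script, for example from
a wrapper that starts Orca. This sketch uses the resource module
from Python's standard library, which is available on Unix
systems only. The threshold of 256 comes from the text above.

```python
import resource

# Soft limit: what the process gets now. Hard limit: the ceiling
# to which an unprivileged process may raise the soft limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

if soft != resource.RLIM_INFINITY and soft < 256:
    print("fewer than 256 descriptors available; raise the limit")
```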
There is a bug in Orca versions older than 0.27b2 where Orca
does not close the file descriptor of a pipe that uncompresses
a compressed percol-* file for Orca. If your percol-* files are
compressed, then either upgrade to 0.27b2 or later or apply the
patch
http://www.orcaware.com/orca/pub/patches/orca-0.26-defunct-processes-patch.txt
to Orca 0.26. This should reduce Orca's file descriptor count.
2.4) Warning: file '.../orcallator/host1/orcallator-2001-11-06-000'
did exist and is now gone.
Orca prints this message when it found an input data file to
read and when it goes to read it, which may be a while later,
the file no longer exists.
When Orca is being used with orcallator.se, this message may
occur when orcallator.se compresses the previous day's percol-*
or orcallator-* file that Orca found.
Solaris/Orcallator.se
---------------------
3.1) What do I do about the error "/opt/RICHPse/bin/se: Unsupported
platform: sparcv9 SunOS 5.8"?
There are two solutions. SE 3.2 is now available and you can
upgrade to this version. It is available at:
http://www.setoolkit.com/
If you are using an SE version older than 3.2, then do this:
cd /opt/RICHPse/bin
ln se.sparc.5.7 se.sparc.5.8
ln se.sparcv9.5.7 se.sparcv9.5.8
3.2) Why does my background orcallator.se process die, in particular
when I log out of the system that I started it on?
This sounds like orcallator.se is started under the Bourne
shell, which kills background processes unless the process is
started with nohup:
nohup se orcallator.se &
3.3) Why is the page scan rate always zero even though sar says
it is not?
It has been observed that on Solaris 2.5.1 the page scan rate
is always zero. This occurs with orcallator.se version 1.23 or
older. The problem is that on older versions of SE the
p_vmstat.scan variable is an integer and orcallator.se was
assuming a double. The fix is to upgrade to orcallator.se 1.24
or newer.
3.4) Why are all the NFS server statistics zero?
On Solaris 2.6 there is a bug in the SE toolkit that prevents
orcallator.se from getting the NFS server statistics. To fix
this, edit your RICHPse/include/kstat.se file and change #ifdef
to #if near the top of the file where it says
#ifdef MINOR_VERSION >= 70
# define rfs_counter_t uint64_t
#else
to
#if MINOR_VERSION >= 70
# define rfs_counter_t uint64_t
#else
3.5) Why are my Ethernet bits/second measurements all zero?
On 2.5.1 or older Solaris operating system releases, the kernel
does not measure the bits/second going through a particular
device. I believe some later kernel and device driver patches
may fix this, but Solaris 2.6 and later definitely do
measure this.
3.6) Why do I get the error "Fatal: member: txunderruns vanished!:
Near line 201"?
This problem occurs with SE 2.5.0.2 and FDDI 5.0
interfaces. There are three possible fixes:
1) Upgrade to SE 3.0.
2) Visit the SE 2.5.0.2 download page and get the FDDI patch.
3) Add the latest patch to FDDI which should reinstate the
metric that went missing.
3.7) Why do I get the error "Fatal: member: txunderruns0 vanished!:
Near line 255"?
This problem occurs with Solaris 2.5 and hme interfaces. There
are three possible fixes:
1) Upgrade to a later Solaris release.
2) Get the hme patch for Solaris 2.5.
3) As a temporary work-around, change the member "defer" to
"missing1" in the ks_hme_network structure in
/opt/RICHPse/include/kstat.se like this:
#ifdef MINOR_VERSION >= 51
ulong defer;
#else
ulong missing1;
#endif
The next build of SE 3.0 will figure this out
automatically.
3.8) Why do I get the error "Fatal: member: framming vanished!: Near
line 160"?
This problem occurs with Solaris 2.5.1 and le interfaces. The
le patch for Solaris 2.5.1 corrects the spelling from framming
to framing. SE 3.0 tries to detect this patch, but if you
don't have the patch directory in /var/sadm/patch it can't tell
that the patch is installed. There are four fixes.
1) Upgrade to Solaris 2.6.
2) Reinstall the le patch for Solaris 2.5.1.
3) Create the directory /var/sadm/patch/103903-03 by hand.
4) As a temporary work-around, run scripts using
se -DLE_PATCH script.se
to force the update or edit start_orcallator.sh and enable the
appropriate SE_PATCHES variable.
3.9) Why do I get the error "Fatal: member: drop vanished!: Near
line 285"?
This has been observed on Solaris 2.6 using SE version 3.0.
Upgrade to the latest version of SE.
3.10) Why do I get the error "Fatal: member: XXXXX vanished!: Near
line YYY"?
This happens when Sun changes some of the kernel names for
particular variables that the SE package is expecting to find.
For example, as described in question 3.8, the le patch for
Solaris 2.5.1 changed the name framming to framing.
SE has some conditional compile flags that tell it to look for
the new name. Try adding these to the command line options for
SE or editing start_orcallator.sh to enable some of the
defines.
1) -DLE_PATCH=1
2) -DHME_PATCH=1
3) -DHME_PATCH_IFSPEED=1
4) -DMULTICAST_PATCH=1
5) -DROBUST_LISTENQ=1
If a particular define does not fix the problem, then don't run
SE with it.
3.11) What do I do when I get the message "Fatal: subscript: 2 out of
range for: GLOBAL_net[2]: Near line 178"?
Try adding the following string "-DMAX_IF=XYZ" where XYZ is
larger than the total number of interfaces you have on the
system to se's command line.
For example, if you had 9 separate interfaces on the system,
then you could edit start_orcallator like this:
--- start_orcallator.sh.in.FCS Sat Oct 28 15:04:33 2000
+++ start_orcallator.sh.in Thu May 24 08:55:39 2001
@@ -31,6 +31,7 @@
#SE_PATCHES="$SE_PATCHES -DLE_PATCH"
#SE_PATCHES="$SE_PATCHES -DHME_PATCH"
#SE_PATCHES="$SE_PATCHES -DHME_PATCH_IFSPEED"
+SE_PATCHES="$SE_PATCHES -DMAX_IF=10"
# Check if the SE executable was found upon configure.
if test -z "$SE"; then
3.12) Why don't I get plots for my Veritas filesystems?
Version 1.13 or later of orcallator.se should be able to
generate filesystem statistics for Veritas filesystems. The
first thing to try is to upgrade to the latest SE and
orcallator.se release.
3.13) Why don't I get plots for my RSM filesystems?
I don't think orcallator.se records data from RSM disks.
However, to make sure, can you look through the first line of
the output orcallator files and see if the RSM filesystems
are listed there? Look for the disk_runp_ string. If you see
the filesystems there, then the Orca configuration file needs
to be updated to plot this data. In this case, email me the
names of the filesystems as listed in the output file and the
Orca configuration file can be updated to plot the data.
Otherwise, orcallator.se or the underlying SE header files will
need to be modified to find RSM filesystems. Since I don't
have access to a system with RSM on it, this is something
that somebody else will need to take on. Hint, hint...
3.14) Why don't I get any Interface Bits Per Second data for my qe
board?
Most likely you are either running Solaris 7 or older or
Solaris 8 with SE version 3.1 or older. You must run Solaris 8
to get qe bits per second data. Apply the following patches to
include/kstat.se and include/netif.se:
*** kstat.se.orig Fri Feb 9 01:44:14 2001
--- kstat.se Fri Feb 9 14:31:46 2001
***************
*** 646,651 ****
--- 646,661 ----
ulong_t no_tbufs;
ulong_t no_rbufs;
ulong_t rx_late_collisions;
+ #if MINOR_VERSION >= 70
+ ulong_t rbytes;
+ ulong_t obytes;
+ ulong_t multircv;
+ ulong_t multixmt;
+ ulong_t brdcstrcv;
+ ulong_t brdcstxmt;
+ ulong_t norcvbuf;
+ ulong_t noxmtbuf;
+ #endif
};
*** netif.se.orig Fri Feb 9 01:45:06 2001
--- netif.se Fri Feb 9 14:31:10 2001
***************
*** 229,236 ****
--- 229,241 ----
nocanput = if_qe.nocanput;
defer = if_qe.excess_defer;
nocarrier = if_qe.nocarrier;
+ #if MINOR_VERSION >= 70
+ ooctets = if_qe.obytes + if_qe.multixmt + if_qe.brdcstxmt;
+ ioctets = if_qe.rbytes + if_qe.multircv + if_qe.brdcstrcv;
+ #else
ooctets = 0;
ioctets = 0;
+ #endif
break;
case NETIF_BF:
kstat$bf.number$ = number$ - (if_max[NETIF_QE] + 1);
3.15) Orcallator.se core dumps.
Some of the SE include files can cause core dumps if there are
oddities on the system. Apply the following patches:
--- diskinfo.se.orig Fri Jan 12 11:20:09 2001
+++ diskinfo.se Tue Feb 27 19:37:28 2001
@@ -197,7 +197,12 @@
points_at[n] = '\0';
// chop off the :a at the end
- strcpy(strrchr(points_at, ':'), "");
+ while (--n >= 0) {
+ if (points_at[n] == ':') {
+ points_at[n] = '\0';
+ break;
+ }
+ }
// hack off ../../devices from the start
sscanf(points_at, "../../devices%s", &points_at);
--- mnt_class.se.orig Fri Jan 12 11:20:09 2001
+++ mnt_class.se Tue Feb 27 19:53:34 2001
@@ -96,7 +96,12 @@
number$ = -1;
return;
}
- strcpy(strchr(buf, '\n'), "");
+ for (i=0; i<sizeof(buf); ++i) {
+ if (buf[i] == '\n') {
+ buf[i] = '\0';
+ break;
+ }
+ }
i = 0;
for(p=strtok(buf, "\t"); p != nil; p=strtok(nil, "\t")) {
switch(i) {
3.16) Why should I keep my compressed percol-* or orcallator-* files?
There are several reasons to keep the data files:
1) If the RRD files get screwed up, you'll be able to
regenerate them.
2) If Orca ever needs to change the internal format of the RRD
files, it'll need to regenerate them.
3) If you ever want to look at older data in the RRD files, the
older data has less resolution. For example, if you want to
look at data, say 6 months old, in the RRD files, it is
averaged over a whole day. You won't be able to get the 5
minute data generated by orcallator.se.
Here's the resolution of the data in the RRD files (here
RRA is Round Robin Archive and there can be many in one RRD
file) as defined in lib/Orca/Constants.pm
# The first RRA is every 5 minutes for 200 hours, the second
# is every 30 minutes for 31 days, the third is every 2
# hours for 100 days, and the last is every day for 3 years.
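To make the loss of resolution concrete, the number of data
points each RRA holds can be worked out from the comment above.
The arithmetic below assumes 365-day years for the last RRA. It
is only an illustration and does not read anything from
lib/Orca/Constants.pm.

```python
MINUTE, HOUR, DAY = 60, 3600, 86400

# (description, seconds between points, seconds of history kept)
RRAS = [
    ("5 minutes for 200 hours", 5 * MINUTE, 200 * HOUR),
    ("30 minutes for 31 days", 30 * MINUTE, 31 * DAY),
    ("2 hours for 100 days", 2 * HOUR, 100 * DAY),
    ("1 day for 3 years", DAY, 3 * 365 * DAY),  # assumes 365-day years
]

rows = {name: span // step for name, step, span in RRAS}
for name, step, span in RRAS:
    print(name, "->", rows[name], "data points")
```

Under these assumptions the four RRAs hold 2400, 1488, 1200 and
1095 data points. Anything older than 200 hours is therefore no
longer available at the original 5 minute resolution.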
3.17) Why do my Orca plots no longer contain any data after I change
anything related to orcallator, such as the subsystems to
measure, or when something changes on the system, such as mount
points, ethernet devices, etc?
Orca's input data files must contain exactly the number of
columns specified either by the column_description in
orcallator.cfg, when it lists the actual column names, or by
the first line of the input data file, when column_description
is set to 'first_line'. If the number of columns does not
match, then Orca ignores this data to protect the RRD files
from having incorrect data added to them.
With orcallator.se, the columns in Orca's input files commonly
change when the administrator tells orcallator.se to measure a
different set of subsystems or when mount points, disk drives,
ethernet devices, etc. are added to or removed from the system.
To work around this problem, upgrade to the latest version of
Orca, orcallator.se and orcallator.cfg, which now create a new
data file any time the number of columns changes. The file name
format of the output files has changed from
orcallator-YYYY-MM-DD to orcallator-YYYY-MM-DD-XXX where XXX is
a monotonically increasing number that resets to 0 at the
beginning of the next day.
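Scripts that post-process the data files need to cope with both
naming schemes. The following Python sketch shows one way to do
that. The pattern assumes that XXX is always three digits, as in
the example name in question 2.4, and it does not handle the
suffix of a compressed file. The function name parse_name is
made up for the example.

```python
import re

# Old style: orcallator-YYYY-MM-DD
# New style: orcallator-YYYY-MM-DD-XXX
NAME_RE = re.compile(r"^orcallator-(\d{4})-(\d{2})-(\d{2})(?:-(\d{3}))?$")


def parse_name(filename):
    """Return (date, sequence) or None; old style names get 0."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    year, month, day, seq = m.groups()
    return "%s-%s-%s" % (year, month, day), int(seq or 0)


print(parse_name("orcallator-2001-11-06-000"))
print(parse_name("orcallator-2001-11-06"))
```

Sorting the files of one day by the returned sequence number
gives the order in which orcallator.se created them.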