Configuration notes
===================
This chapter describes most of the configuration and use aspects of NUT,
including establishing communication with the device and configuring safe
shutdowns when the UPS battery runs out of power.
There are many programs and <<Features,features>> in this
package. You should check out the <<Overview,NUT Overview>>
and other accompanying documentation to see how it all works.
[NOTE]
======
NUT does not currently provide proper graphical configuration tools.
However, there is now support for linkdoc:developer-guide[Augeas,augeas_user],
which will enable the easier creation of configuration tools.
The linkman:nutconf[8] tool should also help with programmatic manipulation
of various NUT configuration files.
Moreover, linkman:nut-scanner[8] is available to discover supported devices
(USB, SNMP, Eaton XML/HTTP and IPMI) and NUT servers (using Avahi or the
classic connection method).
======
Details about the configuration files
-------------------------------------
Generalities
~~~~~~~~~~~~
All configuration files within this package are parsed with a common
state machine, which means they all can use a number of extras described here.
First, most of the programs use an upper-case word to declare a
configuration directive. This may be something like MONITOR, NOTIFYCMD,
or ACCESS. The case does matter here. "monitor" won't be recognized.
Next, the parser does not care about whitespace between words. If you
like to indent things with tabs or spaces, feel free to do it here.
If you need to set a value to something containing spaces, it has to be
contained within "quotes" to keep the parser from splitting up the line.
That is, you want to use something like this:
SHUTDOWNCMD "/sbin/shutdown -h +0"
Without the quotes, it would only see the first word on the line.
OK, so let's say you really need to embed that kind of quote within your
configuration directive for some reason. You can do that too.
NOTIFYCMD "/bin/notifyme -foo -bar \"hi there\" -baz"
In other words, `\` can be used to escape the `"`.
Finally, for the situation where you need to put the `\` character into your
string, you just escape it.
NOTIFYCMD "/bin/notifyme c:\\dos\\style\\path"
The `\` can actually be used to escape any character, but you only really
need it for `\`, `"`, and `#` as they have special meanings to the parser.
File names that contain spaces are trickier: each such name must be wrapped
in its own `""` quotes, and those inner quotes must themselves be escaped:
NOTIFYCMD "\"c:\\path with space\\notifyme\" \"c:\\path with space\\name\""
`#` is the comment character. Anything after an unescaped `#` is ignored.
Something like this...
identity = my#1ups
will actually turn into `identity = my`, since the `#` stops the
parsing. If you really need to have a `#` in your configuration, then
escape it.
identity = my\#1ups
Much better.
The `=` character should be used with care too. There should be only one
"simple" `=` character in a line: between the parameter name and its value.
All other `=` characters should be either escaped or within "quotes".
password = 123=123
is incorrect. You should use:
password = 123\=123
or:
password = "123=123"
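As an illustration only, the quoting rules above can be approximated in a few
lines of Python. This is a sketch, not the parser shipped with NUT: the
function name `split_words` is made up for this example, and it assumes that
`#` inside quotes is literal and that an unquoted, unescaped `=` becomes a
word of its own.

```python
def split_words(line):
    """Split one configuration line into words (illustrative sketch only)."""
    words = []
    current = []
    in_word = False    # True once the current word has started
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # A backslash makes the next character literal.
            current.append(line[i + 1])
            in_word = True
            i += 2
            continue
        if in_quotes:
            if ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
            in_word = True
        elif ch == "#":
            break  # an unescaped '#' starts a comment
        elif ch in " \t" or ch == "=":
            # Whitespace and '=' both end the current word (assumption).
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            if ch == "=":
                words.append("=")
        else:
            current.append(ch)
            in_word = True
        i += 1
    if in_word:
        words.append("".join(current))
    return words
```

Under these assumptions, `identity = my#1ups` yields the words `identity`,
`=` and `my`, while `password = 123=123` yields five words instead of three,
which is why the extra `=` has to be escaped or quoted.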
Line spanning
~~~~~~~~~~~~~
You can put a backslash at the end of the line to join it to the next
one. This creates one virtual line that is composed of more than one
physical line.
Also, if you leave the `""` quote container open before a newline, it will
keep scanning until it reaches another one. If you see bizarre behavior
in your configuration files, check for an unintentional instance of
quotes spanning multiple lines.
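The backslash form of line spanning can be pictured with the following
Python sketch. The helper name `join_continued_lines` is hypothetical, and
the sketch deliberately ignores two cases that a real parser must handle:
an open quote that spans lines, and a line that ends in an escaped
backslash (`\\`).

```python
def join_continued_lines(text):
    """Merge physical lines ending in a backslash into virtual lines."""
    virtual_lines = []
    pending = ""
    for physical in text.splitlines():
        if physical.endswith("\\"):
            # Drop the trailing backslash and keep collecting.
            pending += physical[:-1]
        else:
            virtual_lines.append(pending + physical)
            pending = ""
    if pending:
        # The last line ended with a backslash and nothing followed it.
        virtual_lines.append(pending)
    return virtual_lines
```

For instance, a `MONITOR` directive split over two physical lines comes back
as one virtual line, and the line after it is left untouched.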
Basic configuration
-------------------
This chapter describes the base configuration to establish communication with
the device.
This is sufficient for a PDU. For a UPS or an SCD, you will also need to
configure <<UPS_shutdown,automatic shutdowns for low battery events>>.
image:images/simple.png[]
On operating systems with service management frameworks (such as Linux
systemd and Solaris/illumos SMF), the life-cycle of driver, data server
and monitoring client daemons is managed respectively by `nut-driver`
(multi-instance service), `nut-server` and `nut-monitor` services.
These are in turn wrapped by an "umbrella" service (or systemd "target")
conveniently called `nut`, which makes it easy to start or stop all of
the bundled services that are enabled on a particular deployment.
[[Driver_configuration]]
Driver configuration
~~~~~~~~~~~~~~~~~~~~
Create one section per UPS in the linkman:ups.conf[5] file.
NOTE: The default path for a source installation is `/usr/local/ups/etc`,
while packaged installations vary.
For example, `/etc/nut` is used on Debian and derivatives,
while `/etc/ups` or `/etc/upsd` is used on RedHat and derivatives.
To find out which driver to use, check the
<<HCL,Hardware Compatibility List>>,
or `data/driver.list(.in)` source file.
Once you have picked a driver, create a section for your UPS in
'ups.conf'. You must supply values at least for "driver" and "port".
Some drivers may require other flags or settings. The "desc" value
is optional, but is recommended to provide a better description of
what useful load your UPS is feeding.
A typical device without any extra settings looks like this:
[mydevice]
driver = mydriver
port = /dev/ttyS1
desc = "Workstation"
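A quick sanity check for the "driver" and "port" requirement could look like
the Python sketch below. It is only an approximation under stated
assumptions: the helper names are invented for this example, and the sketch
handles plain `key = value` lines inside `[section]` headers only, so
escapes, trailing comments, flags without values and global settings are
ignored.

```python
def parse_ups_conf(text):
    """Return {section_name: {key: value}} for simple ups.conf-style text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip().strip('"')
    return sections


def missing_required(sections, required=("driver", "port")):
    """Map each incomplete section to the required keys it lacks."""
    return {
        name: [key for key in required if key not in settings]
        for name, settings in sections.items()
        if any(key not in settings for key in required)
    }
```

Feeding it the `[mydevice]` section shown above reports nothing missing; a
section that sets only `driver` is reported as lacking `port`.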
[NOTE]
======
USB drivers (such as `usbhid-ups` for non-SHUT mode, `nutdrv_qx` for
non-serial mode, `bcmxcp_usb`, `tripplite_usb`, `blazer_usb`, `riello_usb`
and `richcomm_usb`) are special cases and ignore the 'port' value.
You must still set this value, but it does not matter what you set
it to; a common and good practice is to set 'port' to *auto*, but you
can put whatever you like.
If you only own one USB UPS, the driver will find it automatically
if it matches the identifiers built into that driver.
If you own more than one, refer to the driver's manual page for more
information on matching a specific device, or trying specific subdriver
or protocol options with a currently unknown device.
======
NOTE: On Windows systems, the second serial port (`COM2`), equivalent to
`/dev/ttyS1` on Linux, would be `\\\\.\\COM2`.
References: linkman:ups.conf[5],
linkman:nutupsdrv[8],
linkman:bcmxcp_usb[8],
linkman:blazer_usb[8],
linkman:nutdrv_qx[8],
linkman:richcomm_usb[8],
linkman:riello_usb[8],
linkman:tripplite_usb[8],
linkman:usbhid-ups[8]
[[Starting_drivers]]
Starting the driver(s) in legacy operating systems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generally, you can just start the driver(s) for your hardware (all sections
defined in 'ups.conf') using the following command:
:; upsdrvctl start
Make sure the driver doesn't report any errors. It should show a
few details about the hardware and then enter the background. You
should get back to the command prompt a few seconds later. For
reference, a successful start of the `usbhid-ups` driver looks like this:
# upsdrvctl start
Network UPS Tools - Generic HID driver 0.34 (2.4.1)
USB communication driver 0.31
Using subdriver: MGE HID 1.12
Detected EATON - Ellipse MAX 1100 [ADKK22008]
If the driver doesn't start cleanly, make sure you have picked the
right one for your hardware. You might need to try other drivers
by changing the "driver=" value in 'ups.conf'.
Be sure to check the driver's man page to see if it needs any extra
settings in 'ups.conf' to detect your hardware.
If it says `can't bind /var/state/ups/...` or similar, then your
state path probably isn't writable by the driver. Check the
<<StatePath,permissions and mode on that directory>> vs. the
user account your driver starts as.
After making changes, try the <<Ownership, Ownership and permissions>>
step again.
[[Starting_drivers_NDE]]
Driver(s) as a service
~~~~~~~~~~~~~~~~~~~~~~
On operating systems with init-scripts managing life-cycle of the operating
environment, the `upsdrvctl` program is also commonly used in those scripts.
It has a few downsides, such as that if the device was not accessible during
OS startup and the driver connection timed out, it would remain not-started
until an administrator (or some other script) "kicks" the driver to retry
startup. Also, startup of the `upsd` data server daemon and its clients
like `upsmon` is delayed until all the NUT drivers complete their startup
(or time out trying).
This can be a big issue on systems which monitor multiple devices, such as
big servers with multiple power sources, or administrative workstations
which monitor a datacenter full of UPSes.
For this reason, NUT starting with version 2.8.0 supports startup of its
drivers as independent instances of a `nut-driver` service under the Linux
systemd and Solaris/illumos SMF service-management frameworks (corresponding
files and scripts may not be pre-installed in packaging for other systems).
Such service instances have their own and independent life-cycle, including
parallel driver start and stop processing, and retries of startup in case of
failure as implemented by the service framework in the OS. The Linux systemd
solution also includes a `nut-driver.target` as a checkpoint that all defined
drivers have indeed started up (as well as being a singular way to enable or
disable startup of drivers).
In both cases, a service named `nut-driver-enumerator` is registered, and
when it is (re-)started it scans the currently defined device sections in
'ups.conf' and the currently defined instances of `nut-driver` service,
and brings them in sync (adding or removing service instances), and if
there were changes -- it restarts the corresponding drivers (via service
instances) as well as the data server which only reads the list of sections
at its startup. This helper service should be triggered whenever your system
(re-)starts the `nut-server` service, so that it runs against an up-to-date
list of NUT driver processes.
Two service bundles are provided for this feature: a set of
`nut-driver-enumerator-daemon*` units starts the script as a daemon
to regularly inspect and apply the NUT configuration to OS service unit
wrappings (mainly intended for monitoring systems with a dynamic set of
monitored power devices, or for systems where filesystem events monitoring
is not a clockwork-reliable mechanism to 100% rely on); while the other
`nut-driver-enumerator.*` units run the script once per triggering of
the service (usually during boot-up; configuration file changes can be
detected and propagated by systemd most of the time, but not by SMF out
of the box).
A service-oriented solution also makes it possible to account for drivers
having different dependencies -- for example, networked drivers should begin
startup after IP addresses have been assigned, while directly-connected
devices might need nothing beside a mounted filesystem (or an activated
USB stack service or device rule, in case of Linux). Likewise, systems
administrators can define further local dependencies between services and
their instances as needed on particular deployments.
This solution also adds the `upsdrvsvcctl` script to manage NUT drivers as
system service instances, whose CLI mimics that of `upsdrvctl` program.
One addition is the `resync` argument to trigger `nut-driver-enumerator`,
another is a `list` argument to display current mappings of service
instances to NUT driver sections. Also, original tool's arguments such
as the `-u` (user to run the driver as) or `-D` (debug of the driver)
do not make sense in the service context -- the accounts to use and
other arguments to the driver process are part of service setup (and
an administrator can manage it there).
Note that while this solution tries to register service instances with the same
names as NUT configuration sections for the devices, this is not always
possible due to constraints such as syntax supported by a particular service
management framework. In this case, the enumerator falls back to MD5 hashes
of such section names, and the `upsdrvsvcctl` script supports this to map
the user-friendly NUT configuration section names to actual service names
that it would manage.
References: man pages: linkman:nutupsdrv[8], linkman:upsdrvctl[8],
linkman:upsdrvsvcctl[8]
Data server configuration (upsd)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Configure `upsd`, which serves data from the drivers to the clients.
First, edit 'upsd.conf' to allow access to your client systems. By
default, `upsd` will only listen to `localhost` port 3493/tcp. If you want
to connect to it from other machines, you must specify each interface you
want `upsd` to listen on for connections, optionally with a port number.
LISTEN 127.0.0.1 3493
LISTEN ::1 3493
As a special case, `LISTEN * <port>` (with an asterisk) will try to
listen on "ANY" IP address for both IPv6 (`::0`) and IPv4 (`0.0.0.0`),
subject to `upsd` command-line arguments, or system configuration or support.
Note that if the system supports IPv4-mapped IPv6 addressing per RFC-3493,
and does not allow disabling this mode, then there may be one listening
socket to handle both address families.
NOTE: Refer to the NUT user manual <<NUT_Security,security chapter>> for
information on how to access and secure `upsd` client connections.
Next, create 'upsd.users'. For now, this can be an empty file.
You can come back and add more to it later when it's time to
configure `upsmon` or run one of the management tools.
Do not make either file world-readable, since they both hold
access control data and passwords. They just need to be readable by
the user you created in the preparation process.
The suggested configuration is to `chown` them to `root`, `chgrp` them to the
group you created, then make them readable by the group.
NOTE: If you installed NUT from source and used `make install-as-root`,
or if your distribution packaging did, the sample configuration files
would have the suggested ownership and permissions assigned, so if you
use e.g. `cp -pf upsd.users.sample upsd.users` (as `root`) to start out
with some annotated comments and adapt that to your deployment, the
copied files should also get the expected safe permissions.
:; chown root:nut upsd.conf upsd.users
:; chmod 0640 upsd.conf upsd.users
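To double-check the result, the permission bits can be inspected with a few
lines of Python. This is a minimal sketch using only the standard `os` and
`stat` modules; the helper names are made up for this example, and it looks
at the "other" permission bits only, not at ownership or ACLs.

```python
import os
import stat


def mode_allows_others(mode):
    """True if 'other' users have any read, write or execute bit set."""
    return bool(mode & stat.S_IRWXO)


def world_accessible(paths):
    """Return the subset of paths that 'other' users can access."""
    return [p for p in paths if mode_allows_others(os.stat(p).st_mode)]
```

A file with mode `0640` passes this check, while the same file with mode
`0644` is reported. Calling `world_accessible(["upsd.conf", "upsd.users"])`
from the configuration directory should return an empty list once the
commands above have been applied.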
References: man pages: linkman:upsd.conf[5],
linkman:upsd.users[5],
linkman:upsd[8]
[[Starting_upsd]]
Starting the data server
~~~~~~~~~~~~~~~~~~~~~~~~
Start the network data server:
:; upsd
Make sure it is able to connect to the driver(s) on your system.
A successful run looks like this:
# upsd
Network UPS Tools upsd 2.4.1
listening on 127.0.0.1 port 3493
listening on ::1 port 3493
Connected to UPS [eaton]: usbhid-ups-eaton
`upsd` prints dots while it waits for the driver to respond. Your
system may print more or less depending on how many drivers you
have and how fast they are.
NOTE: If `upsd` says that it can't connect to a UPS or that the data
is stale, then your 'ups.conf' is not configured correctly, or you
have a driver that isn't working properly. You must fix this before
going on to the next step.
NOTE: Normally `upsd` requires that at least one driver section is
defined in the 'ups.conf' file, and refuses to start otherwise.
If you intentionally do not have any driver sections defined (yet)
but still want the data server to run, respond and report zero devices
(e.g. on an automatically managed monitoring deployment), you can enable
the `ALLOW_NO_DEVICE true` option in the 'upsd.conf' file.
NOTE: Normally `upsd` requires that all `LISTEN` directives defined
in the 'upsd.conf' file are honoured (except for mishaps possible with
many names of `localhost`), and refuses to start otherwise. If you want
to allow start-up in cases where at least one but possibly not all of
the `LISTEN` directives were honoured, you can enable the
`ALLOW_NOT_ALL_LISTENERS true` option in the 'upsd.conf' file.
Note you would have to restart `upsd` to pick up the `LISTEN`ed IP address
if it appears later, so probably configuring `LISTEN *` is a better choice
in such cases.
On operating systems with service management frameworks, the data server
life-cycle is managed by `nut-server` service.
Reference: man page: linkman:upsd[8]
Check the UPS data
~~~~~~~~~~~~~~~~~~
Status data
^^^^^^^^^^^
Make sure that the UPS is providing good status data.
You can use the `upsc` command-line client for this:
:; upsc myupsname@localhost ups.status
You should see just one line in response:
OL
`OL` means your system is running on line power. If it says something
else (like `OB` -- on battery, or `LB` -- low battery), your driver was
probably misconfigured during the <<Driver_configuration, Driver configuration>>
step. If you reconfigure the driver, use `upsdrvctl stop` to stop it, then
start it again as shown in the <<Starting_drivers, Starting driver(s)>> step.
Reference: man page: linkman:upsc[8]
All data
^^^^^^^^
Look at all of the status data which is being monitored.
:; upsc myupsname@localhost
What happens now depends on the kind of device and driver you have.
In the list, you should see `ups.status` with the same value you got
above. A sample run on a UPS (Eaton Ellipse MAX 1100) looks like this:
battery.charge: 100
battery.charge.low: 20
battery.runtime: 2525
battery.type: PbAc
device.mfr: EATON
device.model: Ellipse MAX 1100
device.serial: ADKK22008
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.version: 2.4.1-1988:1990M
driver.version.data: MGE HID 1.12
driver.version.internal: 0.34
input.sensitivity: normal
input.transfer.boost.low: 185
input.transfer.high: 285
input.transfer.low: 165
input.transfer.trim.high: 265
input.voltage.extended: no
outlet.1.desc: PowerShare Outlet 1
outlet.1.id: 2
outlet.1.status: on
outlet.1.switchable: no
outlet.desc: Main Outlet
outlet.id: 1
outlet.switchable: no
output.frequency.nominal: 50
output.voltage: 230.0
output.voltage.nominal: 230
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.firmware: 5102AH
ups.load: 0
ups.mfr: EATON
ups.model: Ellipse MAX 1100
ups.power.nominal: 1100
ups.productid: ffff
ups.serial: ADKK22008
ups.status: OL CHRG
ups.timer.shutdown: -1
ups.timer.start: -1
ups.vendorid: 0463
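If you want to post-process such a listing in a script, the `name: value`
layout is easy to load into a dictionary. The Python sketch below assumes
the text has already been captured from `upsc`; the function name
`parse_upsc_output` is hypothetical, and all values are kept as strings.

```python
def parse_upsc_output(text):
    """Turn 'name: value' lines into a dict of strings."""
    variables = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values may contain colons.
        name, sep, value = line.partition(": ")
        if sep:
            variables[name.strip()] = value.strip()
    return variables
```

With the sample above, `parse_upsc_output(text)["ups.status"]` gives the
string `OL CHRG`, and numeric readings such as `battery.charge` can be
converted with `int()` or `float()` as needed.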
Reference: man page: linkman:upsc[8],
<<nut-names,NUT command and variable naming scheme>>
Startup scripts
~~~~~~~~~~~~~~~
NOTE: This step is not necessary if you installed from packages.
Edit your startup scripts, and make sure `upsdrvctl` and `upsd` are run
every time your system starts. In newer versions of NUT, you may have a
'nut.conf' file which sets the `MODE` variable for bundled init-scripts,
to facilitate enabling of certain features in the specific end-user
deployments.
If you installed from source, check the `scripts` directory for reference
init-scripts, as well as systemd or SMF service methods and manifests.
[[UPS_shutdown]]
Configuring automatic shutdowns for low battery events
------------------------------------------------------
The whole point of UPS software is to bring down the OS cleanly when you
run out of battery power. Everything else is roughly eye candy.
To make sure your system shuts down properly, you will need to perform some
additional configuration and run `upsmon`. Here are the basics.
[[Shutdown_design]]
Shutdown design
~~~~~~~~~~~~~~~
When your UPS batteries get low, the operating system needs to be brought
down cleanly. Also, the UPS load should be turned off so that all devices
that are attached to it are forcibly rebooted, and subsequently start in
the predictable order and state suitable for your data center.
Here are the steps that occur when a critical power event happens,
for the simpler case of one UPS device feeding one or several systems:
1. The UPS goes on battery
2. The UPS reaches low battery (a "critical" UPS), that is to say,
`upsc` displays:
+
ups.status: OB LB
+
The exact behavior depends on the specific device, and is related to
such settings and readings as:
- `battery.charge` and `battery.charge.low`
- `battery.runtime` and `battery.runtime.low`
3. The `upsmon` primary notices the "critical UPS" situation and sets
"FSD" -- the "forced shutdown" flag to tell all secondary systems
that it will soon power down the load.
+
[WARNING]
=========
By design, since we require power-cycling the load and don't
want some systems to be powered off while others remain running
if the "wall power" returns at the wrong moment as usual, the "FSD"
flag can not be removed from the data server unless its daemon is
restarted. If we do take the first step in critical mode, then we
intend to go all the way -- shut down all the servers gracefully,
and power down the UPS.
Keep in mind that some UPS devices and corresponding drivers would
latch the "FSD" again even if "wall power" is available, but the
remaining battery charge is below a threshold configured as "safe"
in the device (usually if you manually power on the UPS after a long
power outage). This is by design of respective UPS vendors, since
in such situation they can not guarantee that if a new power outage
happens, their UPS would safely shut down your systems again.
So it is deemed better and safer to stay dark until batteries
become sufficiently charged.
=========
+
(If you have no secondary systems, skip to step 6)
4. `upsmon` secondary systems see "FSD" and:
- generate a `NOTIFY_SHUTDOWN` event
- wait `FINALDELAY` seconds -- typically `5`
- call their `SHUTDOWNCMD`
- disconnect from `upsd`
5. The `upsmon` primary system waits up to `HOSTSYNC` seconds (typically `15`)
for the secondary systems to disconnect from `upsd`. If any are still
connected after this time, `upsmon` primary stops waiting and proceeds
with the shutdown process.
6. The `upsmon` primary:
- generates a `NOTIFY_SHUTDOWN` event
- waits `FINALDELAY` seconds -- typically `5`
- creates the `POWERDOWNFLAG` file in its local filesystem -- usually
`/etc/killpower`, or `/run/nut/killpower` in a temporary file system
- calls the `SHUTDOWNCMD`
7. On most systems, `init` takes over, kills your processes, syncs and
unmounts some filesystems, and remounts some read-only.
8. `init` then runs your shutdown script. This checks for the
`POWERDOWNFLAG`, finds it, and tells the UPS driver(s) to power off
the load by sending commands to the connected UPS device(s) they manage.
9. All the systems lose power.
10. Time passes. The power returns, and the UPS switches back on.
11. All systems reboot and go back to work.
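Two points from the steps above can be sketched in Python: when a UPS counts
as "critical", and how long the primary may wait before calling its
`SHUTDOWNCMD`. The function names are invented for this example and this is
not how `upsmon` itself is implemented. The sketch treats a UPS as critical
only when both `OB` and `LB` are reported, and it ignores the time taken by
notifications and by creating the `POWERDOWNFLAG` file.

```python
def is_critical(ups_status):
    """True when the status string reports both on-battery and low-battery."""
    flags = ups_status.split()
    return "OB" in flags and "LB" in flags


def max_primary_delay(hostsync=15, finaldelay=5, has_secondaries=True):
    """Upper bound, in seconds, between setting FSD and calling SHUTDOWNCMD."""
    # Without secondaries the primary skips the HOSTSYNC wait (step 5).
    wait_for_secondaries = hostsync if has_secondaries else 0
    return wait_for_secondaries + finaldelay
```

With the typical values quoted above, the primary waits at most 20 seconds
when secondaries are present, and 5 seconds when there are none.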
///////////////////////////////////
https://github.com/networkupstools/nut/issues/1370
TODO: Check other docs and code to spell out expected behavior with
multiple UPS devices (when not all of them go critical or even on battery)
and servers with multiple inputs.
Does the `upsmon` primary system power-cycle a "critical" UPS if that
is not the only one feeding it, so it is not shutting down now?
///////////////////////////////////
How you set it up
~~~~~~~~~~~~~~~~~
[[NUT_user_creation]]
NUT user creation
^^^^^^^^^^^^^^^^^
Create a `upsd` user for `upsmon` to use while monitoring this UPS.
Edit 'upsd.users' and create a new section. `upsmon` will connect
to `upsd` and use this user name (in brackets) and password to
authenticate (as specified in its configuration via the `MONITOR` line).
This example is for defining a user called "monuser":
[monuser]
password = mypass
upsmon primary
# or upsmon secondary
References: linkman:upsd[8], linkman:upsd.users[5]
Reloading the data server
^^^^^^^^^^^^^^^^^^^^^^^^^
Reload `upsd`. Depending on your configuration, you may be able to
do this without stopping the `upsd` daemon process (if it had saved
a PID file earlier):
:; upsd -c reload
If that doesn't work (check the syslog), just restart it:
:; upsd -c stop
:; upsd
For systems with integrated service management (Linux systemd,
illumos/Solaris SMF) their corresponding `reload` or `refresh`
service actions should handle this as well. Note that such integration
generally forgoes saving of PID files, so `upsd -c <cmd>` would not work.
If your workflow requires managing these daemons alongside the OS-provided
framework, you can customize it to start `upsd -FF` and save the PID file.
NUT releases after 2.8.0 define aliases for these units, so if your Linux
distribution uses NUT-provided unit definitions, `systemctl reload upsd`
may also work.
NOTE: If you want to make reloading work later, see the entry in the
link:FAQ.html[FAQ] about starting `upsd` as a different user.
Power Off flag file
^^^^^^^^^^^^^^^^^^^
Set the `POWERDOWNFLAG` location for `upsmon`.
In 'upsmon.conf', add a `POWERDOWNFLAG` directive with a filename.
`upsmon` will create this file when the UPS needs to be powered off
during a power failure when low battery is reached.
We will test for the presence of this file in a later step.
POWERDOWNFLAG /etc/killpower
References: man pages: linkman:upsmon[8],
linkman:upsmon.conf[5]
Securing upsmon.conf
^^^^^^^^^^^^^^^^^^^^
The recommended setting is to have it owned by `root:nut`, then make it
readable by the group and not by the world. This file contains passwords
that could be used by an attacker to start a shutdown, so keep it secure.
NOTE: If you installed NUT from source and used `make install-as-root`,
or if your distribution packaging did, the sample configuration files
would have the suggested ownership and permissions assigned, so if you
use e.g. `cp -pf upsmon.conf.sample upsmon.conf` (as `root`) to start out
with some annotated comments and adapt that to your deployment, the
copied files should also get the expected safe permissions.
:; chown root:nut upsmon.conf
:; chmod 0640 upsmon.conf
This step has been placed early in the process so you secure this file
before adding sensitive data in the next step.
Create a MONITOR directive for upsmon
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Edit 'upsmon.conf' and create a `MONITOR` line with the UPS definition
(<upsname>@<hostname>), username and password from the
<<NUT_user_creation, NUT user creation>> step, and the
"primary" or "secondary" setting.
If this system is the UPS manager (i.e. it's connected to this UPS directly
and can manage it using a suitable NUT driver), its `upsmon` is the primary:
MONITOR myupsname@mybox 1 monuser mypass primary
If it's just monitoring this UPS over the network, and some other
system is the primary, then this one is a secondary:
MONITOR myupsname@mybox 1 monuser mypass secondary
The number `1` here is the "power value". This should always be set
to 1, unless you have a very special (read: expensive) system with
redundant power supplies. In such cases, refer to the User Manual:
- <<BigServers,typical setups for big servers>>,
- <<DataRoom,typical setups for data rooms>>.
Note that the "power value" may also be 0 for a monitoring (administrative)
system which only observes the remote UPS status but is not impacted by its
power events, and so does not shut down when the UPS does.
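The layout of a `MONITOR` line can be made explicit with a small Python
sketch. The names `MonitorEntry` and `parse_monitor` are hypothetical. The
sketch assumes exactly six whitespace-separated words, so quoted passwords
containing spaces are not handled, and it accepts only the `primary` and
`secondary` settings described in this chapter.

```python
from typing import NamedTuple


class MonitorEntry(NamedTuple):
    upsname: str
    hostname: str
    power_value: int
    username: str
    password: str
    role: str


def parse_monitor(line):
    """Split a simple MONITOR line into its named fields."""
    words = line.split()
    if len(words) != 6 or words[0] != "MONITOR":
        raise ValueError("expected: MONITOR <ups>@<host> <power> <user> <pass> <role>")
    upsname, sep, hostname = words[1].partition("@")
    if not sep:
        raise ValueError("UPS definition must look like <upsname>@<hostname>")
    if words[5] not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    return MonitorEntry(upsname, hostname, int(words[2]),
                        words[3], words[4], words[5])
```

Applied to the primary example above, this returns an entry with
`upsname` set to `myupsname`, `hostname` set to `mybox` and a
`power_value` of 1.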
References: linkman:upsmon[8], linkman:upsmon.conf[5]
Define a SHUTDOWNCMD for upsmon
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Still in 'upsmon.conf', add a directive that tells `upsmon` how to
shut down your system. This example seems to work on most systems:
SHUTDOWNCMD "/sbin/shutdown -h +0"
Notice the presence of "quotes" here to keep it together.
If your system has special needs (e.g. system-provided shutdown handler
is ungracefully time constrained), you may want to set this to a script
which does customized local shutdown tasks before calling `init` or
`shutdown` programs to handle the system side of this operation.
Start upsmon
^^^^^^^^^^^^
:; upsmon
If it complains about something, then check your configuration.
On operating systems with service management frameworks, the monitoring client
life-cycle is managed by `nut-monitor` service.
Checking upsmon
^^^^^^^^^^^^^^^
Look for messages in the `syslog` to indicate success.
It should look something like this:
May 29 01:11:27 mybox upsmon[102]: Startup successful
May 29 01:11:28 mybox upsd[100]: Client monuser@192.168.50.1
logged into UPS [myupsname]
Any errors seen here are probably due to an error in the config files of either
`upsmon` or `upsd`. You should fix them before continuing.
Startup scripts
^^^^^^^^^^^^^^^
NOTE: This step is not needed if you installed from packages.
Edit your startup scripts, and add a call to `upsmon`.
Make sure `upsmon` starts when your system comes up.
On systems with `upsmon` primary (also running the data server),
do it after `upsdrvctl` and `upsd`, or it will complain about not
being able to contact the server.
You may delete the `POWERDOWNFLAG` in the startup scripts, but it is not
necessary. `upsmon` will clear that file for you when it starts.
NOTE: Init script examples are provided in the 'scripts' directory of
the NUT source tree, and in the various <<_binary_packages,packages>>
that exist.
Shutdown scripts
^^^^^^^^^^^^^^^^
NOTE: This step is not needed if you installed from packages.
Edit your shutdown scripts, and add `upsdrvctl shutdown`.
You should configure your system to power down the UPS after the
filesystems are remounted read-only. Have it look for the presence
of the `POWERDOWNFLAG` (from linkman:upsmon.conf[5]), using this
as an example:
------------------------------------------------------------------------------
if (/sbin/upsmon -K)
then
echo "Killing the power, bye!"
/sbin/upsdrvctl shutdown
sleep 120
# uh oh... the UPS power-off failed
# you probably want to reboot here so you don't get stuck!
# *** see also the section on power races in the FAQ! ***
fi
------------------------------------------------------------------------------
A more elaborate example can be found in NUT sources, e.g.:
https://github.com/networkupstools/nut/blob/master/scripts/systemd/nutshutdown.in
[WARNING]
==============================================================================
- Be aware that the `upsdrvctl shutdown` command will probably power off
your machine and others fed by the UPS(es) which it manages.
Don't use it unless your system is ready to be halted by force.
If you run RAID, read the <<_raid_warning,RAID warning>> below!
- Make sure the filesystem(s) containing `upsdrvctl`, `upsmon`,
the `POWERDOWNFLAG` file, 'ups.conf' and your UPS driver(s) are
mounted (possibly in read-only mode) when the system gets to
this point. Otherwise it won't be able to figure out what to do.
- If for some reason you cannot ensure that the `upsmon` program is executable
at this point, your script can `(test -f /etc/killpower)` in a somewhat
non-portable manner, instead of asking `upsmon -K` for the verdict
according to its current configuration.
==============================================================================
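The fallback described in the last item could be sketched as a small helper. The names `killpower_requested`, `UPSMON_BIN` and `POWERDOWNFLAG_FILE` are hypothetical, and the default flag path assumes that `POWERDOWNFLAG` in 'upsmon.conf' points at `/etc/killpower`.

```shell
# Assumptions: upsmon lives in /sbin and POWERDOWNFLAG is /etc/killpower.
UPSMON_BIN="${UPSMON_BIN:-/sbin/upsmon}"
POWERDOWNFLAG_FILE="${POWERDOWNFLAG_FILE:-/etc/killpower}"

# Succeeds (exit status 0) when the UPS power-off has been requested.
killpower_requested() {
    if [ -x "$UPSMON_BIN" ]; then
        # Preferred: let upsmon decide according to its configuration.
        "$UPSMON_BIN" -K
    else
        # Less portable fallback: only check that the flag file exists.
        test -f "$POWERDOWNFLAG_FILE"
    fi
}

# Possible use in a shutdown script:
#   if killpower_requested; then /sbin/upsdrvctl shutdown; fi
```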
[[Testing_shutdowns]]
Testing shutdowns
^^^^^^^^^^^^^^^^^
UPS equipment varies from manufacturer to manufacturer and even within
model lines. You should test the <<Shutdown_design,shutdown sequence>>
on your systems before leaving them unattended. A successful sequence
is one where the OS halts before the battery runs out, and the system
restarts when power returns.
The first step is to see how `upsdrvctl` will behave without actually
turning off the power. To do so, use the `-t` argument:
:; upsdrvctl -t shutdown
It will display the sequence without actually calling the drivers.
You can finally test a forced shutdown sequence (FSD) using:
:; upsmon -c fsd
This will execute a full shutdown sequence, as presented in
<<Shutdown_design,Shutdown design>>, starting from the 3rd step.
If everything works correctly, the computer will be forcibly powered
off, may remain off for a few seconds to a few minutes (depending on
the driver and UPS type), then will power on again.
If your UPS just sits there and never resets the load, you are vulnerable
to a power race and should add the "reboot after timeout" hack at the very
least.
Also refer to the section on power races in the link:FAQ.html[FAQ].
Using suspend to disk
~~~~~~~~~~~~~~~~~~~~~
Support for suspend to RAM and suspend to disk has been available in
the Linux kernel for a while now. For obvious reasons, suspending to
RAM isn't particularly useful when the UPS battery is getting low,
but suspend to disk may be an interesting concept.
This approach minimizes the amount of disruption which would be caused
by an extended outage. The UPS goes on battery, then reaches low
battery, and the system takes a snapshot of itself and halts. Then it
is turned off and waits for the power to return.
Once the power is back, the system reboots, pulls the snapshot back in,
and keeps going from there. If the user happened to be away when it
happened, they may return and have no idea that their system actually
shut down completely in the middle (although network connections will drop).
In order for this to work, you need to shut down NUT (UPS driver, `upsd`
server and `upsmon` client) in the `suspend` script and start them again in
the `resume` script. Don't try to keep them running. The `upsd` server
will latch the FSD state (so it won't be usable after resuming) and so
will the `upsmon` client. Some drivers may work after resuming, but many
don't and some UPS devices will require re-initialization, so it's best not
to keep them running either.
NOTE: Starting with NUT v2.8.3, there is some growing support for system-wide
sleep on some platforms (e.g. to catch the "going to sleep" event and make
a note of it in the daemons, to take the least-surprise corrective actions
after a significant change in system clock readings), but the warnings in
the previous paragraph may still apply.
After stopping the NUT driver, server and client, you'll have to send the UPS
the command to shut down only if the `POWERDOWNFLAG` is present. Note
that most likely you'll have to allow for a grace period after calling
`upsdrvctl shutdown` since the system will still have to take a
snapshot of itself after that. Not all drivers and devices support this,
so before going down this road, make sure that the one you're using does.
- See if you can query or configure settings named like `load.off.delay`,
`ups.delay.shutdown`, `offdelay` and/or `shutdown_delay`.
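One way to look for such settings is to filter the variable listing printed by `upsc`. The helper name `filter_delay_settings` is made up for this sketch, and which of these names (if any) show up depends on your driver and device.

```shell
# Reads upsc-style "name: value" lines on stdin and keeps only those
# whose name mentions one of the delay settings listed above.
filter_delay_settings() {
    grep -E '(load\.off\.delay|ups\.delay\.shutdown|offdelay|shutdown_delay)'
}

# Example use on a live system (UPS name taken from the examples above):
#   upsc myupsname@mybox | filter_delay_settings
```

An empty result suggests the driver does not expose a configurable delay, in which case this approach may not be suitable for your hardware.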
RAID warning
~~~~~~~~~~~~
If you run any sort of RAID equipment, make sure your arrays are
either halted (if possible) or switched to "read-only" mode.
Otherwise you may suffer a long resync once the system comes back up.
The kernel may not ever run its final shutdown procedure, so you must take
care of all array shutdowns in userspace before `upsdrvctl shutdown` runs.
If you use software RAID (md) on Linux, get `mdadm` and try using
`mdadm --readonly` to put your arrays in a safe state. This has to
happen after your shutdown scripts have remounted the filesystems.
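A rough sketch of that step for Linux md arrays follows. The function names and the `MDSTAT_FILE` and `MDADM_PROG` variables are hypothetical; the sketch assumes that active arrays are listed in `/proc/mdstat` on lines starting with the array name.

```shell
# Assumptions: arrays appear in /proc/mdstat as "mdN : active ..." lines,
# and mdadm is found in PATH.
MDSTAT_FILE="${MDSTAT_FILE:-/proc/mdstat}"
MDADM_PROG="${MDADM_PROG:-mdadm}"

# Print the device node of every md array listed in the mdstat file.
list_md_arrays() {
    awk '/^md[^ ]* :/ { print "/dev/" $1 }' "$MDSTAT_FILE"
}

# Ask mdadm to switch each array to read-only, reporting any failures.
set_md_arrays_readonly() {
    for dev in $(list_md_arrays); do
        "$MDADM_PROG" --readonly "$dev" ||
            echo "could not set $dev read-only" >&2
    done
}

# Call this after the filesystems have been remounted read-only:
#   set_md_arrays_readonly
```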
On hardware RAID or other kernels, you have to do some detective work. It may
be necessary to contact the vendor or the author of your driver to find out
how to put the array in a state where a power loss won't leave it "dirty".
Our understanding is that most if not all RAID devices on Linux will be fine
unless there are pending writes. Make sure your filesystems are remounted
read-only and you should be covered.
[[DataRoom]]
Typical setups for enterprise networks and data rooms
-----------------------------------------------------
The split nature of this UPS monitoring software allows a wide variety of
power connections. This chapter will help you identify how things should
be configured using some general descriptions.
There are two main elements:
1. There's a UPS attached to a communication (serial, USB or network) port
on this system.
2. This system depends on a UPS for power.
You can play "mix and match" with those two to arrive at these descriptions
for individual hosts:
- A: 1 but not 2
- B: 2 but not 1
- C: 1 and 2
A small to medium sized data room usually has one 'C' and a bunch of 'Bs'.
This means that there's a system (type 'C') hooked to the UPS which depends
on it for power. There are also some other systems in there (type 'B')
which depend on that same UPS for power, but aren't directly connected to
it communications-wise.
Larger data rooms or those with multiple UPSes may have several "clusters"
of the "single 'C', many 'Bs'" depending on how it's all wired.
Finally, there's a special case. Type 'A' systems are connected to
a UPS's communication port, but don't depend on it for power.
This usually happens when a UPS is physically close to a box and can
reach the serial port, but the power wiring is such that it doesn't
actually feed that box.
Once you identify a system's type, use this list to decide which of the
programs need to be run for monitoring:
- A: driver and `upsd`
- B: `upsmon` (in secondary mode)
- C: driver, `upsd`, and `upsmon` (in primary mode, as the UPS manager)
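The two questions and the resulting program list can be condensed into a small lookup. The function name `nut_host_type` and its "yes"/"no" arguments are purely illustrative.

```shell
# Usage: nut_host_type HAS_COMM_LINK DEPENDS_ON_UPS
# Each argument is "yes" or "no":
#   HAS_COMM_LINK   a UPS is attached to a communication port of this system
#   DEPENDS_ON_UPS  this system depends on a UPS for power
nut_host_type() {
    case "$1:$2" in
        yes:no)  echo "A: driver and upsd" ;;
        no:yes)  echo "B: upsmon (secondary)" ;;
        yes:yes) echo "C: driver, upsd and upsmon (primary)" ;;
        *)       echo "none: this host has no relation to the UPS" ;;
    esac
}
```

For example, `nut_host_type yes yes` describes the usual type 'C' system which both talks to the UPS and is fed by it.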
image:images/advanced.png[]
To further complicate things, you can have a system that is hooked to
multiple UPSes, but only depends on one for power. This particular
situation makes it an 'A' relative to one UPS, and a 'C' relative to the
other. The software can handle this -- you just have to tell it what to do.
NOTE: NUT can also serve as a data proxy to increase the number of clients,
or share the communication load between several `upsd` instances.
If you are running large server-class systems that have more than one
power feed, see the next section for information on how to handle it
properly.
[[BigServers]]
Typical setups for big servers with UPS redundancy
--------------------------------------------------
By using multiple `MONITOR` statements in 'upsmon.conf', you can configure
an environment where a large machine with redundant power monitors multiple
separate UPSes.
image:images/bigbox.png[]