Set fault beacon on drive failure #2375

behlendorf · 2014-06-09T19:46:16Z

For JBOD-style configurations it's desirable to set the device's fault beacon on drive failures. This can be done either through the sysfs interface or through the sg_ses utility. The relevant zed scripts (cmd/zed/zed.d/io-spare.sh) should be updated to take advantage of this functionality. The tricky bit is to map the device name to an enclosure and slot location.

# Set the fault beacon through sysfs
$ echo 1 >/sys/class/enclosure/6:0:9:0/000/fault

# Set, then clear, the ident (locate) beacon with sg_ses
$ sg_ses --dev-slot-num=0 --set=ident /dev/sg3
$ sg_ses --dev-slot-num=0 --clear=ident /dev/sg3
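
One possible way to do the device-to-slot mapping is sketched below. It assumes the ses kernel module is loaded and that each populated slot under /sys/class/enclosure carries a device/block/<name> entry for its disk; that layout is an assumption here, not something confirmed in this issue. The function name find_enclosure_slot and its optional second argument (an alternate root, useful for testing) are hypothetical.

```shell
#!/bin/sh
# Sketch: print the sysfs slot directory that holds a given block device.
# Assumed layout: /sys/class/enclosure/<enclosure>/<slot>/device/block/<name>
# find_enclosure_slot is a hypothetical helper, not part of ZFS or sg3_utils.
find_enclosure_slot() {
    name=${1##*/}                      # /dev/sda -> sda
    root=${2:-/sys/class/enclosure}    # overridable so it can be tested
    for slot in "$root"/*/*; do
        if [ -e "$slot/device/block/$name" ]; then
            printf '%s\n' "$slot"
            return 0
        fi
    done
    return 1                           # no enclosure slot references this disk
}
```

With that in place, something like `slot=$(find_enclosure_slot /dev/sda) && echo 1 > "$slot/fault"` would light the beacon without needing sg_ses.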


FransUrbo · 2014-06-09T19:57:36Z

udevadm info -q all -p /sys/block/sda can be used to get information about the device, including /sys/device/.... paths. Don't know if it will give out enclosure/slot though.

Have a look at https://raw.githubusercontent.com/FransUrbo/scripts/master/GetDiskInfo.sh (it's huge and kludgy and I'm trying to rewrite it in perl which would make it a little cleaner, but ... :). I do a lot of stuff like that...

I'd love to help (should be able to whip something up quite quickly - I'm bored :), but I don't have anything like an enclosure.

behlendorf · 2014-06-10T17:58:06Z

@FransUrbo That looks like a handy script!

I believe @nedbass figured out how to map the device to an enclosure and slot. Ned, could you add the procedure to this issue so we don't lose track of it? Then we can work it into the scripts where appropriate.

chrisrd · 2014-06-11T00:27:46Z

Here's something we use to blink our lights, including the mapping of device to enclosure and slot:

#!/bin/bash
#
# Usage: disk-blink [--off] /dev/sd???
#
# ACHTUNG!
# ALLES TURISTEN UND NONTEKNISCHEN LOOKENPEEPERS!
# DAS KOMPUTERMASCHINE IST NICHT FÜR DER GEFINGERPOKEN UND MITTENGRABEN!
# ODERWISE IST EASY TO SCHNAPPEN DER SPRINGENWERK, BLOWENFUSEN UND POPPENCORKEN
# MIT SPITZENSPARKEN. IST NICHT FÜR GEWERKEN BEI DUMMKOPFEN. DER RUBBERNECKEN
# SIGHTSEEREN KEEPEN DAS COTTONPICKEN HÄNDER IN DAS POCKETS MUSS. ZO RELAXEN
# UND WATSCHEN DER BLINKENLICHTEN.
#
function usage
{
        cat <<END

Usage: $0 [--off] /dev/sd??

END
        exit 1
}

set -e -u

action=--set=locate

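# Show usage instead of tripping over 'set -u' when arguments are missing
[ $# -ge 1 ] || usage
dev=$1
[ "${dev}" = --off ] && { [ $# -ge 2 ] || usage; action=--clear=locate; dev=$2; }
[ -b "${dev}" ] || { echo 1>&2 "${dev}: not a block device"; exit 1; }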

sasaddr=$(
        lsscsi -tg |
        sed -rn 's/.*sas:(0x[[:xdigit:]]+).*'"${dev//\//\\/}"'[[:space:]].*/\1/p'
)
[ "${sasaddr}" ] || { echo 1>&2 "${dev}: SAS address not found"; exit 1; }

#
# Scan all the enclosures for our SAS address
#
# Initialise slot so the check after the loop is safe under 'set -u'
slot=
for encldev in $(lsscsi -tg | awk '$2 == "enclosu" { print $5 }')
do
        #
        # Note: we discard errors from sg_ses as, at version 1.64 20120118,
        # it prints an error on some enclosures like:
        #
        #  $ sg_ses -j /dev/sg45 > /dev/null
        #  join_work: oi=6, ei=255 (broken_ei=0) not in join_arr
        #
        # See Also: http://thread.gmane.org/gmane.linux.scsi/81514
        #
        slot=$(
                sg_ses -j "${encldev}" 2> /dev/null |
                grep -E "^Slot |^\s+SAS address:" |
                grep -B1 "${sasaddr}" |
                awk '/^Slot/ { print $2 }'
        )
        [ "${slot}" ] && break
done
[ "${slot}" ] || { echo 1>&2 "${dev}: enclosure/slot not found"; exit 1; }

#
# Light 'em up
#
sg_ses -D "Slot ${slot}" "${action}" "${encldev}"

exit 0
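
To make the first step of the script easier to follow, here is a small sketch of the SAS address extraction on its own. It is a variant, not the script's exact command: it uses `|` as the sed delimiter so the slashes in the device path need no escaping, and `-E` in place of `-r`. The helper name sas_address_for is hypothetical, and the sample listing is illustrative text in the general shape of `lsscsi -tg` output, not captured from real hardware.

```shell
#!/bin/sh
# Sketch: read an `lsscsi -tg`-style listing on stdin and print the SAS
# address of the line that names the given device.
# sas_address_for is a hypothetical helper.
sas_address_for() {
    # The trailing [[:space:]] keeps /dev/sda from also matching /dev/sdaa.
    sed -En 's|.*sas:(0x[[:xdigit:]]+).*'"$1"'[[:space:]].*|\1|p'
}

# Illustrative sample input (made-up addresses).
sample='[6:0:0:0]    disk    sas:0x5000c50056f0d1a9          /dev/sda   /dev/sg0
[6:0:1:0]    disk    sas:0x5000c50056f0d1b5          /dev/sdb   /dev/sg1'

printf '%s\n' "$sample" | sas_address_for /dev/sda   # prints 0x5000c50056f0d1a9
```

The address found this way is what the loop then searches for in each enclosure's `sg_ses -j` output.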


behlendorf · 2014-06-11T16:35:26Z

@chrisrd Nice! Thanks for posting this.

dasjoe · 2014-07-31T14:46:49Z

I've got some enclosures with (LSI) SAS expanders, which are visible in /sys/class/enclosure/.
This makes Slot 01's fault LED light up:
echo 1 > /sys/class/enclosure/1\:0\:21\:0/Slot\ 01/fault
"locate" makes it blink:
echo 1 > /sys/class/enclosure/1\:0\:21\:0/Slot\ 01/locate

cvoltz · 2016-07-25T18:45:43Z

I'm working on implementing this feature. identify_failed_drive.feature.txt provides a detailed feature description.

rlaager · 2016-07-25T23:25:35Z

@cvoltz Your description looks solid, except that I disagree that the UID light should be on. I would think only the fault light should be controlled. What is the advantage of adjusting both lights in lock-step? I think the UID should be left for administrator use.

joehandzik · 2016-07-26T15:50:36Z

@rlaager The UID could certainly be dropped, but there is potential value in a large configuration. With the UID + disk FAULT LED, customers know which chassis AND which disk a bit more easily.

tonyhutter · 2016-07-26T23:10:45Z

@cvoltz I'm working on pretty much the same thing at LLNL. Have you had any luck with using libstoragemgmt to blink the LEDs? Have you tried it for multipath devices as well?

cvoltz · 2016-07-28T14:32:52Z

I updated the feature description to remove the references to the UID lights on the drive.

behlendorf · 2016-10-24T18:04:47Z

Anyone following this issue may want to check out the latest master source, which now has improved infrastructure for generically managing a device's SES LEDs, 1bbd877. The zedlet's environment will now contain a ZEVENT_VDEV_ENC_SYSFS_PATH variable when the SES sysfs path can be determined. This can be used to easily control the LEDs without any additional dependencies beyond the ses.ko kernel module. See the statechange-led.sh zedlet as an example.
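
As a rough illustration of how a zedlet might use that variable, here is a minimal sketch. It is not the statechange-led.sh code; the helper name zed_set_fault is hypothetical, and the sketch assumes the variable is simply absent or empty when no SES path was found and that the directory it names contains a writable fault attribute.

```shell
#!/bin/sh
# Sketch: set the fault LED from inside a zedlet.
# zed_set_fault is a hypothetical helper, not part of ZED.
zed_set_fault() {
    want=$1                                    # 1 = on, 0 = off
    enc=${ZEVENT_VDEV_ENC_SYSFS_PATH:-}
    # Assumption: the variable is unset or empty when no SES path is known.
    [ -n "$enc" ] || return 0
    [ -e "$enc/fault" ] || return 0
    # Skip the write when the LED is already in the requested state.
    [ "$(cat "$enc/fault")" = "$want" ] && return 0
    echo "$want" > "$enc/fault"
}
```

A zedlet would then call `zed_set_fault 1` when a vdev fails and `zed_set_fault 0` when it recovers.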

This infrastructure is still being worked on but any feedback or testing on a wider range of configurations and hardware would be welcome.

- Fix autoreplace behaviour on statechange-led.sh script.

  ZED sends the following events on an auto-replace:

  1. statechange: Disk goes UNAVAIL->ONLINE
  2. statechange: Disk goes ONLINE->UNAVAIL
  3. vdev_attach: Disk goes ONLINE

  Events 1-2 happen when ZED first attempts to do an auto-online. When that fails, ZED then tries an auto-replace, generating the vdev_attach event in #3.

  In the previous code, statechange-led was only looking at the UNAVAIL->ONLINE transition to turn off the LED. It ignored the #2 ONLINE->UNAVAIL transition, assuming it was just the "old" VDEV going offline. This is problematic, as a drive can go from ONLINE->UNAVAIL when it's malfunctioning, and we don't want to ignore that.

  This new patch correctly turns on the fault LED every time a drive becomes UNAVAIL. It also monitors vdev_attach events to trigger turning off the LED when an auto-replaced disk comes online.

- Remove unnecessary libdevmapper warning with --with-config=kernel

  This fixes an unnecessary libdevmapper warning when building --with-config=kernel. Kernel code does not use libdevmapper, so the warning is not needed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes #2375
Closes #5312
Closes #5331
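
The event handling described in this commit message can be sketched as a small decision function and replayed against the three auto-replace events. This is an illustration of the described behaviour, not the actual statechange-led.sh code; the function name led_for_event and the class/state strings passed to it are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: decide the fault LED value for one ZED event.
# Prints 1 (turn on), 0 (turn off), or nothing (leave unchanged).
# led_for_event and the class/state strings are illustrative assumptions.
led_for_event() {
    class=$1
    state=$2
    case "$class:$state" in
        statechange:UNAVAIL)                   echo 1 ;;
        statechange:ONLINE|vdev_attach:ONLINE) echo 0 ;;
    esac
}

# Replay the auto-replace sequence: auto-online attempt, failure, replace.
led=0
for ev in statechange:ONLINE statechange:UNAVAIL vdev_attach:ONLINE; do
    new=$(led_for_event "${ev%%:*}" "${ev#*:}")
    [ -n "$new" ] && led=$new
    echo "$ev -> fault=$led"
done
```

The replay ends with the LED off after the vdev_attach event, while the intermediate ONLINE->UNAVAIL transition turns it on rather than being ignored.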

behlendorf added this to the 0.6.4 milestone Jun 9, 2014

behlendorf added the Feature label Jun 9, 2014

behlendorf mentioned this issue Jun 9, 2014

Enclosure Management Integration #10

Closed

behlendorf added the zed label Jun 9, 2014

FransUrbo added a commit to FransUrbo/scripts that referenced this issue Jun 11, 2014

Script to blink lights on a HD enclosure by chrisrd found at

dc90c0e

openzfs/zfs#2375 (comment)

behlendorf modified the milestones: 0.6.5, 0.6.4 Feb 6, 2015

behlendorf modified the milestones: 0.7.0, 0.6.5 Jul 16, 2015

behlendorf closed this as completed in 1bbd877 Oct 24, 2016
