zed: Add zedlet to power off slot when drive is faulted #15200

Merged on Aug 24, 2023 (1 commit)

Conversation

tonyhutter
Contributor

Motivation and Context

Have ZED turn off power to a drive if the drive becomes faulted.

Description

If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is enabled in zed.rc, then power off the drive's slot in the enclosure if the drive becomes FAULTED. This can help silence misbehaving drives. This assumes your drive enclosure fully supports slot power control via sysfs.
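
As a rough sketch, opting in would be a single assignment in zed.rc, which ZED scripts source as shell. The variable name and the value 1 are assumptions about the shipped zed.rc; verify both against the commented default in your copy.

```shell
# zed.rc fragment (sketch, not the shipped file): opt in to powering off
# the enclosure slot of a drive that becomes FAULTED.
# The value "1" is an assumption about what counts as "enabled".
ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT=1
```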

How Has This Been Tested?

Force faulted a drive with zpool offline -f <pool> <vdev>. Observed the drive power down.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

tonyhutter force-pushed the statechange-slot_off branch from 6da0182 to 5ed202c on August 23, 2023 16:48
@AllKind
Contributor

AllKind commented Aug 23, 2023

for i in $(seq 1 20) ; do
	if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" == "off" ] ; then
		break
	fi
	sleep 0.1
done

if [ $i == 20 ] ; then
	exit 5
fi

If the slot is powered off on the 20th attempt, then the script will still exit with code 5.

tonyhutter force-pushed the statechange-slot_off branch from 5ed202c to 093a127 on August 24, 2023 00:41
@tonyhutter
Contributor Author

@AllKind good catch - I just fixed it in my latest push with:

diff --git a/cmd/zed/zed.d/statechange-slot_off.sh b/cmd/zed/zed.d/statechange-slot_off.sh
index 1a369a15a..327b3ae5d 100755
--- a/cmd/zed/zed.d/statechange-slot_off.sh
+++ b/cmd/zed/zed.d/statechange-slot_off.sh
@@ -54,7 +54,7 @@ for i in $(seq 1 20) ; do
        sleep 0.1
 done
 
-if [ $i == 20 ] ; then
+if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" == "off" ] ; then
        exit 5
 fi

@AllKind
Contributor

AllKind commented Aug 24, 2023

+if [ "$(cat $ZEVENT_VDEV_ENC_SYSFS_PATH/power_status)" == "off" ] ; then
        exit 5
 fi

shouldn't that be != "off" ?

If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
tonyhutter force-pushed the statechange-slot_off branch from 093a127 to 3ccee47 on August 24, 2023 16:12
@tonyhutter
Contributor Author

@AllKind whoops, you're right. I fixed it in my latest push.
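
For readers following the thread, the agreed-upon logic (poll up to 20 times, then fail only if the slot still does not read "off") might be sketched as below. This is not the merged script: the function name check_slot_off, its arguments, and the temporary directory standing in for $ZEVENT_VDEV_ENC_SYSFS_PATH are hypothetical, and the portable = operator is used in place of ==.

```shell
# Sketch: DIR stands in for $ZEVENT_VDEV_ENC_SYSFS_PATH; TRIES defaults to 20.
# Returns 0 if DIR/power_status reads "off", 5 otherwise.
check_slot_off() {
	cso_dir="$1"
	cso_tries="${2:-20}"

	for i in $(seq 1 "$cso_tries") ; do
		if [ "$(cat "$cso_dir/power_status")" = "off" ] ; then
			break
		fi
		sleep 0.1
	done

	# Decide on the slot's actual state, not on the loop counter, so a
	# slot that turns off on the last attempt still counts as success.
	if [ "$(cat "$cso_dir/power_status")" != "off" ] ; then
		return 5
	fi
	return 0
}

# Demo against a temporary directory instead of real sysfs.
demo_dir=$(mktemp -d)
echo "off" > "$demo_dir/power_status"
check_slot_off "$demo_dir" 3 && echo "slot reports off"
rm -rf "$demo_dir"
```

Checking the state after the loop avoids both problems raised in review: the counter-based test failed a slot that powered off on the final attempt, and the == "off" test inverted the result.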

@behlendorf behlendorf merged commit 11fbcac into openzfs:master Aug 24, 2023
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Aug 25, 2023
If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Reviewed-by: @AllKind
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes openzfs#15200
behlendorf added a commit to behlendorf/zfs that referenced this pull request Aug 25, 2023
The statechange-slot_off.sh zedlet which was added in openzfs#15200
needed to be installed so it's included by the packages.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added a commit to behlendorf/zfs that referenced this pull request Aug 25, 2023
The statechange-slot_off.sh zedlet which was added in openzfs#15200
needed to be installed so it's included by the packages.

Additional testing has also shown that multiple retries are
often needed for the script to operate reliably.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added a commit that referenced this pull request Aug 26, 2023
The statechange-slot_off.sh zedlet which was added in #15200
needed to be installed so it's included by the packages.

Additional testing has also shown that multiple retries are
often needed for the script to operate reliably.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15210
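
The follow-up change itself is not shown in this thread. Purely as an illustration of what "multiple retries" could look like, the sketch below rewrites "off" and re-reads power_status until the slot reports off or the attempts run out. The function name power_off_with_retries, the default of 5 attempts, and the 0.1 second pause are all assumptions; exit code 5 mirrors the one used in the review discussion.

```shell
# Sketch: DIR stands in for $ZEVENT_VDEV_ENC_SYSFS_PATH.
# Returns 0 once DIR/power_status reads "off", 5 if it never does.
power_off_with_retries() {
	por_dir="$1"
	por_attempts="${2:-5}"
	por_n=0

	while [ "$por_n" -lt "$por_attempts" ] ; do
		# A failed write is not fatal; fall through and try again.
		if { echo "off" > "$por_dir/power_status" ; } 2>/dev/null &&
		    [ "$(cat "$por_dir/power_status" 2>/dev/null)" = "off" ] ; then
			return 0
		fi
		por_n=$((por_n + 1))
		sleep 0.1
	done
	return 5
}

# Demo against a temporary directory instead of real sysfs.
retry_demo_dir=$(mktemp -d)
echo "on" > "$retry_demo_dir/power_status"
power_off_with_retries "$retry_demo_dir" 3 && echo "slot reports off"
rm -rf "$retry_demo_dir"
```

On a regular file the first write always sticks, so the demo only exercises the success path; on real enclosure hardware the write or the read-back may need several rounds.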
defaziogiancarlo pushed a commit to LLNL/zfs that referenced this pull request Sep 13, 2023
If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_FAULT is enabled in zed.rc, then
power off the drive's slot in the enclosure if it becomes FAULTED.
This can help silence misbehaving drives.  This assumes your drive
enclosure fully supports slot power control via sysfs.

Reviewed-by: @AllKind
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes openzfs#15200
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
(cherry-picked from commit 509212f)
defaziogiancarlo pushed a commit to LLNL/zfs that referenced this pull request Sep 13, 2023
The statechange-slot_off.sh zedlet which was added in openzfs#15200
needed to be installed so it's included by the packages.

Additional testing has also shown that multiple retries are
often needed for the script to operate reliably.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#15210
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
(cherry-picked from commit 1e5cc95)