Fix sequential resilver drive failure race condition #14063

Merged: 1 commit, Oct 21, 2022

behlendorf
Contributor

Motivation and Context

Backport of #14050 for the 2.1.7 staging branch.

Description

This patch handles a race condition in which two drives fail simultaneously and the vdev_rebuild_reset_wanted signal is missed in vdev_rebuild_thread. We retry catching the signal inside the vdev_rebuild_complete_sync function.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Signed-off-by: Samuel Wycliffe J <samwyc@hpe.com>
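
For readers unfamiliar with this kind of race, the general pattern described above (a "reset wanted" flag that a worker can miss, re-checked in the completion step) can be sketched as follows. This is a minimal illustration only, not the OpenZFS code and not the contents of this patch: the names rebuild_t, rebuild_init, rebuild_request_reset and rebuild_complete_sync are hypothetical, and a plain pthread mutex stands in for whatever locking the real code uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical rebuild states for this sketch (not the OpenZFS definitions). */
typedef enum { REBUILD_ACTIVE, REBUILD_COMPLETE } rebuild_state_t;

/* Hypothetical per-rebuild bookkeeping. */
typedef struct rebuild {
	pthread_mutex_t lock;
	bool reset_wanted;      /* set when another device fails mid-rebuild */
	rebuild_state_t state;
	int resets;             /* how many times the rebuild was restarted */
} rebuild_t;

void
rebuild_init(rebuild_t *rb)
{
	assert(rb != NULL);
	pthread_mutex_init(&rb->lock, NULL);
	rb->reset_wanted = false;
	rb->state = REBUILD_ACTIVE;
	rb->resets = 0;
}

/* Called from a (hypothetical) device-failure path to request a restart. */
void
rebuild_request_reset(rebuild_t *rb)
{
	pthread_mutex_lock(&rb->lock);
	rb->reset_wanted = true;
	pthread_mutex_unlock(&rb->lock);
}

/*
 * Completion step. The worker may have finished its last I/O before the
 * reset request arrived, so the flag is checked again here, under the
 * lock, before the rebuild is declared complete.
 * Returns true if the rebuild was marked complete, false if it was reset.
 */
bool
rebuild_complete_sync(rebuild_t *rb)
{
	bool completed;

	pthread_mutex_lock(&rb->lock);
	if (rb->reset_wanted) {
		/* A late failure was signalled: restart instead of completing. */
		rb->reset_wanted = false;
		rb->state = REBUILD_ACTIVE;
		rb->resets++;
		completed = false;
	} else {
		rb->state = REBUILD_COMPLETE;
		completed = true;
	}
	pthread_mutex_unlock(&rb->lock);
	return (completed);
}
```

In this sketch, a caller that invokes rebuild_request_reset() after the worker has finished, but before rebuild_complete_sync() runs, sees the rebuild restarted rather than reported complete; a second call to rebuild_complete_sync() with no pending request then marks it complete.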

How Has This Been Tested?

Manually tested with the test case described in the original issue.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Closes openzfs#14041
Closes openzfs#14050
behlendorf added the "Status: Code Review Needed" label (Ready for review and testing) on Oct 20, 2022
behlendorf merged commit fc1c005 into openzfs:zfs-2.1.7-staging on Oct 21, 2022
behlendorf added the "Status: Accepted" label (Ready to integrate; reviewed, tested) and removed the "Status: Code Review Needed" label (Ready for review and testing) on Oct 21, 2022