Incorrect spare vdev state after offline drive has been replaced by a hot spare #5766
Seemed easy to fix:

```diff
@@ -4787,6 +4787,11 @@ spa_vdev_attach(spa_t *spa, uint64_t guid, nvlist_t *nvroot, int replacing)
 	newvd->vdev_crtxg = oldvd->vdev_crtxg;
 	vdev_add_child(pvd, newvd);
 
+	/*
+	 * Reevaluate the parent vdev state.
+	 */
+	vdev_propagate_state(pvd);
+
 	tvd = newvd->vdev_top;
 	ASSERT(pvd->vdev_top == tvd);
 	ASSERT(tvd->vdev_parent == rvd);
```

There's a similar vdev_propagate_state() call in spa_vdev_detach() as well.
@thegreatgazoo nice find! Can you open a new PR with the proposed fix, which looks reasonable, and we'll get some reviewers.
After a hot spare replaces an OFFLINE vdev, the new parent spare vdev state is set incorrectly to OFFLINE. The correct state should be DEGRADED. The incorrect OFFLINE state will prevent top-level vdev from reading the spare vdev, thus causing unnecessary reconstruction.

Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes openzfs#5766
On second thought: if we add vdev_propagate_state(pvd) as shown above, both pvd and newvd still have empty DTLs at that point. Could a zio arrive at the top raidz1 vdev and cause spare-4 to read bad data from newvd? Since we're already changing the vdev tree structure here, shouldn't the existing locking already be able to prevent such races? I'm not sure. Even if we moved the vdev_propagate_state(pvd) call down a bit, to after newvd's DTL_MISSING has been updated, could there still be a similar race, given that vdev_dtl_reassess(spa->spa_root_vdev, 0, 0, B_FALSE) is called much later from spa_vdev_exit()?
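For illustration only, the reordering described above would look roughly like this; the vdev_dtl_dirty() call and the dtl_max_txg local are paraphrased from the surrounding spa_vdev_attach() code, and this is not the change that was eventually merged:

```c
	vdev_add_child(pvd, newvd);

	/*
	 * First mark the not-yet-resilvered newvd as missing in its DTL,
	 * so the spare won't be trusted for reads it can't serve yet ...
	 */
	vdev_dtl_dirty(newvd, DTL_MISSING, TXG_INITIAL,
	    dtl_max_txg - TXG_INITIAL);

	/*
	 * ... and only then reevaluate the parent spare vdev's state.
	 */
	vdev_propagate_state(pvd);
```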
After a hot spare replaces an OFFLINE vdev, the new parent spare vdev state is set incorrectly to OFFLINE. The correct state should be DEGRADED. The incorrect OFFLINE state will prevent top-level vdev from reading the spare vdev, thus causing unnecessary reconstruction.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5766
Closes #5770
I was able to reproduce it on current master and 0.6.3-1.3 across different hardware:
The spare-4 was in the OFFLINE state, which seemed incorrect (@ahrens agreed). A spare vdev is essentially a mirror, and with one child ONLINE its state should be DEGRADED. In fact, if I exported and imported the pool, spare-4 would become DEGRADED even though there was no change in its children's state. In addition, OFFLINE doesn't seem to be a valid state for a non-leaf vdev at all; vdev_offline_locked() rejects non-leaf vdevs (see the sketch below):
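For reference, the check I mean in vdev_offline_locked() is roughly the following (paraphrased from memory of the code around this time; treat the exact error path as an assumption rather than an exact quote):

```c
	/*
	 * Inside vdev_offline_locked(): OFFLINE can only be requested for
	 * leaf vdevs, so a non-leaf vdev such as spare-4 should never end
	 * up in this state.
	 */
	if (!vd->vdev_ops->vdev_op_leaf)
		return (spa_vdev_state_exit(spa, NULL, SET_ERROR(ENOTSUP)));
```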
Also, I found an Oracle document which indicated that DEGRADED should be the correct state.
This OFFLINE state will not affect new writes, as zio_vdev_io_start() checks vdev_accessible() only for leaf vdevs - writes do pass through the OFFLINE spare-4 to reach the sdu. This is correct. But for normal read IO, the parent raidz1 vdev will simply skip any OFFLINE child. I verified this by reading a 1.5G file that was not in ARC - my traces showed that vdev_raidz_reconstruct() was called 8176 times during the read, and iostat showed 0 read for sdu. Which is wrong.
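Roughly, the checks I'm referring to look like this (paraphrased from the vdev, zio, and raidz code of that era; the raidz column field names are from memory and should be treated as assumptions):

```c
	/*
	 * zio_vdev_io_start(): accessibility is only enforced on leaf vdevs,
	 * so writes happily pass through the OFFLINE interior spare-4.
	 */
	if (vd->vdev_ops->vdev_op_leaf &&
	    (zio->io_type == ZIO_TYPE_READ || zio->io_type == ZIO_TYPE_WRITE)) {
		if (!vdev_accessible(vd, zio)) {
			zio->io_error = SET_ERROR(ENXIO);
			zio_interrupt(zio);
			return (ZIO_PIPELINE_STOP);
		}
	}

	/*
	 * raidz read path: a child whose state is below DEGRADED (e.g.
	 * OFFLINE) is not readable, so the raidz parent skips it and
	 * reconstructs from parity instead of reading the healthy spare
	 * underneath it.
	 */
	for (c = rm->rm_cols - 1; c >= 0; c--) {
		rc = &rm->rm_col[c];
		cvd = vd->vdev_child[rc->rc_devidx];
		if (!vdev_readable(cvd)) {	/* state < VDEV_STATE_DEGRADED */
			rc->rc_error = SET_ERROR(ENXIO);
			rc->rc_skipped = 1;
			continue;		/* don't even try */
		}
		/* ... issue the child read ... */
	}
```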
In spa_vdev_attach():
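(The excerpt below is paraphrased from memory of this area of spa_vdev_attach(), not an exact quote; pvd, oldvd, newvd, and pvops are the names used by that function.)

```c
	/*
	 * Insert the new spare/replacing vdev above oldvd.  The new parent
	 * pvd inherits its state from oldvd.
	 */
	if (pvd->vdev_ops != pvops)
		pvd = vdev_add_parent(oldvd, pvops);

	/* ... set up newvd ... */
	newvd->vdev_crtxg = oldvd->vdev_crtxg;
	vdev_add_child(pvd, newvd);	/* pvd's state is not reevaluated here */
```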
The spare-4 vdev, i.e. pvd in the code, was created by vdev_add_parent(oldvd, pvops) and got its state from oldvd (the OFFLINE sdh). So its state became OFFLINE. But after vdev_add_child(pvd, newvd), where newvd is the ONLINE spare sdu, its state didn't get refreshed. That's why it stayed OFFLINE, and why export/import would fix its state.
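Inside vdev_add_parent() the inherited state is explicit; roughly (paraphrased, with unrelated field copies omitted):

```c
	/*
	 * The new interior vdev mvd (here, spare-4) copies several fields,
	 * including the state, from its only child cvd (here, the OFFLINE
	 * sdh), then takes cvd's place in the tree.
	 */
	mvd->vdev_state = cvd->vdev_state;

	vdev_remove_child(pvd, cvd);
	vdev_add_child(pvd, mvd);
	vdev_add_child(mvd, cvd);
```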
BTW, @don-brady mentioned he was able to reproduce it on an old illumos build, so it may affect more than just ZFS on Linux.