
Incorrect spare vdev state after offline drive has been replaced by a hot spare #5766

Closed
thegreatgazoo opened this issue Feb 9, 2017 · 3 comments

Comments

@thegreatgazoo

I was able to reproduce it on current master and 0.6.3-1.3 across different hardware:

# zpool create -f -o ashift=12 tank raidz1 sdd sde sdf sdg sdh spare sdu
# zpool offline tank sdh
# zpool replace tank sdh sdu
# zpool status
  scan: resilvered 361M in 0h0m with 0 errors on Thu Feb  9 05:01:02 2017
config: 

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          raidz1-0   DEGRADED     0     0     0
            sdd      ONLINE       0     0     0
            sde      ONLINE       0     0     0
            sdf      ONLINE       0     0     0
            sdg      ONLINE       0     0     0
            spare-4  OFFLINE      0     0     0
              sdh    OFFLINE      0     0     0
              sdu    ONLINE       0     0     0
        spares
          sdu        INUSE     currently in use

The spare-4 vdev was in the OFFLINE state, which seemed incorrect (@ahrens agreed). A spare vdev is essentially a mirror, and with one child ONLINE its state should be DEGRADED. In fact, if I export and import the pool, spare-4 becomes DEGRADED even though there is no change in its children's states. In addition, OFFLINE doesn't seem to be a valid state for a non-leaf vdev at all, as seen in vdev_offline_locked():

       if (!vd->vdev_ops->vdev_op_leaf)
               return (spa_vdev_state_exit(spa, NULL, ENOTSUP));

I also found an Oracle document indicating that DEGRADED should be the correct state.
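To make the expectation concrete, here's a minimal sketch of how a mirror-like parent's state follows from its children's states (illustration only, assuming the vdev_state_t values from sys/fs/zfs.h; example_parent_state() is a made-up helper, not actual ZFS code):

    /* Illustration only: derive a mirror-like parent's state from its children. */
    static vdev_state_t
    example_parent_state(const vdev_state_t *child, int nchildren)
    {
            int healthy = 0;

            for (int c = 0; c < nchildren; c++) {
                    if (child[c] == VDEV_STATE_HEALTHY)
                            healthy++;
            }

            if (healthy == nchildren)
                    return (VDEV_STATE_HEALTHY);    /* all children ONLINE */
            if (healthy > 0)
                    return (VDEV_STATE_DEGRADED);   /* e.g. sdh OFFLINE + sdu ONLINE */
            return (VDEV_STATE_CANT_OPEN);          /* no usable child */
    }

With sdh OFFLINE and sdu ONLINE this yields DEGRADED, which matches the Oracle documentation and the state seen after export/import.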

This OFFLINE state does not affect new writes, as zio_vdev_io_start() checks vdev_accessible() only for leaf vdevs, so writes do pass through the OFFLINE spare-4 to reach sdu. This is correct. But for normal read I/O, the parent raidz1 vdev simply skips any OFFLINE child. I verified this by reading a 1.5G file that was not in the ARC: my traces showed that vdev_raidz_reconstruct() was called 8176 times during the read, and iostat showed 0 reads for sdu, which is wrong.
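Roughly speaking, the raidz read setup skips any child it doesn't consider readable, and an OFFLINE vdev fails that test because OFFLINE sorts below DEGRADED in vdev_state_t. A paraphrased sketch of that child loop (from memory, not a verbatim quote of vdev_raidz_io_start()):

    /* Paraphrased sketch: an OFFLINE child is skipped on the read path. */
    for (int c = 0; c < vd->vdev_children; c++) {
            vdev_t *cvd = vd->vdev_child[c];

            /* vdev_readable() is false for an OFFLINE vdev like spare-4 */
            if (!vdev_readable(cvd))
                    continue;       /* column rebuilt later by vdev_raidz_reconstruct() */

            /* ... otherwise a read zio is issued to cvd ... */
    }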

In spa_vdev_attach():

        /*
         * If the parent is not a mirror, or if we're replacing, insert the new
         * mirror/replacing/spare vdev above oldvd.
         */
        if (pvd->vdev_ops != pvops)
                pvd = vdev_add_parent(oldvd, pvops);

        /*
         * Extract the new device from its root and add it to pvd.
         */
        vdev_remove_child(newrootvd, newvd);
        newvd->vdev_id = pvd->vdev_children;
        newvd->vdev_crtxg = oldvd->vdev_crtxg;
        vdev_add_child(pvd, newvd);

The spare-4 vdev, i.e. pvd in the code above, was created by vdev_add_parent(oldvd, pvops) and got its state from oldvd (the OFFLINE sdh), so its state became OFFLINE. But after vdev_add_child(pvd, newvd), where newvd is the ONLINE spare sdu, its state was not refreshed. That's why it stayed OFFLINE, and why export/import fixes its state.
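The inheritance happens inside vdev_add_parent(), which copies several properties from the existing child onto the newly inserted interior vdev. Paraphrased from memory (the exact set of fields copied may vary between versions):

    /* In vdev_add_parent(): mvd is the new interior vdev (spare-4),
     * cvd is the existing child (the OFFLINE sdh). */
    mvd->vdev_asize = cvd->vdev_asize;
    mvd->vdev_ashift = cvd->vdev_ashift;
    mvd->vdev_state = cvd->vdev_state;      /* spare-4 inherits OFFLINE here */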

BTW, @don-brady mentioned he was able to reproduce it on an old illumos build, so it may affect more than just our Linux ZFS.

@thegreatgazoo
Author

This seemed easy to fix:

@@ -4787,6 +4787,11 @@ spa_vdev_attach(spa_t *spa, uint64_t guid, nvlist_t *nvroot, int replacing)
        newvd->vdev_crtxg = oldvd->vdev_crtxg;
        vdev_add_child(pvd, newvd);
 
+       /*
+        * Reevaluate the parent vdev state.
+        */
+       vdev_propagate_state(pvd);
+
        tvd = newvd->vdev_top;
        ASSERT(pvd->vdev_top == tvd);
        ASSERT(tvd->vdev_parent == rvd);

There's a similar vdev_propagate_state() call in spa_vdev_detach() as well.
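For context, vdev_propagate_state() reevaluates an interior vdev from its children and then walks up toward the root. A simplified sketch of the idea (sketch_propagate_state() is illustrative, not the exact upstream function):

    /* Simplified sketch of what vdev_propagate_state() does: count
     * unreadable and degraded children, let the vdev's state-change op
     * recompute the parent state, then recurse toward the root. */
    static void
    sketch_propagate_state(vdev_t *vd)
    {
            int faulted = 0, degraded = 0;

            for (int c = 0; c < vd->vdev_children; c++) {
                    vdev_t *cvd = vd->vdev_child[c];

                    if (!vdev_readable(cvd))
                            faulted++;      /* the OFFLINE sdh counts here */
                    else if (cvd->vdev_state <= VDEV_STATE_DEGRADED)
                            degraded++;
            }

            /*
             * The mirror/spare state-change op maps "some but not all
             * children unreadable" to VDEV_STATE_DEGRADED, which is the
             * state spare-4 should end up in.
             */
            vd->vdev_ops->vdev_op_state_change(vd, faulted, degraded);

            if (vd->vdev_parent != NULL)
                    sketch_propagate_state(vd->vdev_parent);
    }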

@behlendorf
Contributor

@thegreatgazoo nice find! Can you open a new PR with the proposed fix, which looks reasonable, and we'll get some reviewers on it.

thegreatgazoo pushed a commit to thegreatgazoo/zfs that referenced this issue Feb 10, 2017
After a hot spare replaces an OFFLINE vdev, the new
parent spare vdev state is set incorrectly to OFFLINE.
The correct state should be DEGRADED. The incorrect
OFFLINE state will prevent top-level vdev from reading
the spare vdev, thus causing unnecessary reconstruction.

Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes openzfs#5766
@thegreatgazoo
Author

On second thought: if we add vdev_propagate_state(pvd) as shown above, both pvd and newvd still have empty DTLs at that point. Could a zio arrive at the top raidz1 vdev and cause spare-4 to read bad data from newvd? Then again, since we're already changing the vdev tree structure here, shouldn't the existing locking already prevent such races? I'm not sure.

Even if we moved the vdev_propagate_state(pvd) call down a bit, to after newvd's DTL_MISSING has been updated, could there still be a similar race, since vdev_dtl_reassess(spa->spa_root_vdev, 0, 0, B_FALSE) is called much later from spa_vdev_exit()?
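To spell out the concern: the mirror-style read path consults DTL_MISSING when deciding whether a child is safe to read from, so while newvd's DTL is still empty a read could be directed at the not-yet-resilvered spare. A sketch of that kind of check (simplified, assuming the vdev_dtl_contains() interface; not a verbatim quote of the mirror code):

    /* Sketch of a read-child selection check. With newvd's DTL_MISSING
     * still empty, this test would not reject it -- that's the window
     * the race concern above is about. */
    if (vdev_readable(cvd) &&
        !vdev_dtl_contains(cvd, DTL_MISSING, zio->io_txg, 1))
            return (c);     /* cvd looks safe to read from */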

behlendorf pushed a commit that referenced this issue Feb 23, 2017
After a hot spare replaces an OFFLINE vdev, the new
parent spare vdev state is set incorrectly to OFFLINE.
The correct state should be DEGRADED. The incorrect
OFFLINE state will prevent top-level vdev from reading
the spare vdev, thus causing unnecessary reconstruction.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Isaac Huang <he.huang@intel.com>
Closes #5766 
Closes #5770