
Scrub stopping in the middle of run #4307

Closed
Rovanion opened this issue Feb 4, 2016 · 4 comments
Rovanion commented Feb 4, 2016

My latest scrub has been stuck in the same place for two days with no progress. It could be due to a drive breaking, but neither ZFS nor Linux has given up on the drive. The machine creates snapshots of each of its five filesystems every 15 minutes and sends them to another machine; beyond that there isn't usually any load to speak of on the pool.
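For context, a snapshot schedule like the one described might look something like this crontab sketch (the dataset name matches the pool below, but the snapshot naming is hypothetical, and the incremental send-to-remote step is elided since it needs bookkeeping of the previous snapshot):

```shell
# Hypothetical /etc/crontab entry: recursive snapshot of the pool every
# 15 minutes. Note % must be escaped as \% inside a crontab line.
*/15 * * * * root /sbin/zfs snapshot -r storage@auto-$(date +\%Y\%m\%d\%H\%M)
```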

I'm running ZFS 0.6.5.2-1~trusty on 64-bit Ubuntu 14.04.3 with kernel 3.13.0-76-generic. My storage layout is the following:

  pool: storage
 state: ONLINE
  scan: scrub in progress since Tue Feb  2 03:00:22 2016
    1.76T scanned out of 1.92T at 8.53M/s, 5h23m to go
    0 repaired, 91.77% done
config:

    NAME                                                  STATE     READ WRITE CKSUM
    storage                                               ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-SEAGATE_ST35002NSSUN500G_09487CC2QD_9QMCC2QD  ONLINE       0     0     0
        ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EPH32977        ONLINE       0     0     0
      mirror-1                                            ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607E87H                       ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607VXFH                       ONLINE       0     0     0
      mirror-2                                            ONLINE       0     0     0
        ata-SAMSUNG_HD203WI_S1UYJ1VZ500791-part3          ONLINE       0     0     0
        ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M2900721-part3    ONLINE       0     0     0
      mirror-3                                            ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607W3ZH                       ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607VXPH                       ONLINE       0     0     0

errors: No known data errors
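As a sanity check on the numbers above, the remaining work divided by the reported rate matches the ETA `zpool status` prints (a quick sketch; the scanned/total figures are rounded in the status output, which accounts for the small gap):

```shell
# (1.92 TiB - 1.76 TiB) left to scan at the reported 8.53 M/s:
awk 'BEGIN {
    remaining_mib = (1.92 - 1.76) * 1024 * 1024   # TiB -> MiB
    printf "%.1f hours\n", remaining_mib / 8.53 / 3600
}'
# prints: 5.5 hours -- consistent with the reported "5h23m to go"
```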

As of now there are a bunch of scheduled zfs operations, such as taking and destroying snapshots, progressively filling up the memory of the system.
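One way to confirm that the piled-up operations are actually stuck (rather than just queued) is to look for processes in uninterruptible sleep, state D. A sketch using canned `ps` output in place of a live system; on the real machine you'd pipe `ps -eo stat,comm` in directly:

```shell
# Count commands in uninterruptible sleep (state D); hung `zfs snapshot`
# and `zfs destroy` jobs pile up here when the txg sync thread stalls.
# The canned sample below stands in for `ps -eo stat,comm` on a live box.
ps_sample='STAT COMMAND
D    zfs
D    zfs
S    sshd
D    txg_sync'
echo "$ps_sample" | awk 'NR > 1 && $1 == "D" { print $2 }' | sort | uniq -c | sort -rn
# prints "2 zfs" then "1 txg_sync" (uniq -c puts the count first)
```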

Here are my arcstats and dmu_tx: http://paste.ubuntu.com/14877618/ http://paste.ubuntu.com/14877647/

Here is my dmesg containing ata errors and zfs stack traces from hung threads: http://paste.ubuntu.com/14877460/

Running iostat -dmx 1 doesn't show any drive getting a ton of IO; it's mostly zeroes everywhere with the occasional small burst of traffic to some devices. The 2T pool has 600GB of storage unallocated, and the machine has 8GB of RAM, about half of which is in use as I write this.
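A stalling drive often shows near-zero throughput but a huge average wait in `iostat -dmx`, so filtering on that column can point at the culprit even when rMB/s looks idle. A sketch with canned sample lines standing in for live output (the device names and numbers here are made up):

```shell
# Flag devices whose average wait (await, column 10 of `iostat -dmx`
# output on this sysstat version) exceeds 100 ms.
iostat_sample='Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 1.20 0.40 0.01 0.00 8.00 0.01 4.10 2.00 0.30
sdb 0.00 0.00 0.10 0.00 0.00 0.00 8.00 9.80 2400.00 990.00 99.90'
echo "$iostat_sample" | awk 'NR > 1 && $10 > 100 { print $1, $10 }'
# prints: sdb 2400.00
```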

Is there any further information I can provide to aid in figuring out why this would happen? Is it at all related to issues #3947 and #3867? If the device were actually dead I'd assume that ZFS would mark it as such, but the machine seems to be stuck in limbo.

@kernelOfTruth
Contributor

@Rovanion I've seen traces like these several times, and each time they could have been related to a stalling or failing hard drive, a heavy workload, or high memory pressure.

So I'd assume it's the drive causing this, with the kernel and ZFS not giving up on it, plus the load.

referencing:
#2060 Possible bug in traverse_visitbp
#3148 random call trace on heavy load
#3903 txg_sync, zfs blocked for more than 120s on debian jessie/zfs 0.6.5.2-2
#3148 (comment) (possible stack problem with kernels older than 3.15, not sure if it still applies here)

@Rovanion
Author

Rovanion commented Feb 4, 2016

I went ahead and rebooted the machine upon which the scrub continued as if nothing had ever happened.

For some reason unattended upgrades hasn't been doing its job on this machine, so I'm upgrading ZFS as I'm typing this. Hopefully the issue won't rear its ugly head again. This issue can be closed as far as I'm concerned, unless a ZoL developer has any interest in investigating it further.

@tuxoko
Contributor

tuxoko commented Feb 4, 2016

@Rovanion
This is likely the same as #4166 and #4106. It is fixed by openzfs/spl@e843553.
The patch is not included in a stable release yet.

@loli10K
Contributor

loli10K commented Sep 11, 2018

Closing, since this should be fixed by openzfs/spl@e843553, which is included in the latest stable release.

@loli10K loli10K closed this as completed Sep 11, 2018