
Scrub stopping in the middle of run #4307

Closed
Rovanion opened this issue Feb 4, 2016 · 4 comments
Rovanion commented Feb 4, 2016

My latest scrub has been stuck in the same place for two days with no progress. It could be due to a drive breaking, but neither ZFS nor Linux has given up on the drive. The machine creates snapshots of each of its five filesystems every 15 minutes and sends them to another machine; beyond that there isn't usually any load to speak of on the pool.
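For context, a snapshot schedule like the one described might look something like this crontab sketch (the dataset name matches the pool below, but the snapshot naming is hypothetical, and the incremental send-to-remote step is elided since it needs bookkeeping of the previous snapshot):

```shell
# Hypothetical /etc/crontab entry: recursive snapshot of the pool every
# 15 minutes. Note % must be escaped as \% inside a crontab line.
*/15 * * * * root /sbin/zfs snapshot -r storage@auto-$(date +\%Y\%m\%d\%H\%M)
```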

I'm running ZFS 0.6.5.2-1~trusty on 64-bit Ubuntu 14.04.3 with kernel 3.13.0-76-generic. My storage layout is the following:

  pool: storage
 state: ONLINE
  scan: scrub in progress since Tue Feb  2 03:00:22 2016
    1.76T scanned out of 1.92T at 8.53M/s, 5h23m to go
    0 repaired, 91.77% done
config:

    NAME                                                  STATE     READ WRITE CKSUM
    storage                                               ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-SEAGATE_ST35002NSSUN500G_09487CC2QD_9QMCC2QD  ONLINE       0     0     0
        ata-WDC_WD5000AAKX-60U6AA0_WD-WCC2EPH32977        ONLINE       0     0     0
      mirror-1                                            ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607E87H                       ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607VXFH                       ONLINE       0     0     0
      mirror-2                                            ONLINE       0     0     0
        ata-SAMSUNG_HD203WI_S1UYJ1VZ500791-part3          ONLINE       0     0     0
        ata-WDC_WD20EZRX-00D8PB0_WD-WCC4M2900721-part3    ONLINE       0     0     0
      mirror-3                                            ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607W3ZH                       ONLINE       0     0     0
        ata-Maxtor_6B300S0_B607VXPH                       ONLINE       0     0     0

errors: No known data errors
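As a sanity check on the numbers above, the remaining work divided by the reported rate matches the ETA `zpool status` prints (a quick sketch; the scanned/total figures are rounded in the status output, which accounts for the small gap):

```shell
# (1.92 TiB - 1.76 TiB) left to scan at the reported 8.53 M/s:
awk 'BEGIN {
    remaining_mib = (1.92 - 1.76) * 1024 * 1024   # TiB -> MiB
    printf "%.1f hours\n", remaining_mib / 8.53 / 3600
}'
# prints: 5.5 hours -- consistent with the reported "5h23m to go"
```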

As of now there are a bunch of scheduled zfs operations, such as taking and destroying snapshots, progressively filling up the memory of the system.
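One way to confirm that the piled-up operations are actually stuck (rather than just queued) is to look for processes in uninterruptible sleep, state D. A sketch using canned `ps` output in place of a live system; on the real machine you'd pipe `ps -eo stat,comm` in directly:

```shell
# Count commands in uninterruptible sleep (state D); hung `zfs snapshot`
# and `zfs destroy` jobs pile up here when the txg sync thread stalls.
# The canned sample below stands in for `ps -eo stat,comm` on a live box.
ps_sample='STAT COMMAND
D    zfs
D    zfs
S    sshd
D    txg_sync'
echo "$ps_sample" | awk 'NR > 1 && $1 == "D" { print $2 }' | sort | uniq -c | sort -rn
# prints "2 zfs" then "1 txg_sync" (uniq -c puts the count first)
```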

Here are my arcstats and dmu_tx: http://paste.ubuntu.com/14877618/ http://paste.ubuntu.com/14877647/

Here is my dmesg containing ata errors and zfs stack traces from hung threads: http://paste.ubuntu.com/14877460/

Running iostat -dmx 1 doesn't show any drive getting a ton of IO; it's mostly zeroes everywhere with the occasional small burst of traffic to some devices. The 2T pool has 600GB of storage unallocated, and the machine has 8GB of RAM, about half of which is in use as I write this.
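A stalling drive often shows near-zero throughput but a huge average wait in `iostat -dmx`, so filtering on that column can point at the culprit even when rMB/s looks idle. A sketch with canned sample lines standing in for live output (the device names and numbers here are made up):

```shell
# Flag devices whose average wait (await, column 10 of `iostat -dmx`
# output on this sysstat version) exceeds 100 ms.
iostat_sample='Device rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 1.20 0.40 0.01 0.00 8.00 0.01 4.10 2.00 0.30
sdb 0.00 0.00 0.10 0.00 0.00 0.00 8.00 9.80 2400.00 990.00 99.90'
echo "$iostat_sample" | awk 'NR > 1 && $10 > 100 { print $1, $10 }'
# prints: sdb 2400.00
```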

Is there any further information I can provide to aid in figuring out why this would happen? Is it at all related to issues #3947 and #3867? If the device were actually dead I'd assume that ZFS would mark it as such, but the machine seems to be stuck in limbo.

@kernelOfTruth
Contributor

@Rovanion I've seen traces like these several times, and each time they could have been related to a stalling or failing hard drive, a heavy workload, or high memory pressure.

So I'd assume it's the drive causing this, with the kernel and ZFS not giving up on it, plus the load.

referencing:
#2060 Possible bug in traverse_visitbp
#3148 random call trace on heavy load
#3903 txg_sync, zfs blocked for more than 120s on debian jessie/zfs 0.6.5.2-2
#3148 (comment) (possible stack problem with kernels older than 3.15, not sure if it still applies here)

@Rovanion
Author

Rovanion commented Feb 4, 2016

I went ahead and rebooted the machine upon which the scrub continued as if nothing had ever happened.

For some reason unattended upgrades hasn't been doing its job on this machine, so I'm upgrading ZFS as I'm typing this. Hopefully the issue won't rear its ugly head again. This issue can be closed as far as I'm concerned, unless a ZoL developer has any interest in investigating it further.

@tuxoko
Contributor

tuxoko commented Feb 4, 2016

@Rovanion
This is likely the same as #4166 and #4106. It is fixed by openzfs/spl@e843553.
The patch is not included in a stable release yet.

@loli10K
Contributor

loli10K commented Sep 11, 2018

Closing, since this should be fixed by openzfs/spl@e843553, which is included in the latest stable release.

@loli10K loli10K closed this as completed Sep 11, 2018