Description
Summary: "zpool import" on a pool whose vdev is failing causes all further "zpool" commands to hang, even on different pools.
Setup:
- Custom Linux-from-scratch distribution based on kernel 4.3
- ZoL v0.6.5.3
- Multiple pools created (20 pools in this scenario), each backed by a single custom block device sitting on a RAID (this is absolutely not recommended, and I agree).
- Pools are exported on shutdown and imported on boot.
- The Volumes can enter a state where they block IOs for several seconds (up to a few minutes) at a time, prompting ZFS to suspend the pool (see the reproduction sketch after this list).
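For reference, the hang can presumably be reproduced without our custom block device by using device-mapper to simulate a Volume that stops honoring IOs. This is an untested sketch; the file, device, and pool names (/tmp/vdev.img, slowdev, testpool) are made up:

[root@andrinox ~]# truncate -s 1G /tmp/vdev.img
[root@andrinox ~]# losetup /dev/loop0 /tmp/vdev.img
[root@andrinox ~]# dmsetup create slowdev --table "0 $(blockdev --getsz /dev/loop0) linear /dev/loop0 0"
[root@andrinox ~]# zpool create testpool /dev/mapper/slowdev
[root@andrinox ~]# zpool export testpool
[root@andrinox ~]# dmsetup suspend slowdev                # device now queues all IOs indefinitely
[root@andrinox ~]# zpool import -d /dev/mapper testpool   # should hang in "D" state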
Issue:
When importing a pool backed by a vdev sitting on a Volume that is currently blocked (similar to importing a pool backed by a single really bad drive that fails all IOs), the "zpool import" operation hangs in "D" state (uninterruptible sleep, waiting for IOs), and shortly afterwards (120 seconds on our system) the Linux kernel reports that the task is blocked:
dmesg extract:
[ 481.275129] INFO: task zpool:14372 blocked for more than 120 seconds.
[ 481.282407] Tainted: P O 4.3.0 #1
[ 481.288029] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 481.296886] zpool D ffff88107f315380 0 14372 1723 0x00000000
[ 481.305003] ffff88001cffaff8 0000000000000086 ffff880f8db99c40 ffff88103849b880
[ 481.313649] ffff8800003f0002 0000000300400190 ffff880f58ea8000 ffff880f665141a0
[ 481.322308] 0000000000000305 0000000000000292 0000000000000001 ffff88001cff8008
[ 481.330969] Call Trace:
[ 481.333795] [] schedule+0x3e/0x90
[ 481.339447] [] __cv_signal+0x156/0x170 [spl]
[ 481.346167] [] ? woken_wake_function+0x20/0x20
[ 481.353075] [] __cv_wait+0x15/0x20 [spl]
[ 481.359417] [] spa_inject_addref+0x571/0xaf0 [zfs]
[ 481.366725] [] dmu_send_obj+0xade/0x1410 [zfs]
[ 481.373641] [] ? spa_inject_addref+0x5fa/0xaf0 [zfs]
[ 481.381135] [] ? spl_kmem_alloc+0x8d/0x1a0 [spl]
[ 481.388246] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.395160] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.402071] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.408989] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.415901] [] dmu_send_obj+0xe6e/0x1410 [zfs]
[ 481.422811] [] dmu_send_obj+0xd4b/0x1410 [zfs]
[ 481.429725] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.436641] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.443551] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.450467] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.457379] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.464289] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.471205] [] dmu_send_obj+0xe6e/0x1410 [zfs]
[ 481.478130] [] dmu_send_obj+0xbb8/0x1410 [zfs]
[ 481.485050] [] ? l2arc_start+0x729/0x950 [zfs]
[ 481.491963] [] dmu_send_obj+0x11b7/0x1410 [zfs]
[ 481.498980] [] ? dsl_dataset_hold_obj+0x62e/0x820 [zfs]
[ 481.506772] [] traverse_dataset+0x56/0x60 [zfs]
[ 481.513787] [] ? spa_inject_addref+0x490/0xaf0 [zfs]
[ 481.521290] [] ? dsl_pool_config_exit+0x15/0x20 [zfs]
[ 481.528878] [] traverse_pool+0x183/0x1b0 [zfs]
[ 481.535796] [] ? spa_inject_addref+0x490/0xaf0 [zfs]
[ 481.543295] [] ? spa_inject_addref+0x490/0xaf0 [zfs]
[ 481.550797] [] spa_sync_allpools+0x14fc/0x2160 [zfs]
[ 481.558291] [] ? finish_wait+0x67/0x80
[ 481.568996] [] ? spa_inject_addref+0x1ff/0xaf0 [zfs]
[ 481.576496] [] spa_sync_allpools+0x1b6e/0x2160 [zfs]
[ 481.583993] [] spa_import+0x1d2/0x710 [zfs]
[ 481.590617] [] ? nvlist_exists+0xdf/0x120 [znvpair]
[ 481.598028] [] dataset_name_hidden+0x18ad/0x3840 [zfs]
[ 481.605730] [] pool_status_check+0x4f7/0x540 [zfs]
[ 481.613032] [] do_vfs_ioctl+0x93/0x4f0
[ 481.619166] [] ? putname+0x5f/0x70
[ 481.624905] [] SyS_ioctl+0x92/0xa0
[ 481.630648] [] entry_SYSCALL_64_fastpath+0x12/0x6a
This is kind of expected: ZFS tried to import a pool on a vdev that was not going to honor the IOs it received. However, and this is really the main issue here, once we reach that state, all the other (unblocked) pools become unresponsive as well. Running "zpool status" or "zpool list" against a different pool that was not blocked also hangs forever:
[root@andrinox ~]# ps aux|grep zp
root 14372 2.1 0.0 40676 2964 tty1 D<+ 15:03 0:05 zpool import -d /tmp/tmpaa02DN z0x93d024d81200aee
root 20824 0.0 0.0 30232 1080 pts/1 D+ 15:03 0:00 zpool list z0x93d024d81200b1f
root 23077 0.0 0.0 4532 508 pts/2 S+ 15:07 0:00 grep zp
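To confirm that both processes are stuck in the kernel, their stacks can be inspected through procfs (PIDs taken from the ps output above; this requires a kernel built with CONFIG_STACKTRACE):

[root@andrinox ~]# cat /proc/14372/stack    # the blocked "zpool import"
[root@andrinox ~]# cat /proc/20824/stack    # the "zpool list" against the healthy pool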
Even if the blocked Volume recovers and starts honoring IOs again, I think (to be verified) the state of the ZFS driver does not evolve: all further calls to "zpool" or "zfs list" hang in "D" state, waiting for IOs.
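One way to verify this, continuing the hypothetical device-mapper reproduction from the Setup section: resume the device and check whether new "zpool" calls still hang:

[root@andrinox ~]# dmsetup resume slowdev               # the device honors IOs again
[root@andrinox ~]# timeout 30 zpool list || echo "zpool still hung"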
We did not observe this issue when running v0.6.5.1, but that particular version had another bug: a kernel BUG would occur when ZFS tried to process writes on a pool that had hit a non-recoverable error on its vdev. That issue was fixed in v0.6.5.3, which instead prints a message to the kernel log informing us that ZFS has encountered an unrecoverable error and that the pool is now blocked. In v0.6.5.3 this condition can be recovered from by calling "zpool clear", but again only when it occurs during normal operation, not when importing.
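For comparison, this is the recovery path that works during normal operation once the underlying device honors IOs again (pool name hypothetical):

[root@andrinox ~]# zpool clear testpool
[root@andrinox ~]# zpool status testpool    # pool should be back ONLINE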
Could the change that fixed that kernel BUG in the code handling unrecoverable vdev errors, or changes in the "import" code path, have caused this new behaviour? Or maybe the issue was simply invisible before, because unrecoverable vdev errors used to cause a kernel BUG immediately (and so execution never got as far as what we see in v0.6.5.3)?