Summary: "zpool import" on a failing vdev causes all further "zpool" commands to hang, even on different pools.
Setup:
Custom Linux-from-scratch distribution based on kernel 4.3
ZoL v0.6.5.3
Multiple pools (20 in this scenario), each backed by a single custom block device sitting on a RAID (I agree this is absolutely not recommended).
Pools are exported on shutdown and imported on boot (a sketch of this cycle follows the list).
The Volumes can enter a state where they will block IOs for several seconds (up to a few minutes) at a time, prompting ZFS to block the pool.
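For illustration, a minimal shell sketch of the shutdown/boot handling described above; the pool names and the search directory are hypothetical stand-ins for our custom volumes:

# On shutdown: export every imported pool so nothing stays associated
# with the host across the reboot.
for pool in $(zpool list -H -o name); do
    zpool export "$pool"
done

# On boot: import each pool again from its backing block device
# (/dev/mapper is a placeholder for wherever the custom volumes appear).
zpool import -d /dev/mapper pool01
zpool import -d /dev/mapper pool02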
Issue:
When importing a pool backed by a vdev sitting on a Volume that is currently blocked (similar to importing a pool backed by a single failing drive that completes no IOs), the "zpool import" operation hangs in "D" state (waiting for IO), and shortly afterwards (120 seconds on our system) the Linux kernel reports that the task is blocked:
dmesg extract:
[ 481.275129] INFO: task zpool:14372 blocked for more than 120 seconds.
[ 481.282407] Tainted: P O 4.3.0 #1
[ 481.288029] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 481.296886] zpool D ffff88107f315380 0 14372 1723 0x00000000
[ 481.305003] ffff88001cffaff8 0000000000000086 ffff880f8db99c40 ffff88103849b880
[ 481.313649] ffff8800003f0002 0000000300400190 ffff880f58ea8000 ffff880f665141a0
[ 481.322308] 0000000000000305 0000000000000292 0000000000000001 ffff88001cff8008
[ 481.330969] Call Trace:
[ 481.333795] [] schedule+0x3e/0x90
[ 481.339447] [] __cv_signal+0x156/0x170 [spl]
[ 481.346167] [] ? woken_wake_function+0x20/0x20
[ 481.353075] [] __cv_wait+0x15/0x20 [spl]
[ 481.359417] [] spa_inject_addref+0x571/0xaf0 [zfs]
[ 481.366725] [] dmu_send_obj+0xade/0x1410 [zfs]
[ 481.373641] [] ? spa_inject_addref+0x5fa/0xaf0 [zfs]
[ 481.381135] [] ? spl_kmem_alloc+0x8d/0x1a0 [spl]
[ 481.388246] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.395160] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.402071] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.408989] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.415901] [] dmu_send_obj+0xe6e/0x1410 [zfs]
[ 481.422811] [] dmu_send_obj+0xd4b/0x1410 [zfs]
[ 481.429725] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.436641] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.443551] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.450467] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.457379] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.464289] [] dmu_send_obj+0xa5b/0x1410 [zfs]
[ 481.471205] [] dmu_send_obj+0xe6e/0x1410 [zfs]
[ 481.478130] [] dmu_send_obj+0xbb8/0x1410 [zfs]
[ 481.485050] [] ? l2arc_start+0x729/0x950 [zfs]
[ 481.491963] [] dmu_send_obj+0x11b7/0x1410 [zfs]
[ 481.498980] [] ? dsl_dataset_hold_obj+0x62e/0x820 [zfs]
[ 481.506772] [] traverse_dataset+0x56/0x60 [zfs]
[ 481.513787] [] ? spa_inject_addref+0x490/0xaf0 [zfs]
[ 481.521290] [] ? dsl_pool_config_exit+0x15/0x20 [zfs]
[ 481.528878] [] traverse_pool+0x183/0x1b0 [zfs]
[ 481.535796] [] ? spa_inject_addref+0x490/0xaf0 [zfs]
[ 481.543295] [] ? spa_inject_addref+0x490/0xaf0 [zfs]
[ 481.550797] [] spa_sync_allpools+0x14fc/0x2160 [zfs]
[ 481.558291] [] ? finish_wait+0x67/0x80
[ 481.568996] [] ? spa_inject_addref+0x1ff/0xaf0 [zfs]
[ 481.576496] [] spa_sync_allpools+0x1b6e/0x2160 [zfs]
[ 481.583993] [] spa_import+0x1d2/0x710 [zfs]
[ 481.590617] [] ? nvlist_exists+0xdf/0x120 [znvpair]
[ 481.598028] [] dataset_name_hidden+0x18ad/0x3840 [zfs]
[ 481.605730] [] pool_status_check+0x4f7/0x540 [zfs]
[ 481.613032] [] do_vfs_ioctl+0x93/0x4f0
[ 481.619166] [] ? putname+0x5f/0x70
[ 481.624905] [] SyS_ioctl+0x92/0xa0
[ 481.630648] [] entry_SYSCALL_64_fastpath+0x12/0x6a
This is somewhat expected: ZFS tried to import a pool on a vdev that was never going to honor the IOs it received. The real issue, however, is that once we reach that state, all the other (unblocked) pools become unresponsive as well. Running "zpool status" or "zpool list" against a different pool that was not blocked also hangs forever:
[root@andrinox ~]# ps aux|grep zp
root 14372 2.1 0.0 40676 2964 tty1 D<+ 15:03 0:05 zpool import -d /tmp/tmpaa02DN z0x93d024d81200aee
root 20824 0.0 0.0 30232 1080 pts/1 D+ 15:03 0:00 zpool list z0x93d024d81200b1f
root 23077 0.0 0.0 4532 508 pts/2 S+ 15:07 0:00 grep zp
Even if the blocked Volume recovers and starts honoring IOs again, I believe (to be verified) the ZFS driver never recovers: all further calls to "zpool" or "zfs list" hang in "D" state, waiting for IO.
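One way to confirm this state (a sketch, assuming root and a kernel that exposes /proc/<pid>/stack):

# Show processes stuck in uninterruptible sleep ("D" state) and what they wait on.
ps -eo pid,stat,wchan:32,args | awk '$2 ~ /^D/'

# Dump the kernel stack of the hung "zpool import" (PID 14372 in the trace above).
cat /proc/14372/stack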
We did not observe this issue when running v0.6.5.1, but in that particular version we hit another bug whereby a kernel BUG would occur when ZFS tried to process writes on a pool that had hit a non-recoverable error on its vdev. That issue was fixed in v0.6.5.3, where a message is printed in the kernel log informing us that ZFS has encountered an unrecoverable error and that the pool is now blocked. This condition (in v0.6.5.3) can be recovered from by calling "zpool clear", but again only when it occurs during normal operation, not during import.
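For reference, the recovery that works during normal operation (not during import) looks roughly like this; "pool01" is a hypothetical pool that hit an unrecoverable vdev error while already imported:

# Check the kernel log for the unrecoverable-error message mentioned above.
dmesg | tail -n 50

# Once the volume honors IO again, retry the failed IO and resume the pool.
zpool clear pool01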
Could the change to the code that handles unrecoverable vdev errors (the fix for that kernel BUG), or changes in the "import" script, have caused the new behaviour? Or was the issue simply invisible before, because unrecoverable vdev errors used to cause a kernel BUG and execution never got as far as it does in 0.6.5.3?
There has been some misunderstanding in the community over what import and export mean. When a pool is imported, it is associated with the system; when it is exported, it is disassociated from the system. This should not be done across reboots: at boot, the already-imported pools should be reloaded into the kernel from the cachefile. It is also important to use udev symlinks such as /dev/disk/by-id/* so that there are no device-reordering problems.
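A sketch of what that looks like in practice (the pool name "tank" is hypothetical; /etc/zfs/zpool.cache is the usual ZoL cachefile location):

# Import once using stable by-id symlinks and record the pool in the cachefile.
zpool import -d /dev/disk/by-id tank
zpool set cachefile=/etc/zfs/zpool.cache tank

# At boot, reload every pool listed in the cachefile instead of re-importing
# by scanning devices (-N skips mounting the datasets here).
zpool import -c /etc/zfs/zpool.cache -aN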
Basically all of the zpool subcommands and some of the zfs subcommands take the spa_namespace lock inside the kernel. The reason why everything blocks is that the zpool import command is waiting for an IO to be returned by the hardware while holding the lock. This is bad and we should find a way around it.
The reason why it remains blocked even after the device is "unblocked" is that it is still waiting on an IO to be returned by the hardware; I assume the hardware ignores IOs issued while it is in the "blocked" state, so the wait hangs indefinitely. This is a known issue. I took a stab at dealing with it a while back, but I didn't get it right at that time:
Had you avoided exporting and re-importing the pools, you might not have hit this issue in the form of zfs/zpool commands hanging for every other pool. You would still have ended up with hangs from deadlocks elsewhere, but you might have avoided deterministically deadlocking on the spa_namespace lock. The fact that you can trigger it this way could make developing fixes much easier, if we can reproduce your environment while working on them.
It would be really helpful if you could explain how you blocked the device from servicing IOs. That is the piece I was missing, and it kept me from finishing development of #2387 in the time I had for it.
@ryao A very simple way to block a device from servicing IOs is to mount a LUKS container on a USB drive, then pull the cable. Only problem: there is no way back.
I run into this from time to time when doing backups (pool in a DMCRYPT LUKS container on a USB disk) and the USB device disconnects for some reason or for some amount of time.
There doesn't seem to be a way to recover from this without a system reboot: the DMCRYPT device can be neither reactivated nor removed, since it is blocked by the IOs from ZFS, and the missing drive cannot be reattached to ZFS, since ZFS is already blocked on the IOs in the DMCRYPT queue.
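For anyone trying to reproduce it, a rough sketch of the setup described above (device name, mapping name, and pool name are all hypothetical):

# One-time setup: LUKS container on the USB disk, pool on top of the mapping.
cryptsetup luksFormat /dev/sdX
cryptsetup luksOpen /dev/sdX usbbackup
zpool create backup /dev/mapper/usbbackup

# Start some IO against the pool, then pull the USB cable. The outstanding IOs
# never return: zpool/zfs commands touching the pool go into "D" state, and as
# described above neither "cryptsetup luksClose usbbackup" nor ZFS can make
# progress until the system is rebooted.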