
zpool destroy fails on zpool with suspended IO #2878

Open
kiranfractal opened this issue Nov 7, 2014 · 36 comments
Labels
Status: Inactive (Not being actively updated) · Status: Understood (The root cause of the issue is known)

Comments

@kiranfractal

Hi,

I am trying to replace the disk in a single-disk zpool; this results in suspended I/O and does not allow me to destroy the zpool.

Steps to reproduce:

  1. create a zpool with single disk (zpool create zp1 /dev/sda)
  2. remove the disk
  3. zpool status shows disk unavailable
  4. insert a new disk
  5. zpool replace zp1 /dev/sda /dev/sdb
  6. it says "cannot replace /dev/sda with /dev/sdb : pool I/O is currently suspended"

The same failure applies to zpool destroy as well.

Now the zpool is unusable, and the only way for me to destroy it is to reboot the system, destroy the old pool, and create a new one.

Is there any other option I can try so that I can replace the disk without rebooting the system?

Thanks,
Kiran.

@GregorKopka
Contributor

Maybe reattach the original disk and try zpool clear or zpool online?
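
Something along these lines, assuming the pool and device names from your report (adjust to your system):

    zpool online zp1 /dev/sda   # tell ZFS the original member device is back
    zpool clear zp1             # clear the error state so pool I/O can resume
    zpool status zp1            # verify the pool is no longer suspended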

@kiranfractal
Author

I agree with you, but I just want to remove the old disk, insert a new disk in the same slot, and recreate the zpool without a reboot.

@GregorKopka
Contributor

Then try the -f flag on zpool export or destroy.

@kiranfractal
Author

I removed the disk, inserted a new disk, and tried your commands, but they show the messages below:

[root@fractal-C92E ~]# zpool status <=== removed drive from the slot and added new drive in the same slot
pool: zp2
state: ONLINE
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
zp2         ONLINE       0     0     0
  sdb       ONLINE       0     0     0

errors: No known data errors

==> Write some data to the zpool, i.e. zp2

[root@fractal-C92E ~]# zpool clear zp2

[root@fractal-C92E ~]# zpool status
pool: zp2
state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-JQ
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
zp2         ONLINE       0     4     0
  sdb       ONLINE       3    14     0

errors: 2 data errors, use '-v' for a list

[root@fractal-C92E ~]# zpool export -f zp2
umount2: Device or resource busy
umount: /zp2: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
cannot unmount '/zp2': umount failed

[root@fractal-C92E ~]# zpool destroy zp2
umount: /zp2: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
cannot unmount '/zp2': umount failed
could not destroy 'zp2': could not unmount datasets

Now the only option is to reboot to recreate the zpool.

Let me know if there are any further commands to try.

Thanks.

@kiranfractal
Author

[root@fractal-C92E ~]# zpool replace -f zp2 /dev/sdb /dev/disk/by-id/scsi-SATA_ST3750640NS_3QD1GN87
cannot replace /dev/sdb with /dev/disk/by-id/scsi-SATA_ST3750640NS_3QD1GN87: pool I/O is currently suspended

@GregorKopka
Contributor

Put the old (original) disk in the system so it is attached as the device which ZFS thinks is the member of the pool (before your last replace that was /dev/sdb), then
zpool clear zp2
which should bring the system to a state where you are able to access the pool again - including the option to destroy it.

Simple solution: reboot.

One thing to keep in mind: replacing a non-redundant vdev of a pool needs to be done while the data on it is still available - else the data can't be copied, like in your example where you pulled the drive.
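
A sketch of how that can look while the old disk is still readable (device names illustrative): first mirror the data onto the new disk, then drop the old one.

    zpool attach zp1 /dev/sda /dev/sdb   # temporarily turn the single-disk vdev into a mirror
    zpool status zp1                     # wait for the resilver to finish
    zpool detach zp1 /dev/sda            # remove the old disk, leaving the new one in place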

@kiranfractal
Author

Our use case is that we don't use ZFS RAID but a single drive per pool, since the replication factor is handled at the Gluster volume level.

How about an option to support a forced destroy (destroying the pool while its I/O is suspended) so that this corner case can be addressed?

@kiranfractal
Author

We are looking for a graceful drive replacement without a reboot. Is there any workaround?

@GregorKopka
Contributor

@kiranfractal If you plan to create redundancy only through the use of replication in gluster then you should be able to reboot the box without any problems at any point in time.

For replacing, see man zpool for usage and limitations.

Note that your use case (a live pool with a non-redundant top-level vdev plus the disk backing that vdev being offline) is AFAIK currently not supported, apart from reconnecting the original disk (including the data on it) to continue.

Destroying the redundancy of a pool below one good copy means restoring from backup, which by definition is offline time. At the moment I can only suggest, for production, either changing your setup (create pools with redundancy, the way ZFS is intended to be used if you care about your data being healthy and online on that specific machine) or using a simple throwaway POSIX filesystem as the brick storage backend for Gluster (which might nevertheless block if you remove the disk backing the filesystem before unmounting it).
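
For instance, a minimal redundant pool looks like this (device names are placeholders):

    zpool create tank mirror /dev/sdx /dev/sdy   # either disk can fail or be pulled without suspending the pool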

Only question left:
@behlendorf Being able to export/destroy an imported pool (even a suspended one) - even if that might cause data loss, even on-disk - might be useful, for example in case a USB backup drive containing such a simple pool goes south.

Are there plans for ZoL to support this?

(edit: accidental premature send)

@behlendorf
Contributor

There's definitely some work to be done here, and coincidentally enough just yesterday I opened a pull request, #2890, with a few bug fixes along these lines.

Most of the infrastructure to force export/destroy a pool is already in place but there are still some gotchas which need to be sorted out. For example, one major restriction imposed by the Linux VFS we'll have to contend with is that a filesystem cannot be unmounted if it has open file handles. And any mounted filesystem will hold references on the pool which will prevent us from destroying it.

So if the administrator kills off all the processes with open file handles, then the filesystem should be unmountable even when the pool is suspended. Additionally, if the -F hardforce option is given, you should be able to export/destroy even a suspended pool. At the moment -F is undocumented, but it does exist.
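
Roughly, using the pool from the earlier example (whether each step succeeds depends on the pool's state, and -F behaves as described above):

    lsof +D /zp2          # list processes holding files open under the mountpoint
    fuser -km /zp2        # optionally kill them, which releases the handles
    zfs unmount zp2       # the dataset should now be unmountable
    zpool export -F zp2   # then attempt the hardforce export of the suspended pool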

What probably needs to happen is for someone to take a moment and work through all of these cases to see what works currently and what doesn't.

@behlendorf behlendorf added this to the 0.6.4 milestone Nov 12, 2014
@kiranfractal
Author

I tried the -F option with export and it hangs, as shown below.

  1. I created a zpool with single disk

  2. copied some data to it

  3. Removed the disk and cleared zpool (zpool clear mnt)

  4. zpool status

pool: mnt
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
mnt         UNAVAIL      0     0     0  insufficient replicas
  sdc       UNAVAIL      0     0     0
  1. [root@fractal-c92e ~]# zpool export -F mnt

  2. [root@fractal-c92e ~]# ps aux | grep zpool
    root 2413 0.0 0.0 126064 1524 pts/0 D+ 11:17 0:00 zpool export -F mnt

dmesg output:
---SPLError: 1177:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'mnt' has encountered an uncorrectable I/O failure and has been suspended.

SPLError: 1176:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'mnt' has encountered an uncorrectable I/O failure and has been suspended.

sysdig_probe: driver loading
sysdig_probe: initializing ring buffer for CPU 0
sysdig_probe: CPU buffer initialized, size=1048576
sysdig_probe: initializing ring buffer for CPU 1
sysdig_probe: CPU buffer initialized, size=1048576
sysdig_probe: initializing ring buffer for CPU 2
sysdig_probe: CPU buffer initialized, size=1048576
sysdig_probe: initializing ring buffer for CPU 3
sysdig_probe: CPU buffer initialized, size=1048576
INFO: task txg_sync:1244 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync D 0000000000000002 0 1244 2 0x00000000
ffff88027257fb70 0000000000000046 ffff88027257fb80 ffff880273010918
0000000000000001 ffff880273010930 0000000000000000 0000000000000000
ffff88027257faf0 ffffffff81064ba2 ffff88027aaa5ab8 ffff88027257ffd8
Call Trace:
[] ? default_wake_function+0x12/0x20
[] ? ktime_get_ts+0xb1/0xf0
[] io_schedule+0x73/0xc0
[] cv_wait_common+0xac/0x1c0 [spl]
[] ? dmu_objset_write_ready+0x0/0x50 [zfs]
[] ? zio_execute+0x0/0x140 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait_io+0x18/0x20 [spl]
[] zio_wait+0xfb/0x1b0 [zfs]
[] dsl_pool_sync+0x2b3/0x440 [zfs]
[] spa_sync+0x40b/0xae0 [zfs]
[] txg_sync_thread+0x384/0x5e0 [zfs]
[] ? set_user_nice+0xc9/0x130
[] ? txg_sync_thread+0x0/0x5e0 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x9e/0xc0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xc0
[] ? child_rip+0x0/0x20
INFO: task zpool:2413 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool D 0000000000000001 0 2413 2395 0x10000080
ffff88027af01ad8 0000000000000082 0000000000000000 ffff88027b480080
0000000000000055 ffff880277968240 0000004dedc3628a 0000000000000282
ffff88027af01b78 00000001000085cc ffff88027aade638 ffff88027af01fd8
Call Trace:
[] cv_wait_common+0x105/0x1c0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x15/0x20 [spl]
[] txg_wait_synced+0xb3/0x190 [zfs]
[] dmu_tx_wait+0x1b4/0x2b0 [zfs]
[] dmu_tx_assign+0x91/0x490 [zfs]
[] ? read_tsc+0x9/0x20
[] spa_history_log_nvl+0x7d/0x160 [zfs]
[] ? nvlist_add_string+0x1b/0x20 [znvpair]
[] spa_history_log+0x58/0xc0 [zfs]
[] zfs_log_history+0xfc/0x100 [zfs]
[] zfs_ioc_pool_export+0x2f/0x60 [zfs]
[] zfsdev_ioctl+0x45c/0x4d0 [zfs]
[] ? __do_page_fault+0x1ec/0x480
[] vfs_ioctl+0x22/0xa0
[] do_vfs_ioctl+0x84/0x580
[] sys_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b
INFO: task txg_sync:1244 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync D 0000000000000002 0 1244 2 0x00000000
ffff88027257fb70 0000000000000046 ffff88027257fb80 ffff880273010918
0000000000000001 ffff880273010930 0000000000000000 0000000000000000
ffff88027257faf0 ffffffff81064ba2 ffff88027aaa5ab8 ffff88027257ffd8
Call Trace:
[] ? default_wake_function+0x12/0x20
[] ? ktime_get_ts+0xb1/0xf0
[] io_schedule+0x73/0xc0
[] cv_wait_common+0xac/0x1c0 [spl]
[] ? dmu_objset_write_ready+0x0/0x50 [zfs]
[] ? zio_execute+0x0/0x140 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait_io+0x18/0x20 [spl]
[] zio_wait+0xfb/0x1b0 [zfs]
[] dsl_pool_sync+0x2b3/0x440 [zfs]
[] spa_sync+0x40b/0xae0 [zfs]
[] txg_sync_thread+0x384/0x5e0 [zfs]
[] ? set_user_nice+0xc9/0x130
[] ? txg_sync_thread+0x0/0x5e0 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x9e/0xc0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xc0
[] ? child_rip+0x0/0x20
INFO: task zpool:2413 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool D 0000000000000001 0 2413 2395 0x00000080
ffff88027af01ad8 0000000000000082 0000000000000000 ffff88027b480080
0000000000000055 ffff880277968240 0000004dedc3628a 0000000000000282
ffff88027af01b78 00000001000085cc ffff88027aade638 ffff88027af01fd8
Call Trace:
[] cv_wait_common+0x105/0x1c0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x15/0x20 [spl]
[] txg_wait_synced+0xb3/0x190 [zfs]
[] dmu_tx_wait+0x1b4/0x2b0 [zfs]
[] dmu_tx_assign+0x91/0x490 [zfs]
[] ? read_tsc+0x9/0x20
[] spa_history_log_nvl+0x7d/0x160 [zfs]
[] ? nvlist_add_string+0x1b/0x20 [znvpair]
[] spa_history_log+0x58/0xc0 [zfs]
[] zfs_log_history+0xfc/0x100 [zfs]
[] zfs_ioc_pool_export+0x2f/0x60 [zfs]
[] zfsdev_ioctl+0x45c/0x4d0 [zfs]
[] ? __do_page_fault+0x1ec/0x480
[] vfs_ioctl+0x22/0xa0
[] do_vfs_ioctl+0x84/0x580
[] sys_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b

@behlendorf
Contributor

Thanks for the additional test case. In this case it's blocking trying to update the history which clearly won't work. This is a slight variation on the other issue.

Let me ask an open-ended question. For a pool which is suspended, what would you expect the behavior to be for the following commands, where -f means force and -F means hardforce?

  1. zpool export|destroy < pool >
  2. zpool export|destroy -f < pool >
  3. zpool export|destroy -F < pool >

@kiranfractal
Author

Let me start with a disclaimer that I have not started to look at the inner workings of ZFS. However, we rely heavily on ZFS to provide on-disk data consistency without interruptions to data access (caused by, say, a reboot). I'm coming at this from the situation of recovering gracefully from an unrecoverable failure in the underlying drives. This could be either when we have a non-redundant pool (say, a pool per disk) or a redundant pool where the failure tolerances have been exceeded (say, a raidz1 with a 2-disk failure). In these situations, I'd ideally like to recover without the need for a system reboot, which would cause other services to also be affected. I'm not so worried about the loss of data caused by the failed drive because I can handle that at a higher layer.

That said, here are some tentative answers to the questions above, specifically for a destroy:

  1. Should succeed if there are no underlying causes to fail and should exit with an error for all other conditions.
  2. Forcefully unmount before the destroy.
  3. The operation should succeed regardless of what happens.

When is the tentative date for the zfs-0.6.4 release?

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Dec 26, 2014
openzfs#2890

openzfs#2878

behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 27, 2015
Cleanly destroying or exporting a pool requires that the pool
not be suspended.  Therefore, set the POOL_CHECK_SUSPENDED flag
for these ioctls so the utilities will output a descriptive
error message rather than block.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2878
behlendorf added a commit that referenced this issue Mar 2, 2015
Cleanly destroying or exporting a pool requires that the pool
not be suspended.  Therefore, set the POOL_CHECK_SUSPENDED flag
for these ioctls so the utilities will output a descriptive
error message rather than block.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2878
@behlendorf
Contributor

I've merged 87a63dd to master which ensures a reasonable error is generated when attempting to destroy or export a suspended pool. This is far preferable to a hang. However, I'd still like to leave this issue open so we can explore what is reasonable behavior for a force option.

87a63dd Prevent "zpool destroy|export" when suspended

@behlendorf behlendorf modified the milestones: 0.7.0, 0.6.4 Mar 2, 2015
@darkpixel

We are trying to use ZFS for the exact same scenario. Our clients have a pool of spinning rust they use for primary data storage. We zfs send that data to two places--an off-site backup, and a local single external USB drive.

Basically they want fast (local) recovery from the local USB drive in case the primary data storage somehow dies, and they want an off-site backup in case the office burns down.

We have external USB (Seagate) drives at 24 locations, and it seems like one dies every month. When one dies, we ship a new drive out, get it plugged in, and then have to reboot the box because we can't destroy 'backup-pool' while its I/O is suspended. Once the box has been rebooted, 'backup-pool' no longer shows up, and then we can create a new 'backup-pool' from the newly installed drive.

I agree with @kiranfractal that there should be a '-F' option (in addition to '-f') that would force removal of the pool regardless of IO being suspended.

@kernelOfTruth
Contributor

referencing

https://www.illumos.org/issues/4128 disks in zpools never go away when pulled
illumos/illumos-gate@39cddb1

@ashjas

ashjas commented Jan 31, 2016

I am facing this same issue in the following scenario:

  1. sudo zpool create mypool -m ~/ZFS mirror /dev/sdc1 /dev/sdc2
    I am using a mirror on a single disk... I know this is inefficient, but I'm testing...

  2. sudo zfs mount / umount + sudo zpool import/export works as expected.

  3. reboot

  4. put in a flash drive, so that my zpool devices become /dev/sdd1 & /dev/sdd2 (sdc1 is now the inserted flash drive).

  5. neither sudo zfs mount / umount nor sudo zpool import/export works as expected.

----------> output of commands issued:

sudo zpool status -v
  pool: mypool
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jan 31 19:12:37 2016
config:

    NAME                                                 STATE     READ WRITE CKSUM
    mypool                                               ONLINE       0     0     0
      mirror-0                                           ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part1  ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part2  ONLINE       0     0     0

errors: No known data errors
sudo zpool import
no pools available to import

sudo zpool import mypool
cannot import 'mypool': a pool with that name already exists
use the form 'zpool import <pool | id> <newpool>' to give it a new name
sudo zpool export mypool
umount: /home/ashish/ZFS: not mounted
cannot unmount '/home/ashish/ZFS': umount failed
sudo zfs mount mypool
cannot mount 'mypool': filesystem already mounted
sudo zfs get all mypool | grep mount
mypool  mounted               yes                    -
mypool  mountpoint            /home/ashish/ZFS       local
mypool  canmount              on                     default

Surprisingly, issuing a scrub command:
sudo zpool scrub mypool

correctly tries to scan the USB HDD, as I can see its LED blinking.

Also, if in this booted session of Ubuntu I fix the paths back to /dev/sdc1 /dev/sdc2 for the zpool, by removing the flash drive and reattaching the zpool drive, the above does not change; the issue still remains. This only gets fixed if I reboot the system.

dmesg.log


Edit: In a different, freshly rebooted session, with /dev/sdc1,2:

sudo zpool import
no pools available to import

But:
sudo zfs mount -a
mounts the zpool correctly.

sudo zpool status
  pool: mypool
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jan 31 19:18:44 2016
config:

    NAME                                                 STATE     READ WRITE CKSUM
    mypool                                               ONLINE       0     0     0
      mirror-0                                           ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part1  ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part2  ONLINE       0     0     0

errors: No known data errors

Thereafter, after doing an export,
sudo zpool export mypool
the import command works fine:

sudo zpool import 
   pool: mypool
     id: 977336718416363205
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

    mypool                                               ONLINE
      mirror-0                                           ONLINE
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part1  ONLINE
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part2  ONLINE

After adding some files to the zpool,
export misbehaves:
sudo zpool export mypool 
umount: /home/ashish/ZFS: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)
cannot unmount '/home/ashish/ZFS': umount failed

lsof ZFS/
COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ibus-daem 7514 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-dcon 7524 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-ui-g 7525 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-x11  7527 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-engi 7538 ashish  cwd    DIR   0,36        6    4 ZFS

fuser ZFS/
/home/ashish/ZFS:     7514c  7524c  7525c  7527c  7538c

this time dmesg2.log

There is some bug here; let me know if I can be of any further help...

Hope this gets sorted...

@ilovezfs
Contributor

@ashjas I'm not seeing anything wrong or unexpected there, and certainly nothing having to do with device names. The only thing that looks like a problem is

cannot unmount '/home/ashish/ZFS': umount failed

which typically indicates that that particular dataset is busy. Often that can be resolved with -f or by closing whatever files are open. lsof is your friend.
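
For example, using the mountpoint from your output (then close or kill whatever shows up before retrying the export):

    lsof +D /home/ashish/ZFS     # or: fuser -vm /home/ashish/ZFS
    zfs unmount -f mypool        # then retry the unmount/export, with -f if needed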

@ashjas

ashjas commented Feb 1, 2016

@ilovezfs
For the first scenario, where the device names get changed:
5) neither sudo zfs mount / umount nor sudo zpool import/export works as expected.

Is this not an issue? I'm unable to mount the volume.
If I'm giving the wrong commands, let me know, but I don't see how this isn't an issue.

Secondly, -f doesn't make any difference at all!

@darkpixel

@ashjas: I don't create pools like this: sudo zpool create mypool -m ~/ZFS mirror /dev/sdc1 /dev/sdc2

I don't know how ZFS handles that, but /dev/sda, /dev/sdb, etc. can change when devices are added or removed.

I usually create the pools by using devices in /dev/disk/by-id/*

Those ID numbers are not supposed to change, even if devices are switched around.
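
For example (the by-id name below is made up; use whatever ls -l /dev/disk/by-id/ shows for your drive):

    ls -l /dev/disk/by-id/                 # find the persistent names for your partitions
    zpool create mypool -m ~/ZFS mirror \
      /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL-part1 \
      /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL-part2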

@GregorKopka
Contributor

@behlendorf Please reconsider the position that a stuck zpool/zfs command, and the inability to remove a pool from the system, is not a bug.

This scenario isn't that uncommon with USB-disk based pools used for backups; 'reboot to fix it' should remain a speciality of Windows...

@bjquinn

bjquinn commented Oct 12, 2016

Agreed. USB pools for backups are not reliably usable given this issue.

@behlendorf
Contributor

@GregorKopka I agree. There's lots of room for improvement when someone has the time to focus on this. I've just removed the Bug tag from all issues because it wasn't actually helpful.

@mailinglists35

@behlendorf is there any hope for someone to implement this anytime soon?

@behlendorf
Contributor

No developers I know of are currently working on this.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 9, 2017
A pool may only be resumed when the txg in the "best" uberblock
found on-disk matches the in-core last synced txg.  This is done
to verify that the pool was not modified by another node or process
while it was suspended.  If this were to happen the result would
be a corrupted pool.

Since a suspended pool may no longer always be resumable it was
necessary to extend the 'zpool export -F` command to allow a
suspended pool to be exported.  This was accomplished by leveraging
the existing spa freeze functionality.  During export if '-F' is
given and the pool is suspended the pool will be frozen at the last
synced txg and all in-core dirty data will be discarded.  This
allows for the pool to be safely exported without having to reboot
the system.

In order to test this functionality the broken 'ztest -E' option,
which allows for ztest to use an existing pool, was fixed.  The
code needed for this was copied over from zdb.  ztest is used to
modify the test pool from user space while the kernel has the pool
imported and suspended.

This commit partially addresses issues openzfs#4003, openzfs#2023, openzfs#2878, openzfs#3256
by allowing a suspended pool to be exported, 'zpool export -F'.
There may still be cases where a reference on the pool, such as
a filesystem which cannot be unmounted, will prevent the pool
from being exported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 28, 2017
A pool may only be resumed when the txg in the "best" uberblock
found on-disk matches the in-core last synced txg.  This is done
to verify that the pool was not modified by another node or process
while it was suspended.  If this were to happen the result would
be a corrupted pool.

Since a suspended pool may no longer always be resumable it was
necessary to extend the 'zpool export -F` command to allow a
suspended pool to be exported.  This was accomplished by leveraging
the existing spa freeze functionality.  During export if '-F' is
given and the pool is suspended the pool will be frozen at the last
synced txg and all in-core dirty data will be discarded.  This
allows for the pool to be safely exported without having to reboot
the system.

In order to test this functionality the broken 'ztest -E' option,
which allows for ztest to use an existing pool, was fixed.  The
code needed for this was copied over from zdb.  ztest is used to
modify the test pool from user space while the kernel has the pool
imported and suspended.

This commit partially addresses issues openzfs#4003, openzfs#2023, openzfs#2878, openzfs#3256
by allowing a suspended pool to be exported, 'zpool export -F'.
There may still be cases where a reference on the pool, such as
a filesystem which cannot be unmounted, will prevent the pool
from being exported.

Add a basic zpool_tryimport function which can be used by zhack,
zdb, and ztest to provide minimum pool import functionality.
This way each utility isn't doing a slightly different thing.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@lnxbil

lnxbil commented Feb 14, 2018

I just ran into the same problem after accidentally removing the only disk the pool consists of via an echo 1 > /sys/block/sdr/delete call, and after a lot of different attempts I was able to reimport the pool after reading this thread:

$ zpool status -v externes-backup
  pool: externes-backup
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: scrub repaired 0B in 29h48m with 0 errors on Thu Aug  4 20:51:21 2016
config:

        NAME                     STATE     READ WRITE CKSUM
        externes-backup          ONLINE       1     9     0
          externes_backup_crypt  ONLINE       1     0     0

errors: List of errors unavailable: pool I/O is currently suspended

Just for others that might run into the same problem:

  • I have/had a luks-crypted device called externes_backup_crypt

  • Close the LUKS device (to be able to reuse the same name; otherwise it yields "no such device in pool")

      $ cryptsetup luksClose externes_backup_crypt
    
  • Rescan SCSI-Bus

      $ for bus in /sys/class/scsi_host/host*; do echo "- - -" > $bus/scan; done
    
  • Decrypt device

      $ cryptsetup luksOpen /dev/sds externes_backup_crypt
    
  • put disk online

      $ zpool online externes-backup  /dev/mapper/externes_backup_crypt
    

    cannot online /dev/mapper/externes_backup_crypt: pool I/O is currently suspended

  • Now you have to clear the pool and it'll continue to work again:

      $ zpool clear externes-backup
    
  • Afterwards, the pool is imported and automatically scrubbed:

$ root@backup ~ > zpool status -v externes-backup
  pool: externes-backup
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Feb 14 10:32:32 2018
        4,87G scanned out of 6,14T at 12,1M/s, 147h47m to go
        0B repaired, 0,08% done
config:

        NAME                     STATE     READ WRITE CKSUM
        externes-backup          ONLINE       0     0     0
          externes_backup_crypt  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x18675>
        <metadata>:<0x3dca1>

Now, I'm at least able to export the pool correctly.

@felisucoibi

felisucoibi commented Jul 26, 2018

@lnxbil thanks, at least we have something. I have the same problem, using LUKS+ZFS, and when something goes wrong I have to restart the server; the trouble comes when I'm not at home and I can't reboot. The pity is that when I use cryptsetup luksClose RAID1 it says the device is busy... :(

@GregorKopka
Contributor

@behlendorf this issue was closed without a who and when - is that supposed to happen?

There (AFAIK) still isn't a solution to this (only workarounds for a subset of the problem), so the defect remains. There is also quite a lot of information here that would certainly be interesting to someone who decides to work on this at some point in the future.

Please reopen.

@behlendorf
Contributor

It's not clear to me exactly how this was closed. I don't have any objection to reopening it since this is still an issue.

@behlendorf behlendorf reopened this Mar 10, 2020
@devZer0

devZer0 commented May 31, 2020

this is still an issue in 2020

I had some flaky disks in a server and wanted to remove them. I'm fairly sure I did zpool export or destroy before removing those 3 disks, but I must have done something wrong; at least, the pool was online and mounted while the disks were being ripped out (I didn't double-check beforehand). Now the pool is in a suspended state and, apparently, it's not possible to destroy/remove it from the system in this state.

As there are VMs online on this system, I'm rather curious what to do without shutting them all down and rebooting...

I'm getting this in dmesg after another "zpool destroy ..." hung for a while:

[1510610.643638] INFO: task txg_sync:1714 blocked for more than 120 seconds.
[1510610.643684] Tainted: P IOE 5.4.34-1-pve #1
[1510610.643711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1510610.643741] txg_sync D 0 1714 2 0x80004000
[1510610.643744] Call Trace:
[1510610.643752] __schedule+0x2e6/0x700
[1510610.643754] schedule+0x33/0xa0
[1510610.643757] schedule_timeout+0x152/0x300
[1510610.643761] ? __next_timer_interrupt+0xd0/0xd0
[1510610.643763] io_schedule_timeout+0x1e/0x50
[1510610.643772] __cv_timedwait_common+0x12f/0x170 [spl]
[1510610.643774] ? wait_woken+0x80/0x80
[1510610.643779] __cv_timedwait_io+0x19/0x20 [spl]
[1510610.643851] zio_wait+0x13a/0x280 [zfs]
[1510610.643902] dsl_pool_sync+0x46e/0x500 [zfs]
[1510610.643958] spa_sync+0x5b2/0xfc0 [zfs]
[1510610.644014] ? spa_txg_history_init_io+0x106/0x110 [zfs]
[1510610.644077] txg_sync_thread+0x2d9/0x4c0 [zfs]
[1510610.644156] ? txg_thread_exit.isra.12+0x60/0x60 [zfs]
[1510610.644163] thread_generic_wrapper+0x74/0x90 [spl]
[1510610.644167] kthread+0x120/0x140
[1510610.644173] ? __thread_exit+0x20/0x20 [spl]
[1510610.644175] ? kthread_park+0x90/0x90
[1510610.644178] ret_from_fork+0x35/0x40
[1510610.644182] INFO: task zed:2736 blocked for more than 120 seconds.
[1510610.644208] Tainted: P IOE 5.4.34-1-pve #1
[1510610.644231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1510610.644260] zed D 0 2736 1 0x00004000
[1510610.644262] Call Trace:
[1510610.644264] __schedule+0x2e6/0x700
[1510610.644266] schedule+0x33/0xa0
[1510610.644267] io_schedule+0x16/0x40
[1510610.644272] cv_wait_common+0xb5/0x130 [spl]
[1510610.644274] ? wait_woken+0x80/0x80
[1510610.644279] __cv_wait_io+0x18/0x20 [spl]
[1510610.644336] txg_wait_synced_impl+0xc9/0x110 [zfs]
[1510610.644393] txg_wait_synced+0x10/0x40 [zfs]
[1510610.644449] spa_vdev_state_exit+0x8a/0x160 [zfs]
[1510610.644505] vdev_online+0x2bb/0x3e0 [zfs]
[1510610.644564] zfs_ioc_vdev_set_state+0x86/0x190 [zfs]
[1510610.644622] zfsdev_ioctl+0x6db/0x8f0 [zfs]
[1510610.644626] ? lru_cache_add_active_or_unevictable+0x39/0xb0
[1510610.644629] do_vfs_ioctl+0xa9/0x640
[1510610.644632] ? handle_mm_fault+0xc9/0x1f0
[1510610.644634] ksys_ioctl+0x67/0x90
[1510610.644636] __x64_sys_ioctl+0x1a/0x20
[1510610.644639] do_syscall_64+0x57/0x190
[1510610.644642] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1510610.644644] RIP: 0033:0x7f9c54306427
[1510610.644649] Code: Bad RIP value.
[1510610.644650] RSP: 002b:00007f9c534a1078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1510610.644652] RAX: ffffffffffffffda RBX: 00007f9c534a10c0 RCX: 00007f9c54306427
[1510610.644653] RDX: 00007f9c534a10c0 RSI: 0000000000005a0d RDI: 0000000000000009
[1510610.644654] RBP: 00007f9c534a5ab0 R08: 00000000b5c9ecf4 R09: 00007f9c546bb8e6
[1510610.644655] R10: 00007f9c44041450 R11: 0000000000000246 R12: 00007f9c44015740
[1510610.644656] R13: 00007f9c4401e980 R14: 00007f9c534a6b40 R15: 00007f9c534a4670
[1510731.477176] INFO: task zed:2736 blocked for more than 241 seconds.
[1510731.477210] Tainted: P IOE 5.4.34-1-pve #1
[1510731.477232] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
