
zpool destroy fails on zpool with suspended IO #2878

Open
kiranfractal opened this issue Nov 7, 2014 · 36 comments
Labels
Status: Inactive (Not being actively updated) · Status: Understood (The root cause of the issue is known)

Comments

@kiranfractal

Hi,

I am trying to replace the disk in a single-disk zpool; this results in suspended I/O and does not allow me to destroy the zpool.

Steps to reproduce:

  1. create a zpool with single disk (zpool create zp1 /dev/sda)
  2. remove the disk
  3. zpool status shows disk unavailable
  4. insert a new disk
  5. zpool replace zp1 /dev/sda /dev/sdb
  6. it says "cannot replace /dev/sda with /dev/sdb : pool I/O is currently suspended"

The same failure applies to zpool destroy as well.

Now the zpool is unusable, and the only way for me to destroy it is to reboot the system, destroy the old pool, and create a new one.

Is there any other option I can try so that I can replace the disk without rebooting the system?

Thanks,
Kiran.

@GregorKopka
Contributor

Maybe reattach the original disk and try zpool clear or zpool online?
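
Something along these lines, assuming the pool and device names from your report (adjust to your system):

    zpool online zp1 /dev/sda   # tell ZFS the original member device is back
    zpool clear zp1             # clear the error state so pool I/O can resume
    zpool status zp1            # verify the pool is no longer suspended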

@kiranfractal
Author

I agree with you, but I just want to remove the old disk, insert a new disk in the same slot, and recreate the zpool without a reboot.

@GregorKopka
Contributor

Then try the -f flag on zpool export or destroy.

@kiranfractal
Author

I removed the disk, inserted a new disk, and tried your commands, but they show the messages below:

[root@fractal-C92E ~]# zpool status <=== removed drive from the slot and added new drive in the same slot
pool: zp2
state: ONLINE
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
zp2         ONLINE       0     0     0
  sdb       ONLINE       0     0     0

errors: No known data errors

==> Write some data to the zpool, i.e. zp2

[root@fractal-C92E ~]# zpool clear zp2

[root@fractal-C92E ~]# zpool status
pool: zp2
state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-JQ
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
zp2         ONLINE       0     4     0
  sdb       ONLINE       3    14     0

errors: 2 data errors, use '-v' for a list

[root@fractal-C92E ~]# zpool export -f zp2
umount2: Device or resource busy
umount: /zp2: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
cannot unmount '/zp2': umount failed

[root@fractal-C92E ~]# zpool destroy zp2
umount: /zp2: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
cannot unmount '/zp2': umount failed
could not destroy 'zp2': could not unmount datasets

Now the only option is to reboot to recreate the zpool.

Let me know if there are any further commands to try.

Thanks.

@kiranfractal
Author

[root@fractal-C92E ~]# zpool replace -f zp2 /dev/sdb /dev/disk/by-id/scsi-SATA_ST3750640NS_3QD1GN87
cannot replace /dev/sdb with /dev/disk/by-id/scsi-SATA_ST3750640NS_3QD1GN87: pool I/O is currently suspended

@GregorKopka
Contributor

Put the old (original) disk in the system so it is attached as the device which ZFS thinks is the member of the pool (before your last replace that was /dev/sdb), then
zpool clear zp2
which should bring the system to a state where you are able to access the pool again - including the option to destroy it.

Simple solution: reboot.

One thing to keep in mind: replacing a non-redundant vdev of a pool needs to be done while the data on it is still available - else the data can't be copied, like in your example where you pulled the drive.
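
A sketch of how that can look while the old disk is still readable (device names illustrative): first mirror the data onto the new disk, then drop the old one.

    zpool attach zp1 /dev/sda /dev/sdb   # temporarily turn the single-disk vdev into a mirror
    zpool status zp1                     # wait for the resilver to finish
    zpool detach zp1 /dev/sda            # remove the old disk, leaving the new one in place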

@kiranfractal
Author

Our use case is that we don't use ZFS RAID but a single drive per pool, since the replication factor is handled at the Gluster volume level.

How about an option to support a forced destroy (destroying the pool while its I/O is suspended) so that this corner case can be addressed?

@kiranfractal
Author

We are looking for a graceful drive replacement without a reboot. Is there any workaround?

@GregorKopka
Contributor

@kiranfractal If you plan to create redundancy only through the use of replication in gluster then you should be able to reboot the box without any problems at any point in time.

For replacing, see man zpool for usage and limitations.

Note that your use case (a live pool with a non-redundant top-level vdev plus the disk backing that vdev being offline) is AFAIK currently not supported, apart from reconnecting the original disk (including the data on it) to continue.

Destroying the redundancy of a pool below one good copy means restoring from backup, which by definition is offline time. At the moment I can only suggest, for production, either changing your setup (create pools with redundancy, the way ZFS is intended to be used if you care about your data being healthy and online on that specific machine) or using a simple throwaway POSIX filesystem as the brick storage backend for Gluster (which might nevertheless block if you remove the disk backing the filesystem before unmounting it).
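
For instance, a minimal redundant pool looks like this (device names are placeholders):

    zpool create tank mirror /dev/sdx /dev/sdy   # either disk can fail or be pulled without suspending the pool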

Only question left:
@behlendorf Being able to export/destroy an imported pool (even a suspended one) - even if that might cause data loss, even on-disk - might be useful, for example in case a USB backup drive containing such a simple pool goes south.

Are there plans for ZoL to support this?

(edit: accidental premature send)

@behlendorf
Contributor

There's definitely some work to be done here, and coincidentally enough just yesterday I opened a pull request, #2890, with a few bug fixes along these lines.

Most of the infrastructure to force export/destroy a pool is already in place but there are still some gotchas which need to be sorted out. For example, one major restriction imposed by the Linux VFS we'll have to contend with is that a filesystem cannot be unmounted if it has open file handles. And any mounted filesystem will hold references on the pool which will prevent us from destroying it.

So if the administrator kills off all the processes with open file handles, then the filesystem should be unmountable even when the pool is suspended. Additionally, if the -F hardforce option is given, you should be able to export/destroy even a suspended pool. At the moment -F is undocumented, but it does exist.
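
Roughly, using the pool from the earlier example (whether each step succeeds depends on the pool's state, and -F behaves as described above):

    lsof +D /zp2          # list processes holding files open under the mountpoint
    fuser -km /zp2        # optionally kill them, which releases the handles
    zfs unmount zp2       # the dataset should now be unmountable
    zpool export -F zp2   # then attempt the hardforce export of the suspended pool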

What probably needs to happen is for someone to take a moment and work through all of these cases to see what works currently and what doesn't.

@behlendorf behlendorf added this to the 0.6.4 milestone Nov 12, 2014
@kiranfractal
Author

I tried the -F option with export and it hangs, as shown below.

  1. I created a zpool with single disk

  2. copied some data to it

  3. Removed the disk and cleared zpool (zpool clear mnt)

  4. zpool status

pool: mnt
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://zfsonlinux.org/msg/ZFS-8000-HC
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
mnt         UNAVAIL      0     0     0  insufficient replicas
  sdc       UNAVAIL      0     0     0
  1. [root@fractal-c92e ~]# zpool export -F mnt

  2. [root@fractal-c92e ~]# ps aux | grep zpool
    root 2413 0.0 0.0 126064 1524 pts/0 D+ 11:17 0:00 zpool export -F mnt

dmesg output:
---SPLError: 1177:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'mnt' has encountered an uncorrectable I/O failure and has been suspended.

SPLError: 1176:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'mnt' has encountered an uncorrectable I/O failure and has been suspended.

sysdig_probe: driver loading
sysdig_probe: initializing ring buffer for CPU 0
sysdig_probe: CPU buffer initialized, size=1048576
sysdig_probe: initializing ring buffer for CPU 1
sysdig_probe: CPU buffer initialized, size=1048576
sysdig_probe: initializing ring buffer for CPU 2
sysdig_probe: CPU buffer initialized, size=1048576
sysdig_probe: initializing ring buffer for CPU 3
sysdig_probe: CPU buffer initialized, size=1048576
INFO: task txg_sync:1244 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync D 0000000000000002 0 1244 2 0x00000000
ffff88027257fb70 0000000000000046 ffff88027257fb80 ffff880273010918
0000000000000001 ffff880273010930 0000000000000000 0000000000000000
ffff88027257faf0 ffffffff81064ba2 ffff88027aaa5ab8 ffff88027257ffd8
Call Trace:
[] ? default_wake_function+0x12/0x20
[] ? ktime_get_ts+0xb1/0xf0
[] io_schedule+0x73/0xc0
[] cv_wait_common+0xac/0x1c0 [spl]
[] ? dmu_objset_write_ready+0x0/0x50 [zfs]
[] ? zio_execute+0x0/0x140 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait_io+0x18/0x20 [spl]
[] zio_wait+0xfb/0x1b0 [zfs]
[] dsl_pool_sync+0x2b3/0x440 [zfs]
[] spa_sync+0x40b/0xae0 [zfs]
[] txg_sync_thread+0x384/0x5e0 [zfs]
[] ? set_user_nice+0xc9/0x130
[] ? txg_sync_thread+0x0/0x5e0 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x9e/0xc0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xc0
[] ? child_rip+0x0/0x20
INFO: task zpool:2413 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool D 0000000000000001 0 2413 2395 0x10000080
ffff88027af01ad8 0000000000000082 0000000000000000 ffff88027b480080
0000000000000055 ffff880277968240 0000004dedc3628a 0000000000000282
ffff88027af01b78 00000001000085cc ffff88027aade638 ffff88027af01fd8
Call Trace:
[] cv_wait_common+0x105/0x1c0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x15/0x20 [spl]
[] txg_wait_synced+0xb3/0x190 [zfs]
[] dmu_tx_wait+0x1b4/0x2b0 [zfs]
[] dmu_tx_assign+0x91/0x490 [zfs]
[] ? read_tsc+0x9/0x20
[] spa_history_log_nvl+0x7d/0x160 [zfs]
[] ? nvlist_add_string+0x1b/0x20 [znvpair]
[] spa_history_log+0x58/0xc0 [zfs]
[] zfs_log_history+0xfc/0x100 [zfs]
[] zfs_ioc_pool_export+0x2f/0x60 [zfs]
[] zfsdev_ioctl+0x45c/0x4d0 [zfs]
[] ? __do_page_fault+0x1ec/0x480
[] vfs_ioctl+0x22/0xa0
[] do_vfs_ioctl+0x84/0x580
[] sys_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b
INFO: task txg_sync:1244 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync D 0000000000000002 0 1244 2 0x00000000
ffff88027257fb70 0000000000000046 ffff88027257fb80 ffff880273010918
0000000000000001 ffff880273010930 0000000000000000 0000000000000000
ffff88027257faf0 ffffffff81064ba2 ffff88027aaa5ab8 ffff88027257ffd8
Call Trace:
[] ? default_wake_function+0x12/0x20
[] ? ktime_get_ts+0xb1/0xf0
[] io_schedule+0x73/0xc0
[] cv_wait_common+0xac/0x1c0 [spl]
[] ? dmu_objset_write_ready+0x0/0x50 [zfs]
[] ? zio_execute+0x0/0x140 [zfs]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait_io+0x18/0x20 [spl]
[] zio_wait+0xfb/0x1b0 [zfs]
[] dsl_pool_sync+0x2b3/0x440 [zfs]
[] spa_sync+0x40b/0xae0 [zfs]
[] txg_sync_thread+0x384/0x5e0 [zfs]
[] ? set_user_nice+0xc9/0x130
[] ? txg_sync_thread+0x0/0x5e0 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x9e/0xc0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xc0
[] ? child_rip+0x0/0x20
INFO: task zpool:2413 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool D 0000000000000001 0 2413 2395 0x00000080
ffff88027af01ad8 0000000000000082 0000000000000000 ffff88027b480080
0000000000000055 ffff880277968240 0000004dedc3628a 0000000000000282
ffff88027af01b78 00000001000085cc ffff88027aade638 ffff88027af01fd8
Call Trace:
[] cv_wait_common+0x105/0x1c0 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x15/0x20 [spl]
[] txg_wait_synced+0xb3/0x190 [zfs]
[] dmu_tx_wait+0x1b4/0x2b0 [zfs]
[] dmu_tx_assign+0x91/0x490 [zfs]
[] ? read_tsc+0x9/0x20
[] spa_history_log_nvl+0x7d/0x160 [zfs]
[] ? nvlist_add_string+0x1b/0x20 [znvpair]
[] spa_history_log+0x58/0xc0 [zfs]
[] zfs_log_history+0xfc/0x100 [zfs]
[] zfs_ioc_pool_export+0x2f/0x60 [zfs]
[] zfsdev_ioctl+0x45c/0x4d0 [zfs]
[] ? __do_page_fault+0x1ec/0x480
[] vfs_ioctl+0x22/0xa0
[] do_vfs_ioctl+0x84/0x580
[] sys_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b

@behlendorf
Contributor

Thanks for the additional test case. In this case it's blocking trying to update the history which clearly won't work. This is a slight variation on the other issue.

Let me ask an open-ended question. For a pool which is suspended, what would you expect the behavior to be for the following commands, where -f means force and -F means hardforce?

  1. zpool export|destroy < pool >
  2. zpool export|destroy -f < pool >
  3. zpool export|destroy -F < pool >

@kiranfractal
Author

Let me start with a disclaimer that I have not started to look at the inner workings of ZFS. However, we rely heavily on ZFS to provide on-disk data consistency without interruptions to data access (caused by, say, a reboot). I'm coming at this from the situation of recovering gracefully from an unrecoverable failure in the underlying drives. This could be either when we have a non-redundant pool (say, a pool per disk) or a redundant pool where the failure tolerances have been exceeded (say, a raidz1 with a 2-disk failure). In these situations, I'd ideally like to recover without the need for a system reboot, which would cause other services to also be affected. I'm not so worried about the loss of data caused by the failed drive because I can handle that at a higher layer.

That said, here are some tentative answers to the questions above, specifically for a destroy:

  1. Should succeed if there are no underlying causes to fail and should exit with an error for all other conditions.
  2. Forcefully unmount before the destroy.
  3. The operation should succeed regardless of what happens.

When is the tentative date for the zfs-0.6.4 release?

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Dec 26, 2014
openzfs#2890

openzfs#2878

behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 27, 2015
Cleanly destroying or exporting a pool requires that the pool
not be suspended.  Therefore, set the POOL_CHECK_SUSPENDED flag
for these ioctls so the utilities will output a descriptive
error message rather than block.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2878
behlendorf added a commit that referenced this issue Mar 2, 2015
Cleanly destroying or exporting a pool requires that the pool
not be suspended.  Therefore, set the POOL_CHECK_SUSPENDED flag
for these ioctls so the utilities will output a descriptive
error message rather than block.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2878
@behlendorf
Contributor

I've merged 87a63dd to master which ensures a reasonable error is generated when attempting to destroy or export a suspended pool. This is far preferable to a hang. However, I'd still like to leave this issue open so we can explore what is reasonable behavior for a force option.

87a63dd Prevent "zpool destroy|export" when suspended

@behlendorf behlendorf modified the milestones: 0.7.0, 0.6.4 Mar 2, 2015
@darkpixel

We are trying to use ZFS for the exact same scenario. Our clients have a pool of spinning rust they use for primary data storage. We zfs send that data to two places--an off-site backup, and a local single external USB drive.

Basically they want fast (local) recovery from the local USB drive in case the primary data storage somehow dies, and they want an off-site backup in case the office burns down.

We have external USB (Seagate) drives at 24 locations, and it seems like one dies every month. When one dies, we ship a new drive out, get it plugged in, and then have to reboot the box because we can't destroy 'backup-pool' while its I/O is suspended. Once the box has been rebooted, 'backup-pool' no longer shows up, and then we can create a new 'backup-pool' from the newly installed drive.

I agree with @kiranfractal that there should be a '-F' option (in addition to '-f') that would force removal of the pool regardless of IO being suspended.

@kernelOfTruth
Contributor

referencing

https://www.illumos.org/issues/4128 disks in zpools never go away when pulled
illumos/illumos-gate@39cddb1

@ashjas

ashjas commented Jan 31, 2016

I am facing this same issue in the following scenario:

  1. sudo zpool create mypool -m ~/ZFS mirror /dev/sdc1 /dev/sdc2
    I am using a mirror on a single disk... I know this is inefficient, but I'm testing...

  2. sudo zfs mount / umount + sudo zpool import/export works as expected.

  3. reboot

  4. put in a flash drive, so that my zpool devices become /dev/sdd1 & /dev/sdd2 (sdc1 is now the inserted flash drive).

  5. neither sudo zfs mount / umount nor sudo zpool import/export works as expected.

----------> output of commands issued:

sudo zpool status -v
  pool: mypool
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jan 31 19:12:37 2016
config:

    NAME                                                 STATE     READ WRITE CKSUM
    mypool                                               ONLINE       0     0     0
      mirror-0                                           ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part1  ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part2  ONLINE       0     0     0

errors: No known data errors
sudo zpool import
no pools available to import

sudo zpool import mypool
cannot import 'mypool': a pool with that name already exists
use the form 'zpool import <pool | id> <newpool>' to give it a new name
sudo zpool export mypool
umount: /home/ashish/ZFS: not mounted
cannot unmount '/home/ashish/ZFS': umount failed
sudo zfs mount mypool
cannot mount 'mypool': filesystem already mounted
sudo zfs get all mypool | grep mount
mypool  mounted               yes                    -
mypool  mountpoint            /home/ashish/ZFS       local
mypool  canmount              on                     default

Surprisingly, issuing a scrub command:
sudo zpool scrub mypool

correctly tries to scan the USB HDD, as I can see its LED blinking.

Also, if in this booted session of Ubuntu I fix the paths back to /dev/sdc1 /dev/sdc2 for the zpool, by removing the flash drive and reattaching the zpool drive, the above does not change; the issue still remains. This only gets fixed if I reboot the system.

dmesg.log


Edit: In a different, freshly rebooted session, with /dev/sdc1,2:

sudo zpool import
no pools available to import

But:
sudo zfs mount -a
mounts the zpool correctly.

sudo zpool status
  pool: mypool
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jan 31 19:18:44 2016
config:

    NAME                                                 STATE     READ WRITE CKSUM
    mypool                                               ONLINE       0     0     0
      mirror-0                                           ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part1  ONLINE       0     0     0
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part2  ONLINE       0     0     0

errors: No known data errors

Thereafter, after doing an export,
sudo zpool export mypool
the import command works fine:

sudo zpool import 
   pool: mypool
     id: 977336718416363205
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

    mypool                                               ONLINE
      mirror-0                                           ONLINE
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part1  ONLINE
        ata-WDC_WD1600BEVS-00UST0_WD-WXEY07V22050-part2  ONLINE

After adding some files to the zpool,
export misbehaves:
sudo zpool export mypool 
umount: /home/ashish/ZFS: target is busy
        (In some cases useful info about processes that
         use the device is found by lsof(8) or fuser(1).)
cannot unmount '/home/ashish/ZFS': umount failed

lsof ZFS/
COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
ibus-daem 7514 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-dcon 7524 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-ui-g 7525 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-x11  7527 ashish  cwd    DIR   0,36        6    4 ZFS
ibus-engi 7538 ashish  cwd    DIR   0,36        6    4 ZFS

fuser ZFS/
/home/ashish/ZFS:     7514c  7524c  7525c  7527c  7538c

this time dmesg2.log

There is some bug here; let me know if I can be of any further help...

Hope this gets sorted...

@ilovezfs
Contributor

@ashjas I'm not seeing anything wrong or unexpected there, and certainly nothing having to do with device names. The only thing that looks like a problem is

cannot unmount '/home/ashish/ZFS': umount failed

which typically indicates that that particular dataset is busy. Often that can be resolved with -f or by closing whatever files are open. lsof is your friend.
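
For example, using the mountpoint from your output (then close or kill whatever shows up before retrying the export):

    lsof +D /home/ashish/ZFS     # or: fuser -vm /home/ashish/ZFS
    zfs unmount -f mypool        # then retry the unmount/export, with -f if needed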

@ashjas

ashjas commented Feb 1, 2016

@ilovezfs
For the first scenario, where the device names get changed:
5) neither sudo zfs mount / umount nor sudo zpool import/export works as expected.

Is this not an issue? I'm unable to mount the volume.
If I'm giving the wrong commands, let me know, but I don't see how this isn't an issue.

Secondly, -f doesn't make any difference at all!

@darkpixel

@ashjas: I don't create pools like this: sudo zpool create mypool -m ~/ZFS mirror /dev/sdc1 /dev/sdc2

I don't know how ZFS handles that, but /dev/sda, /dev/sdb, etc. can change when devices are added or removed.

I usually create the pools by using devices in /dev/disk/by-id/*

Those ID numbers are not supposed to change, even if devices are switched around.
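
For example (the by-id name below is made up; use whatever ls -l /dev/disk/by-id/ shows for your drive):

    ls -l /dev/disk/by-id/                 # find the persistent names for your partitions
    zpool create mypool -m ~/ZFS mirror \
      /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL-part1 \
      /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL-part2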

@GregorKopka
Contributor

@behlendorf Please reconsider the position that a stuck zpool/zfs command, and the inability to remove a pool from the system, is not a bug.

This scenario isn't that uncommon with USB-disk based pools used for backups; 'reboot to fix it' should remain a speciality of Windows...

@bjquinn

bjquinn commented Oct 12, 2016

Agreed. USB pools for backups are not reliably usable given this issue.

@behlendorf
Contributor

@GregorKopka I agree. There's lots of room for improvement when someone has the time to focus on this. I've just removed the Bug tag from all issues because it wasn't actually helpful.

@mailinglists35

@behlendorf is there any hope for someone to implement this anytime soon?

@behlendorf
Contributor

No developers I know of are currently working on this.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 9, 2017
A pool may only be resumed when the txg in the "best" uberblock
found on-disk matches the in-core last synced txg.  This is done
to verify that the pool was not modified by another node or process
while it was suspended.  If this were to happen the result would
be a corrupted pool.

Since a suspended pool may no longer always be resumable it was
necessary to extend the 'zpool export -F` command to allow a
suspended pool to be exported.  This was accomplished by leveraging
the existing spa freeze functionality.  During export if '-F' is
given and the pool is suspended the pool will be frozen at the last
synced txg and all in-core dirty data will be discarded.  This
allows for the pool to be safely exported without having to reboot
the system.

In order to test this functionality the broken 'ztest -E' option,
which allows for ztest to use an existing pool, was fixed.  The
code needed for this was copied over from zdb.  ztest is used to
modify the test pool from user space while the kernel has the pool
imported and suspended.

This commit partially addresses issues openzfs#4003, openzfs#2023, openzfs#2878, openzfs#3256
by allowing a suspended pool to be exported, 'zpool export -F'.
There may still be cases where a reference on the pool, such as
a filesystem which cannot be unmounted, will prevent the pool
from being exported.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 28, 2017
A pool may only be resumed when the txg in the "best" uberblock
found on-disk matches the in-core last synced txg.  This is done
to verify that the pool was not modified by another node or process
while it was suspended.  If this were to happen the result would
be a corrupted pool.

Since a suspended pool may no longer always be resumable it was
necessary to extend the 'zpool export -F` command to allow a
suspended pool to be exported.  This was accomplished by leveraging
the existing spa freeze functionality.  During export if '-F' is
given and the pool is suspended the pool will be frozen at the last
synced txg and all in-core dirty data will be discarded.  This
allows for the pool to be safely exported without having to reboot
the system.

In order to test this functionality the broken 'ztest -E' option,
which allows for ztest to use an existing pool, was fixed.  The
code needed for this was copied over from zdb.  ztest is used to
modify the test pool from user space while the kernel has the pool
imported and suspended.

This commit partially addresses issues openzfs#4003, openzfs#2023, openzfs#2878, openzfs#3256
by allowing a suspended pool to be exported, 'zpool export -F'.
There may still be cases where a reference on the pool, such as
a filesystem which cannot be unmounted, will prevent the pool
from being exported.

Add a basic zpool_tryimport function which can be used by zhack,
zdb, and ztest to provide minimum pool import functionality.
This way each utility isn't doing a slightly different thing.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@lnxbil

lnxbil commented Feb 14, 2018

I just ran into the same problem after accidentally removing the only disk the pool consists of via an echo 1 > /sys/block/sdr/delete call, and after a lot of different attempts I was able to reimport the pool after reading this thread:

$ zpool status -v externes-backup
  pool: externes-backup
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: scrub repaired 0B in 29h48m with 0 errors on Thu Aug  4 20:51:21 2016
config:

        NAME                     STATE     READ WRITE CKSUM
        externes-backup          ONLINE       1     9     0
          externes_backup_crypt  ONLINE       1     0     0

errors: List of errors unavailable: pool I/O is currently suspended

Just for others that might run into the same problem:

  • I have/had a luks-crypted device called externes_backup_crypt

  • Close the LUKS device (to be able to reuse the same name; otherwise it yields "no such device in pool")

      $ cryptsetup luksClose externes_backup_crypt
    
  • Rescan SCSI-Bus

      $ for bus in /sys/class/scsi_host/host*; do echo "- - -" > $bus/scan; done
    
  • Decrypt device

      $ cryptsetup luksOpen /dev/sds externes_backup_crypt
    
  • put disk online

      $ zpool online externes-backup  /dev/mapper/externes_backup_crypt
    

    cannot online /dev/mapper/externes_backup_crypt: pool I/O is currently suspended

  • Now you have to clear the pool and it'll continue to work again:

      $ zpool clear externes-backup
    
  • Afterwards, the pool is imported and automatically scrubbed:

$ root@backup ~ > zpool status -v externes-backup
  pool: externes-backup
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Feb 14 10:32:32 2018
        4,87G scanned out of 6,14T at 12,1M/s, 147h47m to go
        0B repaired, 0,08% done
config:

        NAME                     STATE     READ WRITE CKSUM
        externes-backup          ONLINE       0     0     0
          externes_backup_crypt  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x18675>
        <metadata>:<0x3dca1>

Now, I'm at least able to export the pool correctly.

@felisucoibi

felisucoibi commented Jul 26, 2018

@lnxbil thanks, at least we have something. I have the same problem, using LUKS+ZFS, and when something goes wrong I have to restart the server; the trouble comes when I'm not at home and I can't reboot. The pity is that when I use cryptsetup luksClose RAID1 it says the device is busy... :(

@GregorKopka
Contributor

@behlendorf this issue was closed without a who and when - is that supposed to happen?

There (AFAIK) still isn't a solution to this (only workarounds for a subset of the problem), so the defect remains. There is also quite a lot of information here that would certainly be interesting to someone who decides to work on this at some point in the future.

Please reopen.

@behlendorf
Contributor

It's not clear to me exactly how this was closed. I don't have any objection to reopening it since this is still an issue.

@behlendorf behlendorf reopened this Mar 10, 2020
@devZer0

devZer0 commented May 31, 2020

this is still an issue in 2020

I had some flaky disks in a server and wanted to remove them. I'm fairly sure I did zpool export or destroy before removing those 3 disks, but I must have done something wrong; at least, the pool was online and mounted while the disks were being ripped out (I didn't double-check beforehand). Now the pool is in a suspended state and, apparently, it's not possible to destroy/remove it from the system in this state.

As there are VMs online on this system, I'm rather curious what to do without shutting them all down and rebooting...

I'm getting this in dmesg after another "zpool destroy ..." hung for a while:

[1510610.643638] INFO: task txg_sync:1714 blocked for more than 120 seconds.
[1510610.643684] Tainted: P IOE 5.4.34-1-pve #1
[1510610.643711] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1510610.643741] txg_sync D 0 1714 2 0x80004000
[1510610.643744] Call Trace:
[1510610.643752] __schedule+0x2e6/0x700
[1510610.643754] schedule+0x33/0xa0
[1510610.643757] schedule_timeout+0x152/0x300
[1510610.643761] ? __next_timer_interrupt+0xd0/0xd0
[1510610.643763] io_schedule_timeout+0x1e/0x50
[1510610.643772] __cv_timedwait_common+0x12f/0x170 [spl]
[1510610.643774] ? wait_woken+0x80/0x80
[1510610.643779] __cv_timedwait_io+0x19/0x20 [spl]
[1510610.643851] zio_wait+0x13a/0x280 [zfs]
[1510610.643902] dsl_pool_sync+0x46e/0x500 [zfs]
[1510610.643958] spa_sync+0x5b2/0xfc0 [zfs]
[1510610.644014] ? spa_txg_history_init_io+0x106/0x110 [zfs]
[1510610.644077] txg_sync_thread+0x2d9/0x4c0 [zfs]
[1510610.644156] ? txg_thread_exit.isra.12+0x60/0x60 [zfs]
[1510610.644163] thread_generic_wrapper+0x74/0x90 [spl]
[1510610.644167] kthread+0x120/0x140
[1510610.644173] ? __thread_exit+0x20/0x20 [spl]
[1510610.644175] ? kthread_park+0x90/0x90
[1510610.644178] ret_from_fork+0x35/0x40
[1510610.644182] INFO: task zed:2736 blocked for more than 120 seconds.
[1510610.644208] Tainted: P IOE 5.4.34-1-pve #1
[1510610.644231] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1510610.644260] zed D 0 2736 1 0x00004000
[1510610.644262] Call Trace:
[1510610.644264] __schedule+0x2e6/0x700
[1510610.644266] schedule+0x33/0xa0
[1510610.644267] io_schedule+0x16/0x40
[1510610.644272] cv_wait_common+0xb5/0x130 [spl]
[1510610.644274] ? wait_woken+0x80/0x80
[1510610.644279] __cv_wait_io+0x18/0x20 [spl]
[1510610.644336] txg_wait_synced_impl+0xc9/0x110 [zfs]
[1510610.644393] txg_wait_synced+0x10/0x40 [zfs]
[1510610.644449] spa_vdev_state_exit+0x8a/0x160 [zfs]
[1510610.644505] vdev_online+0x2bb/0x3e0 [zfs]
[1510610.644564] zfs_ioc_vdev_set_state+0x86/0x190 [zfs]
[1510610.644622] zfsdev_ioctl+0x6db/0x8f0 [zfs]
[1510610.644626] ? lru_cache_add_active_or_unevictable+0x39/0xb0
[1510610.644629] do_vfs_ioctl+0xa9/0x640
[1510610.644632] ? handle_mm_fault+0xc9/0x1f0
[1510610.644634] ksys_ioctl+0x67/0x90
[1510610.644636] __x64_sys_ioctl+0x1a/0x20
[1510610.644639] do_syscall_64+0x57/0x190
[1510610.644642] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1510610.644644] RIP: 0033:0x7f9c54306427
[1510610.644649] Code: Bad RIP value.
[1510610.644650] RSP: 002b:00007f9c534a1078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1510610.644652] RAX: ffffffffffffffda RBX: 00007f9c534a10c0 RCX: 00007f9c54306427
[1510610.644653] RDX: 00007f9c534a10c0 RSI: 0000000000005a0d RDI: 0000000000000009
[1510610.644654] RBP: 00007f9c534a5ab0 R08: 00000000b5c9ecf4 R09: 00007f9c546bb8e6
[1510610.644655] R10: 00007f9c44041450 R11: 0000000000000246 R12: 00007f9c44015740
[1510610.644656] R13: 00007f9c4401e980 R14: 00007f9c534a6b40 R15: 00007f9c534a4670
[1510731.477176] INFO: task zed:2736 blocked for more than 241 seconds.
[1510731.477210] Tainted: P IOE 5.4.34-1-pve #1
[1510731.477232] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
