zpool commands block when a disk goes missing / pool suspends #3461

Open
gordan-bobic opened this issue May 30, 2015 · 47 comments · May be fixed by #11082
Labels: Bot: Not Stale (Override for the stale bot), Status: Blocked (Depends on another pending change), Status: Inactive (Not being actively updated), Status: Understood (The root cause of the issue is known)

Comments

@gordan-bobic
Contributor

It would appear that pulling a disk from a single-disk pool causes ZoL to get into a state where all zpool commands (e.g. zpool list) block. sync also blocks indefinitely. Both become uninterruptible (kill -9 doesn't work).

There are no errors in dmesg other than the disk getting disconnected (I removed it) and:

WARNING: Pool 'poolname' has encountered an uncorrectable I/O failure and has been suspended.

There need to be timeouts, and the failure handling in this scenario needs to be more graceful than requiring a reboot of the machine.

@DeHackEd
Contributor

This isn't a case of a timeout. ZFS knows the disk is gone. This is a deliberate choice by the ZFS developers. If your pool can't survive because of redundancy failures, it enters a suspended state to give the administrator the ability to fix it while dirty data, etc. are still in RAM. zpool set failmode=continue poolname allows some operations to fail with IO errors rather than jumping to suspended mode ASAP, but some actions will still suspend the pool.
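As a rough illustration of the three failmode values, here is a toy model in Python of what a new request sees once the pool has hit an uncorrectable failure. The names FailMode and suspended_outcome are hypothetical and this is not how ZFS implements the property; the behaviour it encodes follows the documented meaning of wait, continue and panic.

```python
from enum import Enum


class FailMode(Enum):
    """The three documented values of the failmode pool property."""
    WAIT = "wait"          # default: block I/O until devices return and errors are cleared
    CONTINUE = "continue"  # return EIO to new writes; still allow reads from healthy devices
    PANIC = "panic"        # log a message and crash the host


def suspended_outcome(mode: FailMode, is_write: bool, readable_elsewhere: bool) -> str:
    """Toy model (not the ZFS code) of what a *new* request sees after an
    uncorrectable I/O failure. Writes already queued for a txg are not
    modeled: per the property's documentation they still block under
    'continue'."""
    if mode is FailMode.WAIT:
        return "block"
    if mode is FailMode.PANIC:
        return "panic"
    if is_write:
        return "EIO"
    # A read can only succeed if a surviving device holds the data.
    return "ok" if readable_elsewhere else "EIO"


# Single-disk pool: once the only disk is gone, nothing is readable elsewhere.
for mode in FailMode:
    print(mode.value, suspended_outcome(mode, is_write=False, readable_elsewhere=False))
```

With a single-disk pool, continue turns the hang into EIO for reads as well as writes, because there is no surviving device to read from.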

What might make sense is modifying some zpool tools to be aware of suspended pools and avoid querying data that may require disk IO to find.

Here's a sample pool I faulted using blkdiscard on an SSD with failmode=continue:

  pool: testpool
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-JQ
  scan: none requested
config:
    NAME        STATE     READ WRITE CKSUM
    testpool    UNAVAIL      0     0     4  insufficient replicas
      vdb1      UNAVAIL      0     0    16  corrupted data
# zpool list
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool  49.8G  10.1M  49.7G         -     0%     0%  1.00x  UNAVAIL  -

@gordan-bobic
Contributor Author

So why do sync and zpool list also hang forever? There are other pools in the machine, and it is a zfs-only machine. Hanging until the machine is rebooted just seems downright silly. For this behaviour to make any sense there has to be a way to either tell ZFS to give up on a pool and/or unsuspend the pool.

@DeHackEd
Copy link
Contributor

This is intended behaviour to safeguard your dirty data and allow an administrator the chance to fix a broken machine while hot data is still available. The sync can't complete because there are not enough disks to write out the transaction. zpool list hanging could be interpreted as a bug: unknown field values could be replaced with just a dash (-), displaying only the pool name and health (FAULTED, UNAVAIL, etc).
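The suggestion can be sketched as follows: print a dash for any field whose value might require pool I/O, instead of blocking on it. PoolSummary and list_row are hypothetical names for illustration, not libzfs code, and the assumption that health is available without disk I/O belongs to this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolSummary:
    """Hypothetical per-pool record; not a libzfs structure."""
    name: str
    health: str                  # assumed known from in-memory state, no disk I/O
    suspended: bool
    size: Optional[str] = None   # fields below may need pool I/O to compute
    alloc: Optional[str] = None
    free: Optional[str] = None


def list_row(p: PoolSummary) -> str:
    """Format a 'zpool list'-style row. For a suspended pool, print '-'
    instead of any value that might require touching the disk."""
    def cell(value: Optional[str]) -> str:
        return "-" if p.suspended or value is None else value

    return f"{p.name:<10} {cell(p.size):>6} {cell(p.alloc):>6} {cell(p.free):>6}  {p.health}"


print(list_row(PoolSummary("tank", "ONLINE", False, "1.81T", "912G", "944G")))
print(list_row(PoolSummary("testpool", "UNAVAIL", True, "49.8G", "10.1M", "49.7G")))
```

The design choice is to degrade the output rather than the command: the row for a suspended pool is less informative, but the command returns.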

If you can fix the problem, ZFS will pick up where it left off and unsuspend.

@gordan-bobic
Contributor Author

Except that once it all hangs, it is no longer possible to run
zpool set failmode=continue poolname

There really needs to be a way for the administrator to say "fine, go fail" without requiring a complete reboot, especially when that reboot actually requires a hard reset because the shutdown sequence will also hang when trying to cleanly export the pools.

@behlendorf
Contributor

@gordan-bobic the situation should be improved somewhat in the 0.6.4.1 tag. At a minimum, commands which do not require disk IO should be allowed to complete.

For example, commands like zpool list should not have been impacted, but were, because parts of the output required accessing the disk: the very disk which was no longer available. Specifically, this was a regression which crept in with feature flag support; see 417104b for details. However, commands like zfs list potentially won't complete, because some dataset information may need to be read from disk.

The fact that you can't always CTRL-C the command does definitely sound like a bug.

@GregorKopka
Contributor

@behlendorf imho, the fact that zfs list can block is a bug: it is read-only, and fundamental to zfs.

Data that is needed for zfs list should always be in ARC and stay there until the pool is exported.

Ideal would be that the metadata needed for this is loaded on pool import and never ever be released (at least an option to configure such behaviour should be available).

More on topic: blocking operations that require r/w access to a suspended pool (apart from listing it) seems reasonable; nevertheless, they should always be abortable by a signal and clean up correctly so that they don't leave dangling locks. Blocking operations on healthy pools just because another pool in the system failed, and/or reaching a system state that can only be cured by a reboot, is imho a bug. So there should be a way to cleanly (and completely, so a reimport would be possible) remove a suspended pool from the system.

@behlendorf
Contributor

@GregorKopka regarding zfs list, that's potentially a lot of data. There could be hundreds of thousands of datasets once you start including snapshots. Plus you'd need to store the properties for all of them, so it has the potential to consume a significant chunk of memory. I could see this potentially being a configurable thing.
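To make "a significant chunk of memory" concrete, here is a back-of-the-envelope estimate. Both numbers are illustrative assumptions, not measurements of ZFS: N = 500,000 datasets and snapshots, and b = 2 KiB of cached names and properties per dataset.

```latex
M \approx N \cdot b = 500{,}000 \times 2\ \text{KiB} = 1{,}000{,}000\ \text{KiB} \approx 977\ \text{MiB}
```

Under these assumptions, pinning the data for the whole time a pool is imported would cost close to 1 GiB, and the cost grows linearly with both N and b.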

On the other points I generally agree. But the devil's always in the details for these things, so someone will need to investigate why it is the way it is.

@jerome-diver

I confirm: when a USB device goes missing (due to being physically disconnected), services freeze, zfs and zpool become unusable (with the Antergos zfs packages, but I think also with others), and CTRL+C or CTRL+Z do not respond. Ugly... you need to reboot. At this time, and because of this, ZFS is not a stable/robust file system and cannot safely be used with external hard drives.

@mailinglists35

Duplicate of #3256?

@jerome-diver

jerome-diver commented Jul 6, 2016

Not really; this issue is specifically about a disk being removed unintentionally.
The ZFS dev team needs to take into account that hardware in the real world can fail or be removed; these are real situations that happen to everybody. Closing this one and pointing to the other issue would be a way of not looking at where the problem comes from.
A disconnected disk should not be a critical problem that can make you lose all your data, because the world is not perfect and ZFS has to work in this real world (and sure... it is not so simple to look at it this way, because reality is not theory). Also, many other file systems take this reality into account.
I think this issue should be closed when ZFS is stable and robust in these real-world situations, specifically when a drive has been physically disconnected.

@behlendorf behlendorf added this to the 0.8.0 milestone Jul 11, 2016
@behlendorf behlendorf added the Bug label Jul 11, 2016
@behlendorf
Contributor

@jerome-diver you can set the failmode=continue zpool property to prevent the pool from suspending when the drive is hard removed from a non-redundant configuration. This will result in errors to the applications but it should not hang the system. This behavior is similar to what you'd get from other filesystems.

There is some related work under way to better be able to detect when a drive was removed and if it's readded to the system what the new device name is.

@mailinglists35

@DeHackEd

> This is intended behaviour to safeguard your dirty data and allow an administrator the chance to fix a broken machine while hot data is still available.

How do you fix the broken machine when all you want is to force-export or clear the pool that has experienced the errors, without rebooting the machine and without affecting other pools? You can't even rmmod -f the zfs modules. Once a pool enters the suspended state, it never leaves it unless you reboot the machine.

@behlendorf Jun 5, 2015

> But the devil's always in the details for these things so someone will need to investigate why it is the way it is.

It's been a year and a half; is there any progress on someone investigating? Is there any hope that ZFS will soon allow exporting, clearing, or removing from memory a pool that is in the suspended state (issue #5242)?

@behlendorf

> you can set the failmode=continue zpool property to prevent the pool from suspending when the drive is hard removed from a non-redundant configuration.

I have already set failmode=continue on a single-disk pool over iSCSI, and yet the pool is hung and the only way out is a reboot, even after the drive came back online; see #3256.
Additionally, the zpool status message points to this URL, which returns a 404: http://zfsonlinux.org/msg/ZFS-8000-JQ

> This behavior is similar to what you'd get from other filesystems.

No, it's not similar. On ext4 I can unmount the affected device, run fsck, then remount the filesystem once the device is reconnected to the system. On zfs, the pool remains hung forever.

> There is some related work under way to better be able to detect when a drive was removed and if it's readded to the system what the new device name is.

Are you referring to #5343? If so, will this allow unsuspending pools when the disk comes back online?

@gordan-bobic
Contributor Author

There really needs to be a way to instruct ZFS to throw away any and all dirty data and forget that the pool was ever here without rebooting the machine. Leaving a pool in a hung state with the disk removed is of no practical use. If there is risk of trashing the pool, so be it, but that risk doesn't seem any different from what happens if you reboot the machine, which is currently the only option anyway.

@DurvalMenezes

This is a serious problem here, where we operate on the same machine its normal pools (on internal SATA drives) and backup/archival pools which reside on external HDDs connected to the machine via USB3. Every time we have a USB3 connection "flicker" (which is frequent, as the connectors are not as reliable as we'd want), its pool is suspended, all other pools start experiencing the "hanging pool commands" syndrome, and the only way to fix it is to reboot the machine, interrupting everything else that was being done on it. To live with ZFS's current way of handling this, we are forced to dedicate an entire machine to the sole purpose of connecting USB-based pools, and then access these pools over the network from other machines, which is much slower and generally inefficient.

@Rayn0r

Rayn0r commented Sep 24, 2018

I ran into the same problem tonight on an Ubuntu 16.04 machine.
I'm running a backup script to send/receive snapshots to single backup disks.
Because of a typo in the backup script, the command
zpool export backup01 did not run before echo 1 > /sys/block/sde/device/delete was issued.
After pulling the disk out of the tray all zpool commands started to hang.
Putting the drive back in did not cure the situation. The drive re-appeared as /dev/sdi btw.

@behlendorf behlendorf modified the milestones: 0.8.0, 0.9.0 Oct 12, 2018
@remyd1

remyd1 commented Jan 28, 2019

I have the same problem with D-state processes, with no solution yet... See the "closed" issue #3667.

@dweeezil
Contributor

dweeezil commented Jan 28, 2019

There are unfortunately a number of other issues related to this one. As mentioned in the comments, I've been treating #5242 as the "main" issue. There are also a handful of related, but different problems in this area which have been reported.

This particular issue seems to be concentrated on the case in which a pool has become suspended via a device removal and can't be un-suspended after the device is made available again. I'll note that it definitely should be possible to continue using a pool which has been suspended due to a missing device, so long as the device can be made available again. Once the device is available again, a zpool clear should bring the pool back. However, there's currently a bug in the process: if the device doesn't come back at the same path in "/dev", the process doesn't work. For example, if I run a test with a USB stick at "/dev/sdc", make a pool on it, and do an unplug/plug cycle, it will normally come back as "/dev/sdd". The "clear" logic tries to (re)open the device using only the path it knows about; this is a problem. In this case, the problem can be worked around by creating symlinks pointing at the new device. Alternatively, the problem, and this particular bug, can be worked around by importing the pool using stable paths (e.g. zpool import -d /dev/disk/by-id tank), because when the device re-appears, the stable path will be created again. When this happens, a zpool clear will bring the pool back to an operational state. I did just test all this on current master code and it works just fine.
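A small Python simulation of why the stable path helps: a temporary directory stands in for /dev, and no real devices are touched. The by-id name is hypothetical, and the sketch models only path resolution, not what zpool clear does internally.

```python
import os
import tempfile
from typing import Tuple


def reopen_target(recorded_path: str) -> str:
    """Resolve a recorded vdev path to the device node it names right now."""
    return os.path.realpath(recorded_path)


def simulate_replug() -> Tuple[bool, bool]:
    """Model an unplug/replug inside a temp dir standing in for /dev.
    Returns (old kernel name still exists, stable link reaches new node)."""
    with tempfile.TemporaryDirectory() as dev:
        sdc = os.path.join(dev, "sdc")
        sdd = os.path.join(dev, "sdd")
        by_id = os.path.join(dev, "usb-Example_Stick_0001")  # hypothetical ID

        open(sdc, "w").close()
        os.symlink(sdc, by_id)  # pool imported while the stick is sdc

        # Unplug: the node and its stable link vanish.
        os.remove(by_id)
        os.remove(sdc)
        # Replug: the stick returns as sdd and udev recreates the stable link.
        open(sdd, "w").close()
        os.symlink(sdd, by_id)

        return os.path.exists(sdc), reopen_target(by_id) == os.path.realpath(sdd)


old_name_alive, stable_ok = simulate_replug()
print(old_name_alive, stable_ok)
```

A pool that recorded "sdc" is left pointing at a name that no longer exists, while a pool that recorded the by-id link follows it to the new node.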

The other related issues, such as a "lost IO request" or a suspension due to device removal in which the device can't be made to re-appear (due to a broken device, dodgy driver, etc.), are what I've been concentrating on fixing, particularly the case in which an IO request is somehow "lost". I'm pretty sure I outlined my work presented at the 2018 OpenZFS summit hackathon in one of these related issues. Quick summary of this related work: it is somewhat in stasis, waiting for TRIM to be committed at the very least.

@behlendorf behlendorf removed this from the 0.9.0 milestone Nov 11, 2019
@marker5a

I'd like to drag this one back out as something that'd be nice to have fixed.

In my scenario, I use a separate external disk as a single-disk pool to back up my other pool. If this drive is yanked out before the pool is exported, I get a kernel panic and can no longer access any pools on my machine, as "zpool" and "zfs" commands just freeze.

I'm unfortunately not technical at all with ZFS under the hood, but I use it a lot and this seems to be a serious bug, no?

@jilted82

Hello,

Just wanted to share an experience I had last week which relates closely to this topic (though the outcome was different). Both of my pools were suspended with the status "insufficient replicas" and the advice to restore the pool from backup...
In a "mishap" I ran udevadm trigger while none of my disk aliases were present in vdev_id.conf, so none of my disks were available from ZFS's point of view. I quickly restored my vdev_id.conf, ran zpool export / import and zpool clear, and was back in business. No reboot needed.

Running Ubuntu 18.04.4 LTS / ZFS 0.8.3.

@DurvalMenezes

@jilted82, I think your case was different from what is being reported here: you had a "logical" problem (missing vdev_id.conf), while we are talking about physical problems (i.e. a physical device that ZFS was already using physically disappearing from the system).

@haarp

haarp commented Apr 24, 2023

#11082 is being worked on heavily.

@DurvalMenezes

[deleted from #11082 and reposted here as per @mailinglists35 guidance]

> @oshogbo @behlendorf Just wanted to let you know that you're doing great, guys! Thanks for all the hard work you put into this!

+1, deeply appreciated. This will solve a major deficiency of ZFS for my use case, and I suspect many other people besides myself and @raimocom.

@mailinglists35

> ZFS team should [...]

No, they should not explain anything.
Please learn the history of ZFS and zfsonlinux and openzfs, and you will understand that all we can do is be grateful and humble for receiving this for free.

If you are too lazy to read that, I'll summarize it for you: ZFS was proprietary code written by a private company (Sun). They open-sourced the code, then Oracle bought Sun and closed the source. Then Lawrence Livermore National Laboratory needed a solution to store huge amounts of data; they picked up the free code, refactored it FOR THEIR OWN NEEDS to work under Linux, AND THEY MADE THEIR WORK AVAILABLE FOR FREE here on github.

If you feel entitled to explanations, perhaps you might want to hire a ZFS developer and get them under a support contract.

@jerome-diver

jerome-diver commented Apr 25, 2023

@mailinglists35 calm down baby.
And thank you for your information; I do appreciate it, apart from its form.
Also, I'm not a psychologist; I can't help you fix your aggressiveness and your imagination. If I were able to transform you into someone lovely, I would do it for free.
Big kiss John Doe, and do not let them inject you with something hazardous again, because it would not help.

@razum2um

razum2um commented May 10, 2023

I discovered this wonderful thread after 5 years of using zfs. Guess how? For the first time, I decided to use it portably, on a single drive. Plugged in via usb 🙂

All the time before: raidz or mirror in VDS and homelab. Saw UNAVAIL, replaced drives, resilvered, all that stuff. Even used it with an SBC, and it survived numerous accidental power outages (and never lost data after them). Never complained.

Some might think "what if I back up to that drive with cool send/recv, isn't that much better than rsync?". It turned out to be a bad idea: in the end, it was never designed to be a portable fs.

I had heard some earlier disclaimers that "zfs requires enterprise hardware", which I misunderstood as "just buy a more expensive thing". My refined take after this thread: zfs builds on redundancy at the hardware level. You say "I'm ok to lose 1-2-x disks" -> the system handles exactly this requirement, not beyond, without further assumptions.

I'm really glad to see progress in #11082. Of course, freezing on RO operations, on other pools, is slightly wtf, but in the meantime it may be worth stating clearly on the front page (because there are a lot of issues like this and people keep asking):

if you use NO redundancy, e.g. single pluggable disk,
and if something happens with it physically, e.g. you unplug it before export,
it'll freeze ALL other zfs operations until reboot
and, yes, it's a serious warning, not like usual

I also understand why it's stale: hardware-level redundancy failing beyond expectations is never supposed to happen in production, you normally have only one pool there, and normally that server is dedicated to storage. Far from desktop conditions 🙂

P.S. this approach reminds me of the lisp-y "breakpoint on exception" behaviour, like:

begin
  work          # normal operation
rescue Exception
  debugger      # on any failure, halt and wait for a human to inspect the live state
end

oshogbo added a commit to oshogbo/zfs that referenced this issue May 19, 2023
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed.  This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export
  by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos,
  which have VFS-level support for yanking filesystems out from under
  users.  However, this only helps when the user is actively performing
  I/O, while not sitting on the filesystem.  In particular, this allows
  test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Co-Authored-by:  Will Andrews <will@firepipe.net>
Co-Authored-by:  Allan Jude <allan@klarasystems.com>
Sponsored-by:   Klara, Inc.
Sponsored-by:   Catalogics, Inc.
Sponsored-by:   Wasabi Technology, Inc.
Closes openzfs#3461
Signed-off-by:  Will Andrews <will@firepipe.net>
Signed-off-by:  Allan Jude <allan@klarasystems.com>
Signed-off-by:  Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
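The NOSUSPEND idea for txg_wait can be modeled with a condition variable. This is a toy sketch, not the ZFS code: ToyPool, the flag value, the EAGAIN return and the timeout argument are all assumptions made so the example runs on its own.

```python
import errno
import threading
from typing import Optional

TXG_WAIT_NOSUSPEND = 0x1  # hypothetical value; only the flag's idea comes from the commit


class ToyPool:
    """Toy model of txg sync state; not the ZFS data structures."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self.synced_txg = 0
        self.suspended = False

    def sync(self, txg: int) -> None:
        with self._cv:
            self.synced_txg = max(self.synced_txg, txg)
            self._cv.notify_all()

    def suspend(self) -> None:
        with self._cv:
            self.suspended = True
            self._cv.notify_all()  # wake waiters so NOSUSPEND callers can bail out

    def txg_wait_synced(self, txg: int, flags: int = 0,
                        timeout: Optional[float] = None) -> int:
        """Return 0 once txg is on disk. With TXG_WAIT_NOSUSPEND, return an
        error instead of sleeping while the pool is suspended. 'timeout'
        exists only so the demo cannot hang; it is not part of the idea."""
        def ready() -> bool:
            if self.synced_txg >= txg:
                return True
            return self.suspended and bool(flags & TXG_WAIT_NOSUSPEND)

        with self._cv:
            if not self._cv.wait_for(ready, timeout):
                return errno.ETIMEDOUT  # a plain waiter would still be asleep
            return 0 if self.synced_txg >= txg else errno.EAGAIN


pool = ToyPool()
pool.sync(1)
print(pool.txg_wait_synced(1))
pool.suspend()
print(pool.txg_wait_synced(2, TXG_WAIT_NOSUSPEND) == errno.EAGAIN)
print(pool.txg_wait_synced(2, timeout=0.1) == errno.ETIMEDOUT)
```

The point of the flag is that a caller which gets an error back can release whatever it holds (in the commit's case, the namespace lock), so a later forced export is not stuck behind it.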
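A toy model of the ZIO cancel entry point described in the commit: every suspended request is failed and the queue drained, with the error surfaced only to CANFAIL callers. ToyZio, zio_cancel, the flag value and the use of ECANCELED are illustrative assumptions; the real error codes and data structures may differ.

```python
import errno
from dataclasses import dataclass
from typing import List, Optional

ZIO_FLAG_CANFAIL = 0x1  # illustrative value; the name mirrors the flag in the commit


@dataclass
class ToyZio:
    """Toy I/O request; error is None while it sits suspended."""
    name: str
    flags: int = 0
    error: Optional[int] = None


def zio_cancel(suspended: List[ToyZio]) -> List[ToyZio]:
    """Fail every suspended request and drain the queue.

    CANFAIL callers get a real error. For the rest the failure is
    suppressed (error 0), following the commit's description; ECANCELED
    is an arbitrary stand-in for whatever error the real code uses."""
    cancelled = list(suspended)
    for zio in cancelled:
        zio.error = errno.ECANCELED if zio.flags & ZIO_FLAG_CANFAIL else 0
    suspended.clear()
    return cancelled


queue = [ToyZio("user read", ZIO_FLAG_CANFAIL), ToyZio("txg sync write")]
for zio in zio_cancel(queue):
    print(zio.name, zio.error)
print(len(queue))
```

Suppressing the error for non-CANFAIL requests fits the rest of the commit, where unsynced changes are simply thrown away during the forced export.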
oshogbo added a commit to KlaraSystems/zfs that referenced this issue May 22, 2023
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed.  This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export
  by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos,
  which have VFS-level support for yanking filesystems out from under
  users.  However, this only helps when the user is actively performing
  I/O, while not sitting on the filesystem.  In particular, this allows
  test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.
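
The txg_wait bullets above are what let callers stop blocking on a suspended pool. The following is a toy Python model of the described semantics, not OpenZFS code. `ToyPool`, `txg_wait_synced`, `TXG_NOSUSPEND` and `TxgWaitError` are hypothetical stand-ins, and the flag values are made up. In this model, a plain TXG_WAIT caller still blocks while the pool is suspended. A caller that also passes the NOSUSPEND flag gets an error instead. A force export releases every waiter.

```python
import threading

# Hypothetical flag bits for this toy model only; the real OpenZFS
# constants, names and values may differ.
TXG_WAIT = 0x1
TXG_NOSUSPEND = 0x2


class TxgWaitError(Exception):
    """Raised when a waiter gives up instead of blocking indefinitely."""


class ToyPool:
    """Toy stand-in for a pool's sync state, guarded by one condition variable."""

    def __init__(self):
        self._cv = threading.Condition()
        self.synced_txg = 0
        self.suspended = False
        self.force_exporting = False

    def txg_wait_synced(self, txg, flags=TXG_WAIT):
        """Block until `txg` is synced, unless the flags or pool state say not to."""
        with self._cv:
            while self.synced_txg < txg:
                if self.force_exporting:
                    # Force export releases every waiter, even plain TXG_WAIT ones.
                    raise TxgWaitError("pool is being force exported")
                if self.suspended and flags & TXG_NOSUSPEND:
                    # The caller opted out of waiting on a suspended pool.
                    raise TxgWaitError("pool is suspended")
                # Plain TXG_WAIT on a suspended pool keeps sleeping here.
                self._cv.wait()

    def sync(self, txg):
        with self._cv:
            self.synced_txg = max(self.synced_txg, txg)
            self._cv.notify_all()

    def suspend(self):
        with self._cv:
            self.suspended = True
            self._cv.notify_all()

    def force_export(self):
        with self._cv:
            self.force_exporting = True
            self._cv.notify_all()
```

`suspend()` and `force_export()` call `notify_all()` so that blocked waiters re-check the pool state instead of sleeping until a sync that may never come.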

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress
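
The send/recv interruption bullet above, which the first two tests exercise, relies on the export initiator finding every in-flight stream and making it bail out with EINTR. Here is a minimal single-process sketch of that idea. The `ToySpa` registry and `ToyStream` class are hypothetical and not part of OpenZFS. Streams check an interrupt flag between records. A force export records its initiator, refuses new streams, enumerates the registered streams per dataset, and interrupts each one.

```python
import errno
import threading


class ToyStream:
    """Toy send/recv stream that processes records until done or interrupted."""

    def __init__(self, dataset, nrecords):
        self.dataset = dataset
        self.nrecords = nrecords
        self.interrupted = threading.Event()

    def run(self, on_record=None):
        for i in range(self.nrecords):
            # Checked only between records in this model.
            if self.interrupted.is_set():
                return errno.EINTR
            if on_record is not None:
                on_record(i)
        return 0


class ToySpa:
    """Toy pool that tracks in-flight streams per dataset."""

    def __init__(self):
        self._lock = threading.Lock()
        self._streams = {}  # dataset name -> list of ToyStream
        self.export_initiator = None

    def register(self, stream):
        """Refuse new streams once a force export has started."""
        with self._lock:
            if self.export_initiator is not None:
                return False
            self._streams.setdefault(stream.dataset, []).append(stream)
            return True

    def force_export(self, initiator):
        """Record the initiator, then interrupt every registered stream."""
        with self._lock:
            self.export_initiator = initiator
            streams = [s for lst in self._streams.values() for s in lst]
        for s in streams:
            s.interrupted.set()
        return len(streams)
```

This model only checks the flag between records. A stream blocked in a pipe read or write would also need to be woken, which is presumably why the commit message talks about closing the streams rather than just flagging them.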

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.
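
The new ZPOOL_STATUS_FORCE_EXPORTING value lets tools such as zpool report a force-exporting pool. According to the commit message, zpool would otherwise crash due to a missing config. Below is a toy sketch of the pattern. `ToyPoolStatus`, `toy_get_status` and `toy_status_line` are hypothetical names, and a plain dict stands in for the pool handle. The sketch checks for the force-exporting state first and returns before touching the config.

```python
from enum import Enum, auto


class ToyPoolStatus(Enum):
    OK = auto()
    SUSPENDED = auto()
    FORCE_EXPORTING = auto()  # plays the role of ZPOOL_STATUS_FORCE_EXPORTING


def toy_get_status(pool):
    """Derive a status from a dict standing in for a pool handle."""
    if pool.get("force_exporting"):
        return ToyPoolStatus.FORCE_EXPORTING
    if pool.get("suspended"):
        return ToyPoolStatus.SUSPENDED
    return ToyPoolStatus.OK


def toy_status_line(pool):
    status = toy_get_status(pool)
    if status is ToyPoolStatus.FORCE_EXPORTING:
        # The config may already be torn down, so report and return early.
        return f"{pool['name']}: pool is being force exported"
    config = pool["config"]
    return f"{pool['name']}: {status.name}, {len(config['vdevs'])} vdev(s)"
```

Because this is an ABI change, any consumer that switches over zpool_status_t values presumably needs a case (or a safe default) for the new one.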

Co-Authored-by:  Will Andrews <will@firepipe.net>
Co-Authored-by:  Allan Jude <allan@klarasystems.com>
Sponsored-by:   Klara, Inc.
Sponsored-by:   Catalogics, Inc.
Sponsored-by:   Wasabi Technology, Inc.
Closes openzfs#3461
Signed-off-by:  Will Andrews <will@firepipe.net>
Signed-off-by:  Allan Jude <allan@klarasystems.com>
Signed-off-by:  Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
allanjude added a commit to KlaraSystems/zfs that referenced this issue May 30, 2023
oshogbo added a commit to KlaraSystems/zfs that referenced this issue Jun 4, 2023
oshogbo added a commit to KlaraSystems/zfs that referenced this issue Jun 7, 2023
oshogbo added a commit to KlaraSystems/zfs that referenced this issue Jun 7, 2023
oshogbo pushed a commit to oshogbo/zfs that referenced this issue Jun 11, 2023
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed.  This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export
  by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos,
  which have VFS-level support for yanking filesystems out from under
  users.  However, this only helps when the user is actively performing
  I/O, while not sitting on the filesystem.  In particular, this allows
  test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by:	Will Andrews <will@firepipe.net>
Signed-off-by:	Allan Jude <allan@klarasystems.com>
Signed-off-by:  Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by:	Klara, Inc.
Sponsored-by:	Catalogics, Inc.
Sponsored-by:	Wasabi Technology, Inc.
Closes openzfs#3461
oshogbo pushed a commit to oshogbo/zfs that referenced this issue Jun 11, 2023
oshogbo pushed a commit to oshogbo/zfs that referenced this issue Jun 11, 2023
oshogbo pushed a commit to oshogbo/zfs that referenced this issue Jun 11, 2023
oshogbo pushed a commit to oshogbo/zfs that referenced this issue Jun 11, 2023
@Animeshz

Animeshz commented Jul 3, 2023

I see that it's being worked on at great speed in #11082, with over 2k SLOC of changes, but is there a temporary fix we can use to at least avoid having to reboot to get out of this state?

@Animeshz

Animeshz commented Jul 3, 2023

> This isn't a case of a timeout. ZFS knows the disk is gone. This is a deliberate choice by the ZFS developers. If your pool can't survive due to redundancy failures, it enters a suspended state to give the administrator the ability to fix it while dirty data, etc., are still in RAM. `zpool set failmode=continue poolname` allows some operations to fail with I/O errors rather than jumping to suspended mode ASAP, but some actions will still suspend the pool.

@DeHackEd that doesn't seem to help. After an accidental disconnection, the device is now reported as online instead of suspended, and reconnecting it doesn't do anything; trying to touch a file or export the pool causes over 100% load on multiple cores. I still had to reboot to fix this.

@mailinglists35

> I see that it's being worked on at great speed in #11082, with over 2k SLOC of changes, but is there a temporary fix we can use to at least avoid having to reboot to get out of this state?

Yes. Read the device mapper hack method.

geoffamey pushed a commit to BlueArchive/storage-zfs-wasabi that referenced this issue Jul 5, 2023
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed.  This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export
  by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos,
  which have VFS-level support for yanking filesystems out from under
  users.  However, this only helps when the user is actively performing
  I/O, while not sitting on the filesystem.  In particular, this allows
  test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by:  Klara, Inc.
Sponsored-by:  Catalogics, Inc.
Sponsored-by:  Wasabi Technology, Inc.
Closes openzfs#3461
(cherry picked from commit 852e633772217d779a63e8c46fe3c5f81dd8960e)
oshogbo pushed a commit to KlaraSystems/zfs that referenced this issue Nov 17, 2023
This is primarily of use when a pool has lost its disk, while the user
doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will
  be alerted if their txg can't be committed.  This is primarily of
  interest for callers that would normally pass TXG_WAIT, but don't want
  to wait if the pool becomes suspended, which allows unwinding in some
  cases, specifically when one is attempting a non-forced export.
  Without this, the non-forced export would preclude a forced export
  by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually
  being force exported.  Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly
  exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the
  SPA export initiator being able to enumerate datasets and closing any
  send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to
  fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any
  changes that were not able to be synced out.
- Linux specific: introduce a new tunable,
  zfs_forced_export_unmount_enabled, which allows the filesystem to
  remain in a modified 'unmounted' state upon exiting zpl_umount_begin,
  to achieve parity with FreeBSD and illumos,
  which have VFS-level support for yanking filesystems out from under
  users.  However, this only helps when the user is actively performing
  I/O, while not sitting on the filesystem.  In particular, this allows
  test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead
  of crashing due to lack of config, etc.
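The NOSUSPEND and forced-export semantics in the list above amount to an early exit from an otherwise unbounded wait. The sketch below is a minimal userspace model in C with POSIX threads. It is not the OpenZFS implementation: `struct pool_model`, `model_txg_wait()`, `WAIT_NOSUSPEND` and the choice of `EAGAIN`/`EIO` as return codes are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag modelling the NOSUSPEND idea (not the OpenZFS constant). */
#define	WAIT_NOSUSPEND	0x1

/* Hypothetical, heavily simplified stand-in for per-pool sync state. */
struct pool_model {
	pthread_mutex_t lock;
	pthread_cond_t cv;	/* broadcast on suspend or forced export */
	uint64_t synced_txg;	/* last txg known to be on stable storage */
	bool suspended;		/* an uncorrectable I/O failure suspended the pool */
	bool force_exporting;	/* a forced export is tearing the pool down */
};

void
model_init(struct pool_model *p, uint64_t synced_txg)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->cv, NULL);
	p->synced_txg = synced_txg;
	p->suspended = false;
	p->force_exporting = false;
}

/*
 * Wait until `txg` has been synced.  Returns 0 once it has, EAGAIN if the
 * pool is suspended and the caller passed WAIT_NOSUSPEND, or EIO if a
 * forced export has started (regardless of flags).  A caller without
 * WAIT_NOSUSPEND keeps sleeping on a suspended pool: the hang in this issue.
 */
int
model_txg_wait(struct pool_model *p, uint64_t txg, int flags)
{
	int err = 0;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	while (p->synced_txg < txg) {
		if (p->force_exporting) {
			err = EIO;
			break;
		}
		if (p->suspended && (flags & WAIT_NOSUSPEND)) {
			err = EAGAIN;
			break;
		}
		pthread_cond_wait(&p->cv, &p->lock);
	}
	pthread_mutex_unlock(&p->lock);
	return (err);
}

/* Mark the pool suspended and wake waiters so NOSUSPEND callers can bail out. */
void
model_suspend(struct pool_model *p)
{
	pthread_mutex_lock(&p->lock);
	p->suspended = true;
	pthread_cond_broadcast(&p->cv);
	pthread_mutex_unlock(&p->lock);
}

/* Start a forced export: every waiter wakes up and returns EIO. */
void
model_begin_force_export(struct pool_model *p)
{
	pthread_mutex_lock(&p->lock);
	p->force_exporting = true;
	pthread_cond_broadcast(&p->cv);
	pthread_mutex_unlock(&p->lock);
}
```

In this model, a caller that receives `EAGAIN` can drop whatever it holds (the namespace lock, in the non-forced export case described above) and return. That is what leaves room for a later forced export to proceed.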

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by:	Will Andrews <will@firepipe.net>
Signed-off-by:	Allan Jude <allan@klarasystems.com>
Sponsored-by:	Klara, Inc.
Sponsored-by:	Catalogics, Inc.
Closes openzfs#3461
oshogbo added a commit to KlaraSystems/zfs that referenced this issue Nov 17, 2023
Co-Authored-by: Will Andrews <will@firepipe.net>
Co-Authored-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#3461
Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
oshogbo added a commit to KlaraSystems/zfs that referenced this issue Nov 27, 2023
@Astaoth

Astaoth commented Dec 21, 2023

Hi,

This issue is quite something... I think it's the first time I've seen a filesystem end up in this state without any kernel message (there is nothing about it in my dmesg or in journalctl). It was triggered by a USB micro-disconnection short enough to go unnoticed by the kernel (ext, LVM and LUKS don't seem to care at all), yet it can put all storage using the same filesystem into a read-only state (I have 4 ZFS pools on the same server, but only one is connected over USB). After that, the filesystem tools completely lock up the terminal: zfs and zpool commands never return, can't be exited with Ctrl+C, can't be put into the background with Ctrl+Z, and can't even be killed with kill -9. The only fix is rebooting the system, even though the root mountpoint isn't affected, or a hack.

I guess that's standard lab life: discovering a nice technology, starting to use it a little too much, and then discovering its limitations the hard way 😂. I'm happy to have encountered this in my personal lab; it will be a nice opportunity to learn the fix. However, instead of fully blocking the terminal, having the zfs and zpool commands print a short message would have been more useful. Maybe this has changed in the latest OpenZFS versions?
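One way to find out whether a pool is suspended without running a zpool command that may block is to read a kernel statistics file instead. The sketch below is a hedged example in C. It assumes an OpenZFS on Linux release that exposes a per-pool state file at `/proc/spl/kstat/zfs/<pool>/state` containing the pool state (for example `SUSPENDED`). To my understanding, that file is meant to be readable without taking the pool namespace lock, but verify both the path and this behaviour on your own system. The function names are hypothetical and not part of libzfs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * True if `text` (the contents of the assumed kstat state file) is exactly
 * "SUSPENDED", ignoring surrounding whitespace such as the trailing newline.
 */
bool
state_text_is_suspended(const char *text)
{
	static const char want[] = "SUSPENDED";
	size_t n = sizeof (want) - 1;

	while (isspace((unsigned char)*text))
		text++;
	if (strncmp(text, want, n) != 0)
		return (false);
	text += n;
	while (isspace((unsigned char)*text))
		text++;
	return (*text == '\0');
}

/*
 * Read /proc/spl/kstat/zfs/<pool>/state (path assumed, see above).
 * Returns 1 if the pool reports SUSPENDED, 0 for any other state,
 * and -1 if the file cannot be opened (no such pool, or no such kstat).
 */
int
pool_is_suspended(const char *pool)
{
	char path[512], buf[64];
	FILE *f;
	size_t n;
	int len;

	assert(pool != NULL);
	len = snprintf(path, sizeof (path), "/proc/spl/kstat/zfs/%s/state", pool);
	if (len < 0 || (size_t)len >= sizeof (path))
		return (-1);
	if ((f = fopen(path, "r")) == NULL)
		return (-1);
	n = fread(buf, 1, sizeof (buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return (state_text_is_suspended(buf) ? 1 : 0);
}
```

A monitoring script or shutdown hook could, for example, call `pool_is_suspended("tank")` and warn the operator before attempting any zpool command on that pool.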

Does anyone know whether this issue affects only the Linux implementation, or whether FreeBSD has the same behaviour?

Edit: phrasing and typo

@ipaqmaster

I've also noticed this over the years on some backup servers that have a zpool on a dense drive attached over USB. If anything interrupts that drive, the zpool suspends and there's nothing the administrator can do to fix it.

No amount of onlining or clearing fixes it (even though the disk is right there and present), and trying to reboot the machine normally results in a hang, because the shutdown can never get past the suspended pool.

I once had to issue REISUB via SysRq to restart a remote machine as safely as possible ("best effort") despite this suspended USB pool issue. It worked, but having to use SysRq is nasty, and all it takes is an operator who doesn't know about this quirk to lock up a remote machine during its reboot process.

@Codelica

Out of complete desperation (remote machine without BMC access, etc) I tried zpool clear -nFX <poolname> per the following... and it worked. Literally everything else would just hang. Just FYI for other desperate people. ;)

@raimocom

raimocom commented Jun 17, 2024

> Out of complete desperation (remote machine without BMC access, etc) I tried zpool clear -nFX <poolname> per the following... and it worked. Literally everything else would just hang. Just FYI for other desperate people. ;)

No, it doesn't work here (pool suspended because of a suddenly disconnected USB drive). It can't be as easy as pointing to a 10-year-old post on superuser.com; otherwise this whole (now 9-year-old!) issue would not exist.

zpool clear -nFX tank
dmesg | tail
WARNING: Pool 'tank' has encountered an uncorrectable I/O failure and has been suspended.
