Use New BIO_RW_FAILFAST_* API #3
Comments
I have just observed this on an Ubuntu 10.04 system running a git checkout (commit ID: 7dfe24e). My situation is actually much nastier. I was testing building an MD RAID-6 out of zvols, using an ancient disk array connected via the in-kernel aic7xxx driver. I was doing some writes, and it was working nicely; then I pulled a disk to see what would happen. It is now about 10-15 minutes later, and I've seen no useful activity in ZFS since then. The pool the disk was in reported a lot of write errors but hasn't marked the disk or the pool as offline, and consequently MD is blocking forever. Even re-inserting the disk did not prompt any useful behavior.
Fixed as much as possible without patching the SCSI layer or individual drivers. As of 0.5.2, using Linux 2.6.32 and the mpt2sas driver, drive failures are now observed in roughly 2-3 minutes. This is acceptable for now, and it's worth pointing out that this same basic problem exists on OpenSolaris systems.
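For readers unfamiliar with the idea behind the failfast request in the issue title, here is a rough user-space sketch of how per-class failfast bits can shorten the time until an error is reported. It is not kernel code: the `SKETCH_*` names, the three error classes, and the retry counting are assumptions made for illustration, and the real block-layer flags and the retry behavior of individual drivers may differ.

```c
#include <assert.h>

/* Hypothetical failfast bits, one per class of failure. */
enum sketch_failfast {
	SKETCH_FAILFAST_DEV       = 1 << 0,
	SKETCH_FAILFAST_TRANSPORT = 1 << 1,
	SKETCH_FAILFAST_DRIVER    = 1 << 2
};

/* Hypothetical classes of I/O error a lower layer might see. */
enum sketch_err_class {
	SKETCH_ERR_DEV,
	SKETCH_ERR_TRANSPORT,
	SKETCH_ERR_DRIVER
};

/* Map an error class to the failfast bit that covers it. */
static int
sketch_class_bit(enum sketch_err_class cls)
{
	switch (cls) {
	case SKETCH_ERR_DEV:
		return (SKETCH_FAILFAST_DEV);
	case SKETCH_ERR_TRANSPORT:
		return (SKETCH_FAILFAST_TRANSPORT);
	default:
		return (SKETCH_FAILFAST_DRIVER);
	}
}

/*
 * Number of attempts made before the error reaches the submitter.
 * With the matching failfast bit set there is a single attempt;
 * otherwise the lower layer retries max_retries more times.
 */
static int
sketch_attempts(int flags, enum sketch_err_class cls, int max_retries)
{
	if (flags & sketch_class_bit(cls))
		return (1);
	return (1 + max_retries);
}
```

In this model a submitter that sets all three bits hears about a pulled disk after one attempt, while one that sets none waits for every retry. Any retries a driver performs on its own are outside what the flags control, which is in line with the comment above that some delay remains without patching the SCSI layer or drivers.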
Calls to arc_read with NULL aflags will cause panic
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501
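The race described in this commit message can be pictured with a small user-space model. This is only a sketch and not FreeBSD VFS code: `sketch_node`, `sketch_hash_insert`, `sketch_recycle` and `sketch_release` are hypothetical stand-ins, and the model collapses the vgone/inactive path into a single call. What it shows is the one point the message makes: the loser of the insert race still has a nonzero use count, so an immediate recycle attempt may legitimately report 0 and must not be asserted on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the FreeBSD struct vnode. */
struct sketch_node {
	unsigned long id;	/* lookup key */
	int usecount;		/* active references */
	bool doomed;		/* lost the race, awaiting teardown */
	bool recycled;		/* teardown has happened */
};

#define	SKETCH_SLOTS	8
static struct sketch_node *sketch_table[SKETCH_SLOTS];

/* Returns 1 if the node was recycled now, 0 if it is still in use. */
static int
sketch_recycle(struct sketch_node *n)
{
	if (n->usecount > 0)
		return (0);
	n->recycled = true;
	return (1);
}

/*
 * Insert n unless a node with the same id is already present.
 * On a lost race the new node is doomed and the winner returned.
 * Returns NULL only if the table is full.
 */
static struct sketch_node *
sketch_hash_insert(struct sketch_node *n)
{
	size_t free_slot = SKETCH_SLOTS;

	for (size_t i = 0; i < SKETCH_SLOTS; i++) {
		if (sketch_table[i] == NULL) {
			if (free_slot == SKETCH_SLOTS)
				free_slot = i;
		} else if (sketch_table[i]->id == n->id) {
			n->doomed = true;
			/* May return 0 while usecount > 0; not asserted. */
			(void) sketch_recycle(n);
			return (sketch_table[i]);
		}
	}
	if (free_slot == SKETCH_SLOTS)
		return (NULL);
	sketch_table[free_slot] = n;
	return (n);
}

/* Drop one reference; a doomed node is recycled on the last release. */
static void
sketch_release(struct sketch_node *n)
{
	if (--n->usecount == 0 && n->doomed)
		(void) sketch_recycle(n);
}
```

In the model, the doomed node is cleaned up when its last reference goes away rather than at the moment it loses the race, which is why a check that recycling succeeded immediately would be too strict.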
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out.
- Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test openzfs#3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Co-Authored-by: Will Andrews <will@firepipe.net>
Co-Authored-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#3461
Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
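To make the txg_wait and ZIO cancel items in this commit message more concrete, here is a minimal user-space sketch of the described behavior. It is not the OpenZFS implementation: every `sketch_*` and `SKETCH_*` name, the result codes, and the non-blocking "single check" shape of the wait are assumptions made for illustration, and the real flag names and return values may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wait flags; the real identifiers may differ. */
enum sketch_wait_flags {
	SKETCH_WAIT      = 0,		/* keep waiting, even if suspended */
	SKETCH_NOSUSPEND = 1 << 0	/* fail rather than wait when suspended */
};

/* Hypothetical outcomes of one wait check. */
enum sketch_wait_result {
	SKETCH_OK,		/* the txg has been synced */
	SKETCH_KEEP_WAITING,	/* caller would continue to block */
	SKETCH_ERR_SUSPENDED,	/* NOSUSPEND caller, pool suspended */
	SKETCH_ERR_EXPORTING	/* pool is being force exported */
};

struct sketch_pool {
	uint64_t synced_txg;	/* last txg written out */
	bool suspended;
	bool force_exporting;
};

/* One evaluation of the wait condition for the given txg. */
static enum sketch_wait_result
sketch_txg_wait(const struct sketch_pool *p, uint64_t txg, int flags)
{
	if (p->synced_txg >= txg)
		return (SKETCH_OK);
	if (p->force_exporting)
		return (SKETCH_ERR_EXPORTING);	/* applies to all waiters */
	if (p->suspended && (flags & SKETCH_NOSUSPEND))
		return (SKETCH_ERR_SUSPENDED);
	return (SKETCH_KEEP_WAITING);
}

/* Hypothetical I/O flag: the issuer is prepared to see a failure. */
enum { SKETCH_ZIO_CANFAIL = 1 << 0 };

struct sketch_zio {
	int flags;
	bool suspended;
	bool done;
	bool error_reported;	/* whether the issuer sees the failure */
};

/*
 * Fail every suspended zio. The failure is only surfaced to issuers
 * that set CANFAIL; for the rest it is suppressed. Returns the
 * number of zios cancelled.
 */
static size_t
sketch_zio_cancel(struct sketch_zio *zios, size_t n)
{
	size_t cancelled = 0;

	for (size_t i = 0; i < n; i++) {
		if (!zios[i].suspended)
			continue;
		zios[i].suspended = false;
		zios[i].done = true;
		zios[i].error_reported =
		    (zios[i].flags & SKETCH_ZIO_CANFAIL) != 0;
		cancelled++;
	}
	return (cancelled);
}
```

The ordering of the checks in `sketch_txg_wait` reflects the message's statement that a forced export fails even plain TXG_WAIT callers, whereas suspension alone only fails callers that opted in with NOSUSPEND.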
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions. Implement various control methods to make this feasible: - txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely. - txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this. - spa_config_enter_flags now takes a NOSUSPEND flag to the same effect. - DMU objset initiator which may be set on an objset being forcibly exported / unmounted. - SPA export initiator may be set on a pool being forcibly exported. - DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked. - ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users. - metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out. - Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test openzfs#3 below to pass on Linux. 
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc. Add tests which cover the basic use cases: - Force export while a send is in progress - Force export while a recv is in progress - Force export while POSIX I/O is in progress This change modifies the libzfs ABI: - New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value. - New field libzfs_force_export for libzfs_handle. Co-Authored-by: Will Andrews <will@firepipe.net> Co-Authored-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#3461 Signed-off-by: Will Andrews <will@firepipe.net> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions. Implement various control methods to make this feasible: - txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely. - txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this. - spa_config_enter_flags now takes a NOSUSPEND flag to the same effect. - DMU objset initiator which may be set on an objset being forcibly exported / unmounted. - SPA export initiator may be set on a pool being forcibly exported. - DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked. - ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users. - metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out. - Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test openzfs#3 below to pass on Linux. 
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc. Add tests which cover the basic use cases: - Force export while a send is in progress - Force export while a recv is in progress - Force export while POSIX I/O is in progress This change modifies the libzfs ABI: - New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value. - New field libzfs_force_export for libzfs_handle. Co-Authored-by: Will Andrews <will@firepipe.net> Co-Authored-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#3461 Signed-off-by: Will Andrews <will@firepipe.net> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions. Implement various control methods to make this feasible: - txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely. - txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this. - spa_config_enter_flags now takes a NOSUSPEND flag to the same effect. - DMU objset initiator which may be set on an objset being forcibly exported / unmounted. - SPA export initiator may be set on a pool being forcibly exported. - DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked. - ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users. - metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out. - Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test openzfs#3 below to pass on Linux. 
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc. Add tests which cover the basic use cases: - Force export while a send is in progress - Force export while a recv is in progress - Force export while POSIX I/O is in progress This change modifies the libzfs ABI: - New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value. - New field libzfs_force_export for libzfs_handle. Co-Authored-by: Will Andrews <will@firepipe.net> Co-Authored-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#3461 Signed-off-by: Will Andrews <will@firepipe.net> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions. Implement various control methods to make this feasible: - txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely. - txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this. - spa_config_enter_flags now takes a NOSUSPEND flag to the same effect. - DMU objset initiator which may be set on an objset being forcibly exported / unmounted. - SPA export initiator may be set on a pool being forcibly exported. - DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked. - ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users. - metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out. - Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test openzfs#3 below to pass on Linux. 
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc. Add tests which cover the basic use cases: - Force export while a send is in progress - Force export while a recv is in progress - Force export while POSIX I/O is in progress This change modifies the libzfs ABI: - New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value. - New field libzfs_force_export for libzfs_handle. Co-Authored-by: Will Andrews <will@firepipe.net> Co-Authored-by: Allan Jude <allan@klarasystems.com> Sponsored-by: Klara, Inc. Sponsored-by: Catalogics, Inc. Sponsored-by: Wasabi Technology, Inc. Closes openzfs#3461 Signed-off-by: Will Andrews <will@firepipe.net> Signed-off-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Under certain loads, the following panic is hit:

panic: page fault
KDB: stack backtrace:
#0 0xffffffff805db025 at kdb_backtrace+0x65
#1 0xffffffff8058e86f at vpanic+0x17f
#2 0xffffffff8058e6e3 at panic+0x43
#3 0xffffffff808adc15 at trap_fatal+0x385
#4 0xffffffff808adc6f at trap_pfault+0x4f
#5 0xffffffff80886da8 at calltrap+0x8
#6 0xffffffff80669186 at vgonel+0x186
#7 0xffffffff80669841 at vgone+0x31
#8 0xffffffff8065806d at vfs_hash_insert+0x26d
#9 0xffffffff81a39069 at sfs_vgetx+0x149
#10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#11 0xffffffff8065a28c at lookup+0x45c
#12 0xffffffff806594b9 at namei+0x259
#13 0xffffffff80676a33 at kern_statat+0xf3
#14 0xffffffff8067712f at sys_fstatat+0x2f
#15 0xffffffff808ae50c at amd64_syscall+0x10c
#16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following panic:

panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
cpuid = 17
KDB: stack backtrace:
#0 0xffffffff805e29c5 at kdb_backtrace+0x65
#1 0xffffffff8059620f at vpanic+0x17f
#2 0xffffffff81a27f4a at spl_panic+0x3a
#3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
#4 0xffffffff8066fdee at vinactivef+0xde
#5 0xffffffff80670b8a at vgonel+0x1ea
#6 0xffffffff806711e1 at vgone+0x31
#7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
#8 0xffffffff81a39069 at sfs_vgetx+0x149
#9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#10 0xffffffff80661c2c at lookup+0x45c
#11 0xffffffff80660e59 at namei+0x259
#12 0xffffffff8067e3d3 at kern_statat+0xf3
#13 0xffffffff8067eacf at sys_fstatat+0x2f
#14 0xffffffff808b5ecc at amd64_syscall+0x10c
#15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501
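The race described in the commit message above can be modelled in a few lines of user-space C. This is only a toy sketch: `toy_node`, `toy_hash_insert`, `toy_recycle` and `toy_inactive` are hypothetical names and do not reproduce the FreeBSD `vfs_hash_insert()` or `vrecycle()` interfaces. It assumes, as the message states, that recycling only succeeds when the use count is zero, and shows why asserting on that result fails for a node that lost the insertion race while still being referenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not the FreeBSD struct vnode. */
struct toy_node {
	unsigned key;
	int usecount;	/* use references held by callers */
	int doomed;	/* set once the node has been recycled */
};

#define	TOY_SLOTS 8
static struct toy_node *toy_hash[TOY_SLOTS];

/* Recycle a node; succeeds (returns 1) only if nobody is using it. */
static int
toy_recycle(struct toy_node *n)
{
	if (n->usecount > 0)
		return (0);
	n->doomed = 1;
	return (1);
}

/*
 * Insert 'n' unless a node with the same key is already present.
 * Returns the node that ends up in the table, i.e. the race winner.
 */
static struct toy_node *
toy_hash_insert(struct toy_node *n)
{
	assert(n->key < TOY_SLOTS);
	if (toy_hash[n->key] != NULL)
		return (toy_hash[n->key]);
	toy_hash[n->key] = n;
	return (n);
}

/*
 * Inactive hook for the toy node. The behaviour before the fix would
 * correspond to assert(toy_recycle(n) == 1), which fires for a race
 * loser that is still referenced. Ignoring the result tolerates it.
 */
static void
toy_inactive(struct toy_node *n)
{
	(void) toy_recycle(n);
}
```

In this model the losing node is simply left alone until its last reference is dropped, at which point a later call to `toy_inactive()` recycles it.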
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out.
- Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#3461
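The NOSUSPEND semantics described for txg_wait in the commit message above can be sketched as a small single-threaded model. Everything here is an assumption made for illustration: `toy_pool`, `toy_txg_wait`, the `TOY_WAIT_*` flags and the `toy_wait_result` values are hypothetical and are not the OpenZFS signatures or error codes. Because the model cannot actually sleep, the case where a real waiter would block indefinitely is represented by a distinct `TOY_WOULD_BLOCK` result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wait flags; not the OpenZFS definitions. */
enum toy_wait_flags {
	TOY_WAIT_NONE = 0,
	TOY_WAIT_NOSUSPEND = 1 << 0,
};

/* Hypothetical outcomes of a wait. */
enum toy_wait_result {
	TOY_SYNCED,		/* the requested txg was committed */
	TOY_SUSPENDED,		/* pool suspended, caller asked not to wait */
	TOY_EXPORTING,		/* pool is being force exported */
	TOY_WOULD_BLOCK,	/* stand-in for sleeping indefinitely */
};

struct toy_pool {
	uint64_t synced_txg;
	bool suspended;
	bool force_exporting;
};

/*
 * Wait until 'txg' is synced. A force export fails every waiter. A
 * suspended pool fails only waiters that passed TOY_WAIT_NOSUSPEND;
 * the others would keep waiting, which lets a non-forced export hang.
 */
static enum toy_wait_result
toy_txg_wait(struct toy_pool *p, uint64_t txg, int flags)
{
	assert(p != NULL);
	while (p->synced_txg < txg) {
		if (p->force_exporting)
			return (TOY_EXPORTING);
		if (p->suspended) {
			if (flags & TOY_WAIT_NOSUSPEND)
				return (TOY_SUSPENDED);
			return (TOY_WOULD_BLOCK);
		}
		p->synced_txg++;	/* pretend the sync thread made progress */
	}
	return (TOY_SYNCED);
}
```

A caller attempting a non-forced export would, in this model, pass `TOY_WAIT_NOSUSPEND` and unwind on `TOY_SUSPENDED`, releasing whatever locks it holds so that a later forced export can proceed.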
This is primarily of use when a pool has lost its disk, while the user doesn't care about any pending (or otherwise) transactions.

Implement various control methods to make this feasible:
- txg_wait can now take a NOSUSPEND flag, in which case the caller will be alerted if their txg can't be committed. This is primarily of interest for callers that would normally pass TXG_WAIT, but don't want to wait if the pool becomes suspended, which allows unwinding in some cases, specifically when one is attempting a non-forced export. Without this, the non-forced export would preclude a forced export by virtue of holding the namespace lock indefinitely.
- txg_wait also returns failure for TXG_WAIT users if a pool is actually being force exported. Adjust most callers to tolerate this.
- spa_config_enter_flags now takes a NOSUSPEND flag to the same effect.
- DMU objset initiator which may be set on an objset being forcibly exported / unmounted.
- SPA export initiator may be set on a pool being forcibly exported.
- DMU send/recv now use an interruption mechanism which relies on the SPA export initiator being able to enumerate datasets and closing any send/recv streams, causing their EINTR paths to be invoked.
- ZIO now has a cancel entry point, which tells all suspended zios to fail, and which suppresses the failures for non-CANFAIL users.
- metaslab, etc. cleanup, which consists of simply throwing away any changes that were not able to be synced out.
- Linux specific: introduce a new tunable, zfs_forced_export_unmount_enabled, which allows the filesystem to remain in a modified 'unmounted' state upon exiting zpl_umount_begin, to achieve parity with FreeBSD and illumos, which have VFS-level support for yanking filesystems out from under users. However, this only helps when the user is actively performing I/O, while not sitting on the filesystem. In particular, this allows test #3 below to pass on Linux.
- Add basic logic to zpool to indicate a force-exporting pool, instead of crashing due to lack of config, etc.

Add tests which cover the basic use cases:
- Force export while a send is in progress
- Force export while a recv is in progress
- Force export while POSIX I/O is in progress

This change modifies the libzfs ABI:
- New ZPOOL_STATUS_FORCE_EXPORTING zpool_status_t enum value.
- New field libzfs_force_export for libzfs_handle.

Signed-off-by: Will Andrews <will@firepipe.net>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Sponsored-by: Klara, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Wasabi Technology, Inc.
Closes openzfs#3461
(cherry picked from commit 852e633772217d779a63e8c46fe3c5f81dd8960e)
Under certain loads, the following panic is hit:

panic: page fault
KDB: stack backtrace:
#0 0xffffffff805db025 at kdb_backtrace+0x65
#1 0xffffffff8058e86f at vpanic+0x17f
#2 0xffffffff8058e6e3 at panic+0x43
#3 0xffffffff808adc15 at trap_fatal+0x385
#4 0xffffffff808adc6f at trap_pfault+0x4f
#5 0xffffffff80886da8 at calltrap+0x8
#6 0xffffffff80669186 at vgonel+0x186
#7 0xffffffff80669841 at vgone+0x31
#8 0xffffffff8065806d at vfs_hash_insert+0x26d
#9 0xffffffff81a39069 at sfs_vgetx+0x149
#10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#11 0xffffffff8065a28c at lookup+0x45c
#12 0xffffffff806594b9 at namei+0x259
#13 0xffffffff80676a33 at kern_statat+0xf3
#14 0xffffffff8067712f at sys_fstatat+0x2f
#15 0xffffffff808ae50c at amd64_syscall+0x10c
#16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following panic:

panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
cpuid = 17
KDB: stack backtrace:
#0 0xffffffff805e29c5 at kdb_backtrace+0x65
#1 0xffffffff8059620f at vpanic+0x17f
#2 0xffffffff81a27f4a at spl_panic+0x3a
#3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
#4 0xffffffff8066fdee at vinactivef+0xde
#5 0xffffffff80670b8a at vgonel+0x1ea
#6 0xffffffff806711e1 at vgone+0x31
#7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
#8 0xffffffff81a39069 at sfs_vgetx+0x149
#9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#10 0xffffffff80661c2c at lookup+0x45c
#11 0xffffffff80660e59 at namei+0x259
#12 0xffffffff8067e3d3 at kern_statat+0xf3
#13 0xffffffff8067eacf at sys_fstatat+0x2f
#14 0xffffffff808b5ecc at amd64_syscall+0x10c
#15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501
As of zfs-0.4.5 we no longer unconditionally use the BIO_RW_FAILFAST flag. In newer kernels BIO_RW_FAILFAST was replaced with BIO_RW_FAILFAST_{DEV|TRANSPORT|DRIVER}. The API change is a step in the right direction, but the vdev disk code needs to be updated to take advantage of the new API.
For now, if the legacy BIO_RW_FAILFAST flag is detected at configure time, we use it. If it is missing, we are running against a kernel with the newer API. With the new API we should be able to ensure some fairly smart behavior in the face of IO errors, but until then we are going to have a crazy number of useless retries at the lower layers.
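One possible shape for the flag selection described above, as a minimal sketch rather than the actual vdev disk code. The `HAVE_BIO_RW_FAILFAST` macro and the `vdev_failfast_flags()` helper are hypothetical names, and the bit positions are stand-ins so the snippet compiles outside the kernel; in a real build they would come from the kernel's block layer headers.

```c
/*
 * Stand-in bit positions for illustration only (assumed, not taken from
 * any particular kernel). HAVE_BIO_RW_FAILFAST is a hypothetical macro
 * that a configure-time check would define when the legacy flag exists.
 */
#if defined(HAVE_BIO_RW_FAILFAST)
#define	BIO_RW_FAILFAST			3	/* legacy single flag */
#else
#define	BIO_RW_FAILFAST_DEV		1	/* newer, per-layer flags */
#define	BIO_RW_FAILFAST_TRANSPORT	2
#define	BIO_RW_FAILFAST_DRIVER		3
#endif

/*
 * Hypothetical helper: return the fail-fast bits that would be ORed into
 * the rw flags of a bio before it is submitted.
 */
static unsigned long
vdev_failfast_flags(void)
{
#if defined(HAVE_BIO_RW_FAILFAST)
	/* Legacy kernels: one flag covers every layer. */
	return (1UL << BIO_RW_FAILFAST);
#else
	/* Newer kernels: request fail-fast at each layer explicitly. */
	return ((1UL << BIO_RW_FAILFAST_DEV) |
	    (1UL << BIO_RW_FAILFAST_TRANSPORT) |
	    (1UL << BIO_RW_FAILFAST_DRIVER));
#endif
}
```

Setting all three newer flags is the closest equivalent of the legacy behavior; a smarter policy could set only a subset, for example to let the transport layer retry across paths while still failing fast on device errors.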
Last week I observed a real disk failure while zfs-0.4.7 was under a kpios write load in RHEL5.4. While the software did handle the failure, the BIO_RW_FAILFAST support does not appear to have worked correctly. The single drive failure hung the committing transaction group for at least 360 seconds while the low-level mptsas driver retried the IO. Additionally, as far as I can tell, the IO failure was not properly reported back to ZFS either. What we need is for the IO to fail immediately, with minimal retries at the driver and SCSI layers.