Allow some zpool commands to timeout #13427

Open
h1z1 opened this issue May 6, 2022 · 4 comments
Labels
Type: Feature (Feature request or new feature)

Comments

h1z1 commented May 6, 2022

Describe the feature you would like to see added to OpenZFS

It can be extremely helpful to obtain troubleshooting information from a system that is crashing or on the verge of it. The problem is that some of these tasks hang indefinitely when they could instead time out.

How will this feature improve OpenZFS?

Allowing a command like zpool events to time out (with an appropriate error) can be the difference between a complete system failure and recovery. I'd imagine other commands are in a similar position, e.g. zpool status, zpool iostat, etc.

In zpool history's case there is even a comment in the code noting that history is logged asynchronously, hence the need for a txg sync:

	/*
	 * The history is logged asynchronously, so when they request
	 * the first chunk of history, make sure everything has been
	 * synced to disk so that we get it.
	 */
	if (*offp == 0 && spa_writeable(spa))
		txg_wait_synced(spa_get_dsl(spa), 0);

That logic seems backward. Shouldn't the log be atomic on creation/submission and asynchronous on read? Otherwise, how is consistency maintained when the system is unable to sync? Note that spa_history_lock is held on the read path.
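Purely as illustration, a minimal sketch of a bounded wait at that call site might look like the following. It assumes an interruptible variant of txg_wait_synced() can be used here (OpenZFS has a signal-interruptible txg_wait_synced_sig(); its exact return convention and the EINTR choice below are assumptions, not a proposed patch):

	/*
	 * Sketch, not actual OpenZFS code: let the forced sync be
	 * interrupted so the ioctl can fail with an error instead of
	 * hanging forever.  The helper's return convention (nonzero
	 * meaning "interrupted") is an assumption here.
	 */
	if (*offp == 0 && spa_writeable(spa)) {
		if (txg_wait_synced_sig(spa_get_dsl(spa), 0))
			return (SET_ERROR(EINTR));
	}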

Additional context

I expect there is no single fix for all of them, but in the case of zpool history, when the pool is unable to sync due to other issues, the stack becomes:

[<0>] cv_wait_common+0xb2/0x140 [spl]
[<0>] __cv_wait_io+0x18/0x20 [spl]
[<0>] txg_wait_synced_impl+0xdb/0x130 [zfs]
[<0>] txg_wait_synced+0x10/0x40 [zfs]
[<0>] spa_history_get+0x29a/0x2e0 [zfs]
[<0>] zfs_ioc_pool_get_history+0xfe/0x150 [zfs]
[<0>] zfsdev_ioctl_common+0x7db/0x840 [zfs]
[<0>] zfsdev_ioctl+0x56/0xe0 [zfs]
[<0>] do_vfs_ioctl+0xaa/0x620
[<0>] ksys_ioctl+0x67/0x90
[<0>] __x64_sys_ioctl+0x1a/0x20
[<0>] do_syscall_64+0x60/0x1c0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

It will never return due to the failure.

h1z1 added the Type: Feature (Feature request or new feature) label on May 6, 2022
rincebrain (Contributor) commented:

I believe this is trickier than one might imagine - see #11082 for how invasive the changes needed may be to discard things in flight.

h1z1 commented May 7, 2022

Indeed, it won't be a simple switch from everything being sync to async, but there is hope - Lua, for example (#8904). Events and history seem like rather simple cases; other things like zpool add/replace/etc. are of course going to need more thought. Maybe the low-hanging fruit is to make the zfs/zpool commands themselves timeout-aware? (A rough userland sketch of that idea is below.)

The major issue is that you can't kill them from the shell, so it hangs a lot of things. With multiple pools in one system, it can completely kill unrelated ones.
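For what it's worth, one rough shape of a userland-side stopgap is sketched below: run the command in a child process and stop waiting after a deadline. This is generic POSIX code, not existing zpool code; the pool name and timeout are placeholders. As noted above, it cannot reclaim a thread already stuck in an uninterruptible sleep inside the ioctl - it only lets the caller give up, report the hang, and move on.

/*
 * Sketch: run a zpool command in a child process and stop waiting for
 * it after a deadline.  A child stuck in D state inside the ioctl will
 * not die from SIGKILL until the kernel wait finishes, but the caller
 * can at least report the timeout and continue.
 */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int
run_with_deadline(char *const argv[], int timeout_secs)
{
	pid_t pid = fork();

	if (pid < 0)
		return (-1);
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);
	}

	for (int tenths = 0; tenths < timeout_secs * 10; tenths++) {
		int status;

		if (waitpid(pid, &status, WNOHANG) == pid)
			return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
		usleep(100 * 1000);		/* poll every 100 ms */
	}

	(void) kill(pid, SIGKILL);
	(void) fprintf(stderr, "%s: timed out after %d seconds\n",
	    argv[0], timeout_secs);
	return (-ETIMEDOUT);
}

int
main(void)
{
	/* "tank" is a placeholder pool name. */
	char *cmd[] = { "zpool", "history", "tank", NULL };

	return (run_with_deadline(cmd, 30) == 0 ? 0 : 1);
}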

Related is the use of spinlocks at all. A side effect of the failure described above was that the CPU spun to the point the kernel thought it was stuck (the NMI watchdog isn't enabled).

rincebrain commented May 7, 2022 via email

h1z1 commented May 7, 2022

A bit of a misuse of words on my part, sorry - I meant they have a place of course, but could be tweaked. Would it not make more sense in the case above, for example, to either return an error or avoid the txg sync entirely? There's still an underlying issue with how the pool got into that state, but it would allow some further investigation. I suppose another option is to expose them in procfs?
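For concreteness, one possible shape of the "avoid the txg sync entirely" option at the call site quoted earlier is sketched below. It assumes a check such as spa_suspended() can be used to detect that the pool cannot make sync progress; the gate and its placement are assumptions for illustration only, not a proposed patch:

	/*
	 * Sketch: only force the sync when the pool can make progress.
	 * When I/O is suspended, serve whatever history has already
	 * reached disk instead of blocking; the reader just sees a
	 * possibly-stale tail of the log.  Using spa_suspended() as
	 * the gate here is an assumption.
	 */
	if (*offp == 0 && spa_writeable(spa) && !spa_suspended(spa))
		txg_wait_synced(spa_get_dsl(spa), 0);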
