Allow some zpool commands to timeout #13427

Open
h1z1 opened this issue May 6, 2022 · 4 comments
Labels
Type: Feature (Feature request or new feature)

Comments

h1z1 commented May 6, 2022

Describe the feature you would like to see added to OpenZFS

It can be extremely helpful to obtain troubleshooting information from a system that is crashing or on the verge of it. The problem is that some of these tasks hang indefinitely when they could instead time out.

How will this feature improve OpenZFS?

Allowing a command like zpool events to time out (with an appropriate error) can be the difference between a complete system failure and recovery. I'd imagine other commands are in a similar position, e.g. zpool status, zpool iostat, etc.

In zpool history's case there is even a comment in the code noting that history is logged asynchronously, hence the need for a txg sync:

	/*
	 * The history is logged asynchronously, so when they request
	 * the first chunk of history, make sure everything has been
	 * synced to disk so that we get it.
	 */
	if (*offp == 0 && spa_writeable(spa))
		txg_wait_synced(spa_get_dsl(spa), 0);

That logic seems backward. Shouldn't the log be atomic on creation/submission and asynchronous on read? Otherwise, how is consistency maintained when the system is unable to sync? Note that spa_history_lock is held on the read path.
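Purely as illustration, a minimal sketch of a bounded wait at that call site might look like the following. It assumes an interruptible variant of txg_wait_synced() can be used here (OpenZFS has a signal-interruptible txg_wait_synced_sig(); its exact return convention and the EINTR choice below are assumptions, not a proposed patch):

	/*
	 * Sketch, not actual OpenZFS code: let the forced sync be
	 * interrupted so the ioctl can fail with an error instead of
	 * hanging forever.  The helper's return convention (nonzero
	 * meaning "interrupted") is an assumption here.
	 */
	if (*offp == 0 && spa_writeable(spa)) {
		if (txg_wait_synced_sig(spa_get_dsl(spa), 0))
			return (SET_ERROR(EINTR));
	}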

Additional context

I expect there is no single fix for all of them, but in the case of zpool history, when the pool is unable to sync due to other issues, the stack becomes:

[<0>] cv_wait_common+0xb2/0x140 [spl]
[<0>] __cv_wait_io+0x18/0x20 [spl]
[<0>] txg_wait_synced_impl+0xdb/0x130 [zfs]
[<0>] txg_wait_synced+0x10/0x40 [zfs]
[<0>] spa_history_get+0x29a/0x2e0 [zfs]
[<0>] zfs_ioc_pool_get_history+0xfe/0x150 [zfs]
[<0>] zfsdev_ioctl_common+0x7db/0x840 [zfs]
[<0>] zfsdev_ioctl+0x56/0xe0 [zfs]
[<0>] do_vfs_ioctl+0xaa/0x620
[<0>] ksys_ioctl+0x67/0x90
[<0>] __x64_sys_ioctl+0x1a/0x20
[<0>] do_syscall_64+0x60/0x1c0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

It will never return due to the failure.

h1z1 added the Type: Feature (Feature request or new feature) label on May 6, 2022
rincebrain (Contributor) commented:

I believe this is trickier than one might imagine - see #11082 for how invasive the changes needed may be to discard things in flight.

h1z1 commented May 7, 2022

Indeed, it won't be a simple switch from everything being sync to async, but there is hope - Lua, for example (#8904). Events and history seem like rather simple cases; other things like zpool add/replace/etc. are of course going to need more thought. Maybe the low-hanging fruit is to make the zfs/zpool commands themselves timeout-aware? (A rough userland sketch of that idea is below.)

The major issue is that you can't kill them from the shell, so it hangs a lot of things. With multiple pools in one system, it can completely kill unrelated ones.
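For what it's worth, one rough shape of a userland-side stopgap is sketched below: run the command in a child process and stop waiting after a deadline. This is generic POSIX code, not existing zpool code; the pool name and timeout are placeholders. As noted above, it cannot reclaim a thread already stuck in an uninterruptible sleep inside the ioctl - it only lets the caller give up, report the hang, and move on.

/*
 * Sketch: run a zpool command in a child process and stop waiting for
 * it after a deadline.  A child stuck in D state inside the ioctl will
 * not die from SIGKILL until the kernel wait finishes, but the caller
 * can at least report the timeout and continue.
 */
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int
run_with_deadline(char *const argv[], int timeout_secs)
{
	pid_t pid = fork();

	if (pid < 0)
		return (-1);
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);
	}

	for (int tenths = 0; tenths < timeout_secs * 10; tenths++) {
		int status;

		if (waitpid(pid, &status, WNOHANG) == pid)
			return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
		usleep(100 * 1000);		/* poll every 100 ms */
	}

	(void) kill(pid, SIGKILL);
	(void) fprintf(stderr, "%s: timed out after %d seconds\n",
	    argv[0], timeout_secs);
	return (-ETIMEDOUT);
}

int
main(void)
{
	/* "tank" is a placeholder pool name. */
	char *cmd[] = { "zpool", "history", "tank", NULL };

	return (run_with_deadline(cmd, 30) == 0 ? 0 : 1);
}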

Related is the use of spinlocks at all. A side effect of the failure described above was that the CPU spun to the point the kernel thought it was stuck (the NMI watchdog isn't enabled).

rincebrain commented May 7, 2022 via email

h1z1 commented May 7, 2022

A bit of a misuse of words on my part, sorry - I meant they have a place of course, but could be tweaked. Would it not make more sense in the case above, for example, to either return an error or avoid the txg sync entirely? There's still an underlying issue with how the pool got into that state, but it would allow some further investigation. I suppose another option is to expose them in procfs?
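For concreteness, one possible shape of the "avoid the txg sync entirely" option at the call site quoted earlier is sketched below. It assumes a check such as spa_suspended() can be used to detect that the pool cannot make sync progress; the gate and its placement are assumptions for illustration only, not a proposed patch:

	/*
	 * Sketch: only force the sync when the pool can make progress.
	 * When I/O is suspended, serve whatever history has already
	 * reached disk instead of blocking; the reader just sees a
	 * possibly-stale tail of the log.  Using spa_suspended() as
	 * the gate here is an assumption.
	 */
	if (*offp == 0 && spa_writeable(spa) && !spa_suspended(spa))
		txg_wait_synced(spa_get_dsl(spa), 0);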
