-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zpool command hangs when attempting to create a new pool on a zvol #612
Comments
Yes, creating a new zfs pool on top of a zvol isn't handled. I've yet to look in to exactly why it fails but I can imagine a ton of potential deadlocks. I should probably merge a patch to detect when something is trying to do this and flat out prohibit it. |
In a similar vein, I suppose I could do |
As richard mentioned, a vm-hosting environment, where vm's storage is backed by zvols, and the vm uses zfs as well (e.g. solaris-derivatives) is a valid (albeit niche) use cache for this. The admin might want to create the pool on the host, populate the content (zfs send/receive, rsync, whatever), export the pool, then startup the guest whose root is on that pool. The copy process will be faster when done directly on the host instead of (for example) booting with livecd on the guest and transfer the files using ssh. Anyway, as a workaround I managed to do that using virtualbox + rawdisk vmdk mapping + scst_local. Still faster compared to using ssh. |
Has anyone tried this recently? The -rc11 release addresses many of the deadlock which originally concerned me. No promises it will work now, but it might. Has anyone tried this on OI or FreeBSD? |
Still the same result using Ubuntu's daily ppa, v0.6.0.81-rc11 |
So my basic understanding of the deadlock at the moment, is that one thread is doing this stack:
and the other gets stuck here;
Looking at
So, in the case that we are holding the mutex, and it is considered our "first open" of the device. Or perhaps that should be called the number of references go from 0 to 1. It is worth noting that If I stop
Also works. Possible attacks are then,
What does |
Actually, according to your stacks this is caused by a lock inversion between the
|
I was actually going to ask you that, as I am not entirely sure where it is from. But now that we both want to know, I shall focus on that. |
I have opened a pull request #1157 with a patch to fix this issue. It makes vdev_uses_zvols() unconditionally return B_TRUE, which is necessary until someone figures out how to make vdev_uses_zvols() work on Linux. Making vdev_uses_zvols() work is not a straightforward thing when we cases such as trying to make a pool backed by a file on a filesystem on a zvol (or on a dataset). This needs more testing, but this patch makes the simple case of making a pool on a zvol work. |
From irc, I get a slightly different deadlock:
|
@lundman Do you have any idea what is invoking blkid on your system? it appears to suffer from the lock inversion issue that @behlendorf highlighted. |
Yes, as we discussed on IRC, udev is issuing
and to confirm that, I send SIGSTOP to udev, create my pool, then SIGCONT without issues. |
Ignore my comment about the pull request. There is a separate race here and @lundman's earlier patch resolves it in the common case (where you are not using something inbetween like dm-crypt). |
Here is output of udevadm:
Those two events occur when I do |
If we care where the udev events come from, they occur from before the ioctl() to create the pool, more specifically from;
and
and
I can easily put a "sleep(1)" here to solve my issues, but that is obviously not the real answer. |
@lundman, @ryao Yeah, this is a tricky one. The usual thing to do in a case like this is to ensure the locks are always taken in the same order. Unfortunately the bdev->bd_mutex is taken unconditionally by the kernel during open(2) so that's outside our control. And the open really does need to take the spa_namespace_lock to safely lookup the spa. As for the create it broadly holds the spa_namespace_lock in order to safely lookup and then add the spa. This lock inversion deadlock extends beyond simply creasing a zpool on a zvol. It's entirely possible (although very very unlikely) it could occur for any zpool ioctl() operation which is forced to reopen a vdev while holding the spa_namespace_lock and racing with an unrelated open of the zvol. Although based on how often that happens in the wild it's probably not something to lose any sleep over. |
@lundman @ryao Try behlendorf/zfs@45eed13 it should fix the issue by detecting the lock inversion if it occurs, unwinding the locks, and they reacquiring them safely. |
I have tested this patch, and I can no longer deadlock. Creating the pool, reboot and importing, all work as expected. Thanks. |
In all but one case the spa_namespace_lock is taken before the bdev->bd_mutex lock. But Linux __blkdev_get() function calls fops->open() with the bdev->bd_mutex lock held and we must somehow still safely acquire the spa_namespace_lock. To avoid a potential lock inversion deadlock we preemptively try to take the spa_namespace_lock(). Normally it will not be contended and this is safe because spa_open_common() handles the case where the caller already holds the spa_namespace_lock. When it is contended we risk a lock inversion if we were to block waiting for the lock. Luckily, the __blkdev_get() function allows us to return -ERESTARTSYS which will result in bdev->bd_mutex being dropped, reacquired, and fops->open() being called again. This process can be repeated safely until both locks are acquired. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jorgen Lundman <lundman@lundman.net> Closes openzfs#612
I ran
zpool create -o ashift=13 tank /dev/zvol/rpool/KVM/gfreebsd
to try to create a pool for a VM with 0.6.0-rc7, but the command froze. I assume that this is related to issue #342 and possibly also issue #469.The text was updated successfully, but these errors were encountered: