EBUSY upon ZPOOL_EXPORT #1045
Your call trace matches #861.

Right, this looks like a duplicate of #861.
Well, it is different in that I don't get any "rcu_sched detected stall": the umount returns fine, and the export doesn't hang but returns with EBUSY. But indeed they look similar (and similar to #790). Any recommendation on what I should try the next time it happens?
Once it happens there's nothing really which can be done. What needs to happen is for us to identify the exact flaw, see if/how it can be worked around, and then properly fix it.
The iterate_supers_type() function which was introduced in the 3.0 kernel was supposed to provide a safe way to call an arbitrary function on all super blocks of a specific type. Unfortunately, because a list_head was used, a bug was introduced which made it possible for iterate_supers_type() to get stuck spinning on a super block which was just deactivated.

This can occur because when the list head is removed from the fs_supers list it is reinitialized to point to itself. If the iterate_supers_type() function happened to be processing the removed list_head it will get stuck spinning on that list_head.

The bug was fixed in the 3.3 kernel by converting the list_head to an hlist_node. However, to resolve the issue for existing 3.0 - 3.2 kernels we detect when a list_head is used. Then, to prevent the spinning from occurring, the .next pointer is set to the fs_supers list_head, which ensures the iterate_supers_type() function will always terminate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1045
Closes openzfs#861
Closes openzfs#790
I got a second occurrence of the issue described at http://thread.gmane.org/gmane.linux.file-systems.zfs.user/4661
I've been doing an "offsite backup" every week, whereby I zfs-send|zfs-recv a number of datasets from one zpool onto another zpool on a pair of hard drives (well luks devices on top of hard drives). I do a zfs export, luksClose before taking the drives offsite.
Today, for some reason, the zfs export fails with:
There is no zfs command running, nothing mounted (zpool export managed to do that part) on there (checked /proc/mounts as well), nothing uses the zvols in there, no loop device or anything. I've tried to killall -STOP udevd in case it was somehow accessing stuff while the export was trying to tidy them away.
I've got a sysrq-t output, not sure what to look for to see what may be holding it.
Trying to "zfs mount -a" to see if I can mount it back, it says for every mount point:
filesystem 'offsite-backup-05/main/servers/skywalker/shadow_nbd/c' is already mounted
cannot mount 'offsite-backup-05/main/servers/skywalker/shadow_nbd/c': Resource temporarily unavailable
While "grep offsite-backup-05 /proc/mounts" returns nothing.
So there's something definitely going wrong there.
I can still read the zvols on there, though.
I have the zevents going to the console (zfs_zevent_console=1) and there has been nothing: no IO error, nothing at all. I used to get a lot of oopses, but since upgrading the memory to 48GB it has been quite stable until now.
Before rebooting, I also tried to export the other zpool (the one I was "zfs send"ing from) and got the same EBUSY error (successful umount but EBUSY upon the ioctl(ZPOOL_EXPORT), just as for the other one).
I noticed (in top) an arc_adapt thread taking 100% of one CPU. Running sysrq-l a few times showed it in the same place each time.

In case that means something to anybody.