RCU stall in arc_adapt (zpl_prune_sbs) during umount #861

Closed
dechamps opened this issue Aug 1, 2012 · 5 comments

@dechamps
Contributor

dechamps commented Aug 1, 2012

Using the latest zfs master on a dual-core machine with 6 GB of memory and vm.min_free_kbytes = 32768: while exporting a pool with some ZVOLs in it, zfs umount froze and I got this:

INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
Pid: 6322, comm: arc_adapt Tainted: P           O 3.2.23-zfsiscsi-std-ipv6-64 #3
Call Trace:
 <IRQ>  [<ffffffff810ebd13>] __rcu_pending+0x163/0x3e0
 [<ffffffff810ec005>] rcu_check_callbacks+0x75/0x1b0
 [<ffffffff810a35a3>] update_process_times+0x43/0x80
 [<ffffffff810c4191>] tick_sched_timer+0x61/0xb0
 [<ffffffff810b80fa>] __run_hrtimer+0x6a/0x190
 [<ffffffff810c4130>] ? tick_nohz_handler+0xe0/0xe0
 [<ffffffff810b8476>] hrtimer_interrupt+0xd6/0x220
 [<ffffffff810670f4>] smp_apic_timer_interrupt+0x64/0xa0
 [<ffffffffa0367c90>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8d98b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffffa0367c90>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8c510>] ? _raw_spin_lock+0x10/0x30
 [<ffffffff81162ac9>] iterate_supers_type+0x89/0xd0
 [<ffffffffa0367c40>] ? zpl_inode_alloc+0x80/0x80 [zfs]
 [<ffffffffa0367c7b>] zpl_prune_sbs+0x3b/0x50 [zfs]
 [<ffffffffa02cd468>] arc_adjust_meta+0x118/0x1e0 [zfs]
 [<ffffffffa02cd530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa02cd530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa0025130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa02cd592>] arc_adapt_thread+0x62/0xd0 [zfs]
 [<ffffffffa00251a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0025130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb

The rest of the system still works fine, but the message repeats continuously and ZFS stays blocked. I had to do a hard reset. What's surprising is that I was stress-testing the pool with 48 benchmarks running on top of 6 zvols continuously for 20 hours and it handled that just fine; it just wouldn't umount after that. I find this kinda ironic.

I only got this once, so I have no idea how to reproduce it. I hope the stack trace will be enough to pinpoint the cause.

@dechamps
Contributor Author

dechamps commented Aug 1, 2012

From the stack trace, it seems to be a deadlock on sb_lock (Linux super.c). Presumably, another thread is holding the lock (probably the umount procedure doing its stuff) and is not releasing it.

Maybe something within the code path of umount is trying to allocate memory while sb_lock is held, which triggers reclaim, which wakes up arc_adapt, which calls zpl_prune_sbs, which calls iterate_supers_type, which tries to lock sb_lock, boom. I'm just guessing here.
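For illustration only, here is a minimal userspace sketch of the circular wait hypothesized above, with pthreads standing in for the kernel primitives (none of these names are the real kernel or ZFS symbols): the "umount" side takes sb_lock and then waits synchronously on reclaim, while the reclaim side ("arc_adapt") needs that same lock.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t sb_lock = PTHREAD_MUTEX_INITIALIZER;

static void *arc_adapt_thread(void *arg)
{
    (void)arg;
    sleep(1);                       /* let the "umount" side take sb_lock first */

    /* iterate_supers_type() would block here unconditionally; trylock is
     * used so this demo can report the cycle instead of hanging. */
    if (pthread_mutex_trylock(&sb_lock) != 0)
        printf("arc_adapt: needs sb_lock, but its holder is waiting on reclaim -> deadlock\n");
    else
        pthread_mutex_unlock(&sb_lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, arc_adapt_thread, NULL);

    /* "umount" side: hold sb_lock, then allocate memory, which
     * (hypothetically) enters direct reclaim and waits for arc_adapt
     * to free metadata; pthread_join() stands in for that wait. */
    pthread_mutex_lock(&sb_lock);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&sb_lock);
    return 0;
}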

If I come across the issue again I'll try to get the stack traces of umount-related tasks to see if there's something interesting.

@behlendorf
Contributor

You're probably right about it racing with the umount in some unfortunate way. That's one of the few operations which is going to touch sb_lock. If you see it again, grab some more debugging info.

@dechamps
Contributor Author

dechamps commented Aug 2, 2012

I got it again. Same situation as in the issue description. Here's the detailed information:

INFO: rcu_sched detected stall on CPU 0 (t=15000 jiffies)
Pid: 3067, comm: arc_adapt Tainted: P           O 3.2.23-zfsiscsi-std-ipv6-64 #3
Call Trace:
 <IRQ>  [<ffffffff810ebd13>] __rcu_pending+0x163/0x3e0
 [<ffffffff810ec005>] rcu_check_callbacks+0x75/0x1b0
 [<ffffffff810a35a3>] update_process_times+0x43/0x80
 [<ffffffff810c4191>] tick_sched_timer+0x61/0xb0
 [<ffffffff810b80fa>] __run_hrtimer+0x6a/0x190
 [<ffffffff810c4130>] ? tick_nohz_handler+0xe0/0xe0
 [<ffffffff810b8476>] hrtimer_interrupt+0xd6/0x220
 [<ffffffff810670f4>] smp_apic_timer_interrupt+0x64/0xa0
 [<ffffffffa0387c10>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8d98b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffffa0387c10>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8ba9b>] ? down_read+0xb/0x20
 [<ffffffff81162aa5>] iterate_supers_type+0x65/0xd0
 [<ffffffffa0387bc0>] ? zpl_inode_alloc+0x80/0x80 [zfs]
 [<ffffffffa0387bfb>] zpl_prune_sbs+0x3b/0x50 [zfs]
 [<ffffffffa02ec468>] arc_adjust_meta+0x118/0x1e0 [zfs]
 [<ffffffffa02ec530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa02ec530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa02ec592>] arc_adapt_thread+0x62/0xd0 [zfs]
 [<ffffffffa00081a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb
SysRq : Show Blocked State
  task                        PC stack   pid father
l2arc_feed      D 0000000000000000     0  3070      2 0x00000000
 ffff88018b9c1ce0 0000000000000046 ffff880100000000 ffff880194c02700
 ffff88018b9c1fd8 ffff88018b9c0010 0000000000004000 ffff88018b9c1fd8
 ffff88019506c100 ffff880194c02700 ffff88019530c5b0 0000000000000086
Call Trace:
 [<ffffffff8108f9ec>] ? try_to_wake_up+0x1cc/0x250
 [<ffffffffa0006dcb>] ? kmem_alloc_debug+0x1eb/0x340 [spl]
 [<ffffffff8108fa7d>] ? default_wake_function+0xd/0x10
 [<ffffffff81083e39>] ? __wake_up_common+0x59/0x90
 [<ffffffff81b8aa9a>] schedule+0x3a/0x60
 [<ffffffffa000ed87>] cv_wait_common+0x97/0x190 [spl]
 [<ffffffff810b4a10>] ? wake_up_bit+0x40/0x40
 [<ffffffffa000ee9e>] __cv_wait+0xe/0x10 [spl]
 [<ffffffffa0382c6b>] zio_wait+0xdb/0x150 [zfs]
 [<ffffffffa02e9eed>] l2arc_feed_thread+0x60d/0x860 [zfs]
 [<ffffffffa02e98e0>] ? arc_release_bp+0x10/0x10 [zfs]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa00081a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb
txg_sync        D 0000000000000000     0  3722      2 0x00000000
 ffff880166c8bba0 0000000000000046 ffff880166c8bb50 ffff8801862c6180
 ffff880166c8bfd8 ffff880166c8a010 0000000000004000 ffff880166c8bfd8
 ffff8801862c2d80 ffff8801862c6180 ffff880166c8bb10 ffff880166c8bc4c
Call Trace:
 [<ffffffff8108f9ec>] ? try_to_wake_up+0x1cc/0x250
 [<ffffffff8108fa7d>] ? default_wake_function+0xd/0x10
 [<ffffffff81083e39>] ? __wake_up_common+0x59/0x90
 [<ffffffff81b8aa9a>] schedule+0x3a/0x60
 [<ffffffffa000ed87>] cv_wait_common+0x97/0x190 [spl]
 [<ffffffffa0382aa0>] ? zio_flush+0x40/0x40 [zfs]
 [<ffffffff810b4a10>] ? wake_up_bit+0x40/0x40
 [<ffffffffa000ee9e>] __cv_wait+0xe/0x10 [spl]
 [<ffffffffa0382c6b>] zio_wait+0xdb/0x150 [zfs]
 [<ffffffffa031ba3a>] dsl_pool_sync+0xca/0x480 [zfs]
 [<ffffffffa02ef637>] ? bpobj_space+0x97/0xa0 [zfs]
 [<ffffffffa032de07>] spa_sync+0x397/0x9a0 [zfs]
 [<ffffffffa033f081>] txg_sync_thread+0x2c1/0x490 [zfs]
 [<ffffffffa033edc0>] ? txg_thread_exit+0x40/0x40 [zfs]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa00081a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb
umount          D 0000000000000000     0 15994  15988 0x00000000
 ffff88017fe75cf8 0000000000000082 ffff88019fc12060 ffff8801950a1a00
 ffff88017fe75fd8 ffff88017fe74010 0000000000004000 ffff88017fe75fd8
 ffff88018cddce00 ffff8801950a1a00 ffff88017fe75c68 ffffffff810881c9
Call Trace:
 [<ffffffff810881c9>] ? check_preempt_wakeup+0xf9/0x160
 [<ffffffff81084944>] ? check_preempt_curr+0x84/0xa0
 [<ffffffff81b8aa9a>] schedule+0x3a/0x60
 [<ffffffff81b8ae9d>] schedule_timeout+0x1ad/0x280
 [<ffffffff810c93fa>] ? smp_call_function_single+0x12a/0x140
 [<ffffffff815eb954>] ? __rwsem_do_wake+0xe4/0x1c0
 [<ffffffff810ea370>] ? rcu_cpu_stall_reset+0x40/0x40
 [<ffffffff81b89df2>] wait_for_common+0xc2/0x160
 [<ffffffff8108fa70>] ? try_to_wake_up+0x250/0x250
 [<ffffffff810eb450>] ? call_rcu_sched+0x10/0x20
 [<ffffffff810eb440>] ? call_rcu_bh+0x20/0x20
 [<ffffffff81b89f38>] wait_for_completion+0x18/0x20
 [<ffffffff810eb9e4>] _rcu_barrier+0x94/0xb0
 [<ffffffff810eba17>] rcu_barrier_sched+0x17/0x20
 [<ffffffff810eba29>] rcu_barrier+0x9/0x10
 [<ffffffff81163296>] deactivate_locked_super+0x56/0x70
 [<ffffffff81163d45>] deactivate_super+0x45/0x60
 [<ffffffff8117d51a>] mntput_no_expire+0x12a/0x170
 [<ffffffff8117e269>] sys_umount+0x79/0x3a0
 [<ffffffff81b8cf7b>] system_call_fastpath+0x16/0x1b

I also kept the result of SysRq T (stack traces for all tasks) in case that's not enough.

@martinhelmer

I have most likely run into the same issue as this one: unable to export, not even with -f.
My system is currently in that state, and if you are quick to advise, I can provide whatever diagnostics you request.

@behlendorf
Contributor

@martinhelmer Thanks for the comment but I think we've got what we need. It just needs someone with time to work on it.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 16, 2013
The iterate_supers_type() function which was introduced in the
3.0 kernel was supposed to provide a safe way to call an arbitrary
function on all super blocks of a specific type.  Unfortunately,
because a list_head was used a bug was introduced which made it
possible for iterate_supers_type() to get stuck spinning on a
super block which was just deactivated.

The bug was fixed in the 3.3 kernel by converting the list_head
to an hlist_node.  However, to resolve the issue for existing
3.0 - 3.2 kernels we detect when a list_head is used.  Then to
prevent the spinning from occurring the .next pointer is set to
the fs_supers list_head which ensures the iterate_supers_type()
function will always terminate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1045
Issue openzfs#861
Issue openzfs#790
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
The iterate_supers_type() function which was introduced in the
3.0 kernel was supposed to provide a safe way to call an arbitrary
function on all super blocks of a specific type.  Unfortunately,
because a list_head was used a bug was introduced which made it
possible for iterate_supers_type() to get stuck spinning on a
super block which was just deactivated.

This can occur because when the list head is removed from the
fs_supers list it is reinitialized to point to itself.  If the
iterate_supers_type() function happened to be processing the
removed list_head it will get stuck spinning on that list_head.

The bug was fixed in the 3.3 kernel by converting the list_head
to an hlist_node.  However, to resolve the issue for existing
3.0 - 3.2 kernels we detect when a list_head is used.  Then to
prevent the spinning from occurring the .next pointer is set to
the fs_supers list_head which ensures the iterate_supers_type()
function will always terminate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1045
Closes openzfs#861
Closes openzfs#790
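To make the failure mode the commit describes concrete, here is a small userspace sketch (a local re-implementation of the list primitives for illustration, not the kernel's list.h or the actual ZFS patch): an entry removed from fs_supers is reinitialized to point to itself, so an iterator sitting on it never reaches the list head again, and pointing its .next back at the list head lets the iteration terminate.

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    list_init(entry);               /* entry->next now points to itself */
}

int main(void)
{
    struct list_head fs_supers, sb;
    list_init(&fs_supers);
    list_add(&sb, &fs_supers);

    struct list_head *pos = fs_supers.next;   /* iterator is sitting on 'sb' */
    list_del_init(&sb);                       /* concurrent super block teardown */

    /* The iterator can never get back to &fs_supers; bounded here only
     * so the demo terminates instead of spinning forever. */
    int steps = 0;
    while (pos != &fs_supers && steps < 5) {
        pos = pos->next;
        steps++;
    }
    printf("after %d steps, iterator %s the list head\n",
           steps, pos == &fs_supers ? "reached" : "never reached");

    /* The workaround described above: point ->next back at fs_supers so
     * any stuck iterator terminates on its next step. */
    sb.next = &fs_supers;
    pos = pos->next;
    printf("after workaround, iterator %s the list head\n",
           pos == &fs_supers ? "reached" : "still misses");
    return 0;
}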