RCU stall in arc_adapt (zpl_prune_sbs) during umount #861

Closed
dechamps opened this issue Aug 1, 2012 · 5 comments

@dechamps
Contributor

dechamps commented Aug 1, 2012

Using the latest zfs master on a dual-core machine with 6 GB of memory and vm.min_free_kbytes = 32768: while exporting a pool with some ZVOLs in it, zfs umount froze and I got this:

INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
Pid: 6322, comm: arc_adapt Tainted: P           O 3.2.23-zfsiscsi-std-ipv6-64 #3
Call Trace:
 <IRQ>  [<ffffffff810ebd13>] __rcu_pending+0x163/0x3e0
 [<ffffffff810ec005>] rcu_check_callbacks+0x75/0x1b0
 [<ffffffff810a35a3>] update_process_times+0x43/0x80
 [<ffffffff810c4191>] tick_sched_timer+0x61/0xb0
 [<ffffffff810b80fa>] __run_hrtimer+0x6a/0x190
 [<ffffffff810c4130>] ? tick_nohz_handler+0xe0/0xe0
 [<ffffffff810b8476>] hrtimer_interrupt+0xd6/0x220
 [<ffffffff810670f4>] smp_apic_timer_interrupt+0x64/0xa0
 [<ffffffffa0367c90>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8d98b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffffa0367c90>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8c510>] ? _raw_spin_lock+0x10/0x30
 [<ffffffff81162ac9>] iterate_supers_type+0x89/0xd0
 [<ffffffffa0367c40>] ? zpl_inode_alloc+0x80/0x80 [zfs]
 [<ffffffffa0367c7b>] zpl_prune_sbs+0x3b/0x50 [zfs]
 [<ffffffffa02cd468>] arc_adjust_meta+0x118/0x1e0 [zfs]
 [<ffffffffa02cd530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa02cd530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa0025130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa02cd592>] arc_adapt_thread+0x62/0xd0 [zfs]
 [<ffffffffa00251a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0025130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb

The rest of the system still works fine, but the message repeats continuously and ZFS stays blocked. I had to do a hard reset. What's surprising is that I was stress-testing the pool with 48 benchmarks running on top of 6 zvols continuously for 20 hours and it handled that just fine; it just wouldn't umount after that. I find this kinda ironic.

I only got this once, so I have no idea how to reproduce it. I hope the stack trace will be enough to pinpoint the cause.

@dechamps
Contributor Author

dechamps commented Aug 1, 2012

From the stack trace, it seems to be a deadlock on sb_lock (Linux super.c). Presumably, another thread is holding the lock (probably the umount procedure doing its stuff) and is not releasing it.

Maybe something within the code path of umount is trying to allocate memory while sb_lock is held, which triggers reclaim, which wakes up arc_adapt, which calls zpl_prune_sbs, which calls iterate_supers_type, which tries to lock sb_lock, boom. I'm just guessing here.
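For illustration only, here is a minimal userspace sketch of the circular wait hypothesized above, with pthreads standing in for the kernel primitives (none of these names are the real kernel or ZFS symbols): the "umount" side takes sb_lock and then waits synchronously on reclaim, while the reclaim side ("arc_adapt") needs that same lock.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t sb_lock = PTHREAD_MUTEX_INITIALIZER;

static void *arc_adapt_thread(void *arg)
{
    (void)arg;
    sleep(1);                       /* let the "umount" side take sb_lock first */

    /* iterate_supers_type() would block here unconditionally; trylock is
     * used so this demo can report the cycle instead of hanging. */
    if (pthread_mutex_trylock(&sb_lock) != 0)
        printf("arc_adapt: needs sb_lock, but its holder is waiting on reclaim -> deadlock\n");
    else
        pthread_mutex_unlock(&sb_lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, arc_adapt_thread, NULL);

    /* "umount" side: hold sb_lock, then allocate memory, which
     * (hypothetically) enters direct reclaim and waits for arc_adapt
     * to free metadata; pthread_join() stands in for that wait. */
    pthread_mutex_lock(&sb_lock);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&sb_lock);
    return 0;
}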

If I come across the issue again I'll try to get the stack traces of umount-related tasks to see if there's something interesting.

@behlendorf
Contributor

You're probably right about it racing with the umount in some unfortunate way. That's one of the few operations which is going to touch sb_lock. If you see it again, grab some more debugging info.

@dechamps
Contributor Author

dechamps commented Aug 2, 2012

I got it again. Same situation as in the issue description. Here's the detailed information:

INFO: rcu_sched detected stall on CPU 0 (t=15000 jiffies)
Pid: 3067, comm: arc_adapt Tainted: P           O 3.2.23-zfsiscsi-std-ipv6-64 #3
Call Trace:
 <IRQ>  [<ffffffff810ebd13>] __rcu_pending+0x163/0x3e0
 [<ffffffff810ec005>] rcu_check_callbacks+0x75/0x1b0
 [<ffffffff810a35a3>] update_process_times+0x43/0x80
 [<ffffffff810c4191>] tick_sched_timer+0x61/0xb0
 [<ffffffff810b80fa>] __run_hrtimer+0x6a/0x190
 [<ffffffff810c4130>] ? tick_nohz_handler+0xe0/0xe0
 [<ffffffff810b8476>] hrtimer_interrupt+0xd6/0x220
 [<ffffffff810670f4>] smp_apic_timer_interrupt+0x64/0xa0
 [<ffffffffa0387c10>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8d98b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffffa0387c10>] ? zpl_prune_sbs+0x50/0x50 [zfs]
 [<ffffffff81b8ba9b>] ? down_read+0xb/0x20
 [<ffffffff81162aa5>] iterate_supers_type+0x65/0xd0
 [<ffffffffa0387bc0>] ? zpl_inode_alloc+0x80/0x80 [zfs]
 [<ffffffffa0387bfb>] zpl_prune_sbs+0x3b/0x50 [zfs]
 [<ffffffffa02ec468>] arc_adjust_meta+0x118/0x1e0 [zfs]
 [<ffffffffa02ec530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa02ec530>] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa02ec592>] arc_adapt_thread+0x62/0xd0 [zfs]
 [<ffffffffa00081a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb
SysRq : Show Blocked State
  task                        PC stack   pid father
l2arc_feed      D 0000000000000000     0  3070      2 0x00000000
 ffff88018b9c1ce0 0000000000000046 ffff880100000000 ffff880194c02700
 ffff88018b9c1fd8 ffff88018b9c0010 0000000000004000 ffff88018b9c1fd8
 ffff88019506c100 ffff880194c02700 ffff88019530c5b0 0000000000000086
Call Trace:
 [<ffffffff8108f9ec>] ? try_to_wake_up+0x1cc/0x250
 [<ffffffffa0006dcb>] ? kmem_alloc_debug+0x1eb/0x340 [spl]
 [<ffffffff8108fa7d>] ? default_wake_function+0xd/0x10
 [<ffffffff81083e39>] ? __wake_up_common+0x59/0x90
 [<ffffffff81b8aa9a>] schedule+0x3a/0x60
 [<ffffffffa000ed87>] cv_wait_common+0x97/0x190 [spl]
 [<ffffffff810b4a10>] ? wake_up_bit+0x40/0x40
 [<ffffffffa000ee9e>] __cv_wait+0xe/0x10 [spl]
 [<ffffffffa0382c6b>] zio_wait+0xdb/0x150 [zfs]
 [<ffffffffa02e9eed>] l2arc_feed_thread+0x60d/0x860 [zfs]
 [<ffffffffa02e98e0>] ? arc_release_bp+0x10/0x10 [zfs]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa00081a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb
txg_sync        D 0000000000000000     0  3722      2 0x00000000
 ffff880166c8bba0 0000000000000046 ffff880166c8bb50 ffff8801862c6180
 ffff880166c8bfd8 ffff880166c8a010 0000000000004000 ffff880166c8bfd8
 ffff8801862c2d80 ffff8801862c6180 ffff880166c8bb10 ffff880166c8bc4c
Call Trace:
 [<ffffffff8108f9ec>] ? try_to_wake_up+0x1cc/0x250
 [<ffffffff8108fa7d>] ? default_wake_function+0xd/0x10
 [<ffffffff81083e39>] ? __wake_up_common+0x59/0x90
 [<ffffffff81b8aa9a>] schedule+0x3a/0x60
 [<ffffffffa000ed87>] cv_wait_common+0x97/0x190 [spl]
 [<ffffffffa0382aa0>] ? zio_flush+0x40/0x40 [zfs]
 [<ffffffff810b4a10>] ? wake_up_bit+0x40/0x40
 [<ffffffffa000ee9e>] __cv_wait+0xe/0x10 [spl]
 [<ffffffffa0382c6b>] zio_wait+0xdb/0x150 [zfs]
 [<ffffffffa031ba3a>] dsl_pool_sync+0xca/0x480 [zfs]
 [<ffffffffa02ef637>] ? bpobj_space+0x97/0xa0 [zfs]
 [<ffffffffa032de07>] spa_sync+0x397/0x9a0 [zfs]
 [<ffffffffa033f081>] txg_sync_thread+0x2c1/0x490 [zfs]
 [<ffffffffa033edc0>] ? txg_thread_exit+0x40/0x40 [zfs]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffffa00081a3>] thread_generic_wrapper+0x73/0x90 [spl]
 [<ffffffffa0008130>] ? __thread_create+0x350/0x350 [spl]
 [<ffffffff810b44f6>] kthread+0x96/0xa0
 [<ffffffff81b8ed74>] kernel_thread_helper+0x4/0x10
 [<ffffffff810b4460>] ? kthread_worker_fn+0x180/0x180
 [<ffffffff81b8ed70>] ? gs_change+0xb/0xb
umount          D 0000000000000000     0 15994  15988 0x00000000
 ffff88017fe75cf8 0000000000000082 ffff88019fc12060 ffff8801950a1a00
 ffff88017fe75fd8 ffff88017fe74010 0000000000004000 ffff88017fe75fd8
 ffff88018cddce00 ffff8801950a1a00 ffff88017fe75c68 ffffffff810881c9
Call Trace:
 [<ffffffff810881c9>] ? check_preempt_wakeup+0xf9/0x160
 [<ffffffff81084944>] ? check_preempt_curr+0x84/0xa0
 [<ffffffff81b8aa9a>] schedule+0x3a/0x60
 [<ffffffff81b8ae9d>] schedule_timeout+0x1ad/0x280
 [<ffffffff810c93fa>] ? smp_call_function_single+0x12a/0x140
 [<ffffffff815eb954>] ? __rwsem_do_wake+0xe4/0x1c0
 [<ffffffff810ea370>] ? rcu_cpu_stall_reset+0x40/0x40
 [<ffffffff81b89df2>] wait_for_common+0xc2/0x160
 [<ffffffff8108fa70>] ? try_to_wake_up+0x250/0x250
 [<ffffffff810eb450>] ? call_rcu_sched+0x10/0x20
 [<ffffffff810eb440>] ? call_rcu_bh+0x20/0x20
 [<ffffffff81b89f38>] wait_for_completion+0x18/0x20
 [<ffffffff810eb9e4>] _rcu_barrier+0x94/0xb0
 [<ffffffff810eba17>] rcu_barrier_sched+0x17/0x20
 [<ffffffff810eba29>] rcu_barrier+0x9/0x10
 [<ffffffff81163296>] deactivate_locked_super+0x56/0x70
 [<ffffffff81163d45>] deactivate_super+0x45/0x60
 [<ffffffff8117d51a>] mntput_no_expire+0x12a/0x170
 [<ffffffff8117e269>] sys_umount+0x79/0x3a0
 [<ffffffff81b8cf7b>] system_call_fastpath+0x16/0x1b

I also kept the result of SysRq T (stack traces for all tasks) in case that's not enough.

@martinhelmer

I have most likely run into the same issue as this one: unable to export, not even with -f.
My system is currently in that state, and if you are quick to advise, I can provide whatever diagnostics you request.

@behlendorf
Contributor

@martinhelmer Thanks for the comment but I think we've got what we need. It just needs someone with time to work on it.

behlendorf added a commit to behlendorf/zfs that referenced this issue Jul 16, 2013
The iterate_supers_type() function which was introduced in the
3.0 kernel was supposed to provide a safe way to call an arbitrary
function on all super blocks of a specific type.  Unfortunately,
because a list_head was used a bug was introduced which made it
possible for iterate_supers_type() to get stuck spinning on a
super block which was just deactivated.

The bug was fixed in the 3.3 kernel by converting the list_head
to an hlist_node.  However, to resolve the issue for existing
3.0 - 3.2 kernels we detect when a list_head is used.  Then to
prevent the spinning from occurring the .next pointer is set to
the fs_supers list_head which ensures the iterate_supers_type()
function will always terminate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#1045
Issue openzfs#861
Issue openzfs#790
unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
The iterate_supers_type() function which was introduced in the
3.0 kernel was supposed to provide a safe way to call an arbitrary
function on all super blocks of a specific type.  Unfortunately,
because a list_head was used a bug was introduced which made it
possible for iterate_supers_type() to get stuck spinning on a
super block which was just deactivated.

This can occur because when the list head is removed from the
fs_supers list it is reinitialized to point to itself.  If the
iterate_supers_type() function happened to be processing the
removed list_head it will get stuck spinning on that list_head.

The bug was fixed in the 3.3 kernel by converting the list_head
to an hlist_node.  However, to resolve the issue for existing
3.0 - 3.2 kernels we detect when a list_head is used.  Then to
prevent the spinning from occurring the .next pointer is set to
the fs_supers list_head which ensures the iterate_supers_type()
function will always terminate.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1045
Closes openzfs#861
Closes openzfs#790
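To make the failure mode the commit describes concrete, here is a small userspace sketch (a local re-implementation of the list primitives for illustration, not the kernel's list.h or the actual ZFS patch): an entry removed from fs_supers is reinitialized to point to itself, so an iterator sitting on it never reaches the list head again, and pointing its .next back at the list head lets the iteration terminate.

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

static void list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    list_init(entry);               /* entry->next now points to itself */
}

int main(void)
{
    struct list_head fs_supers, sb;
    list_init(&fs_supers);
    list_add(&sb, &fs_supers);

    struct list_head *pos = fs_supers.next;   /* iterator is sitting on 'sb' */
    list_del_init(&sb);                       /* concurrent super block teardown */

    /* The iterator can never get back to &fs_supers; bounded here only
     * so the demo terminates instead of spinning forever. */
    int steps = 0;
    while (pos != &fs_supers && steps < 5) {
        pos = pos->next;
        steps++;
    }
    printf("after %d steps, iterator %s the list head\n",
           steps, pos == &fs_supers ? "reached" : "never reached");

    /* The workaround described above: point ->next back at fs_supers so
     * any stuck iterator terminates on its next step. */
    sb.next = &fs_supers;
    pos = pos->next;
    printf("after workaround, iterator %s the list head\n",
           pos == &fs_supers ? "reached" : "still misses");
    return 0;
}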