
Kernel stall during resilver #3936

Closed
alexanderhaensch opened this issue Oct 19, 2015 · 8 comments

@alexanderhaensch

The system was completely locked up.
CPUs: 2
Memory: 128GB
VM/Hypervisor: no
ECC mem: yes
Distribution: Gentoo GNU/Linux
Kernel version: Linux eos 3.14.51-hardened #1 SMP Wed Sep 16 11:13:14 CEST 2015 x86_64 Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz GenuineIntel GNU/Linux
SPL/ZFS source: Gentoo Packages
SPL/ZFS version: [ 26.396886] SPL: Loaded module v0.6.5.2-r0-gentoo (DEBUG mode)
[ 26.606759] ZFS: Loaded module v0.6.5.2-r0-gentoo (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5

Short description: Removing a large number of files caused the system to hang.

SLAB allocator: SLUB

INFO: rcu_sched self-detected stall on CPU { 22} (t=2100 jiffies g=47679397 c=47679396 q=171016)
sending NMI to all CPUs:
I think there is a maximum post length here, so I am showing only the stalled core.

INFO: rcu_sched self-detected stall on CPU { 22} (t=2100 jiffies g=47679397 c=47679396 q=171016)
sending NMI to all CPUs:
--- snip ---
NMI backtrace for cpu 22
CPU: 22 PID: 1856 Comm: dbu_evict Tainted: P O 3.14.51-hardened #1
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0b 06/30/2014
task: ffff881ffe41e6c0 ti: ffff881ffe41ec20 task.ti: ffff881ffe41ec20
RIP: 0010:[] [] __list_del_entry_debug+0x2b/0x90
RSP: 0018:ffff881fe9e73830 EFLAGS: 00000046
RAX: ffffea0048530e20 RBX: ffffea006180b020 RCX: 00000000ffffff02
RDX: ffffea006180b020 RSI: 0000000000000010 RDI: ffffea006180b020
RBP: ffff881fe9e73830 R08: ffffffff818a7fd0 R09: ffff88207fd2fd40
R10: ffffea0070e2d800 R11: ffff88103f803c00 R12: ffffea006180b000
R13: ffffea0048530e00 R14: ffffea006180b020 R15: ffff88202628a600
FS: 00007f4ff720d700(0000) GS:ffff88207fd20000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4ff65a6310 CR3: 00000010099f4000 CR4: 00000000001607f0
Stack:
ffff881fe9e73848 ffffffff8137723d ffff881c38b60e50 ffff881fe9e738b8
ffffffff8110a958 ffff881c38b60e60 000000018107d20f 0000000000000246
ffff882027a35000 0000000100000002 ffff88202628a610 ffff881c38b60e40
Call Trace:
[] __list_del_entry+0xd/0x30
[] kmem_cache_shrink+0x138/0x250
[] spl_kmem_cache_reap_now+0x13c/0x1d0 [spl]
[] __spl_kmem_cache_generic_shrinker.isra.12+0x9d/0x120 [spl]
[] spl_kmem_cache_generic_shrinker_scan_objects+0xd/0x30 [spl]
[] shrink_slab_node+0x112/0x1b0
[] shrink_slab+0x83/0x150
[] do_try_to_free_pages+0x421/0x550
[] try_to_free_pages+0xb7/0xd0
[] __alloc_pages_nodemask+0x56c/0xa00
[] alloc_pages_current+0xa3/0x170
[] new_slab+0x275/0x300
[] __slab_alloc+0x2bf/0x4ad
[] ? arch_dup_task_struct+0xb9/0x110
[] ? unlock_page+0x1e/0x30
[] ? arch_dup_task_struct+0xb9/0x110
[] kmem_cache_alloc+0x9b/0x130
[] arch_dup_task_struct+0xb9/0x110
[] copy_process.part.44+0x168/0x1880
[] ? __do_page_fault+0x1dc/0x500
[] ? recalc_sigpending+0x16/0x50
[] do_fork+0xcb/0x310
[] ? __set_current_blocked+0x31/0x50
[] ? sigprocmask+0x4f/0x80
[] SyS_clone+0x11/0x20
[] stub_clone+0x65/0x90
[] ? system_call_fastpath+0x16/0x1b
Code: 55 b9 01 ff ff ff 48 8b 07 48 89 e5 48 8b 57 08 48 39 c8 74 22 b9 02 ff ff ff 48 39 ca 74 54 48 8b 12 48 39 d7 75 39 48 8b 50 08 <48> 39 d7 75 1d b8 01 00 00 00 5d c3 48 89 c2 48 89 fe 31 c0 48

---snap ---

behlendorf added this to the 0.7.0 milestone Oct 20, 2015
@tuxoko
Contributor

tuxoko commented Oct 21, 2015

@alexanderhaensch
Do you have a complete log?
The log reports a stall on CPU 22, but the stack trace for CPU 22 is not shown.

@alexanderhaensch
Author

@tuxoko I updated the post. The resilver has now completed, but I can say that memory usage was huge during the resilver.
The full backtrace can be found here: https://gist.github.com/alexanderhaensch/45a2efbd57a832b7fcd0
We use the kernel's default slab allocator, the unqueued slab allocator (SLUB).

@tuxoko
Contributor

tuxoko commented Oct 22, 2015

There's huge contention in SLUB.

@behlendorf I see the SPL kmem shrinker will also shrink the Linux slab caches, but I don't think this is needed. __slab_free already seems to reclaim a slab automatically once it becomes empty and the number of partial slabs exceeds min_partial. I also don't see any kernel slab cache registering a shrinker of its own. If we remove this shrinking, the contention in this issue should not occur.
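
The __slab_free behaviour referred to here is the empty-slab check in the kernel's mm/slub.c. A condensed paraphrase of the 3.14-era logic, not a verbatim excerpt (remove_partial and discard_slab are slub.c internals):

/* Condensed paraphrase of the empty-slab path in mm/slub.c:__slab_free(). */
static void slab_free_reclaim_sketch(struct kmem_cache *s,
    struct kmem_cache_node *n, struct page *page, int inuse_after_free)
{
	unsigned long flags;

	/*
	 * After an object is returned: if the slab page now holds no live
	 * objects and the node already keeps more than min_partial partial
	 * slabs, SLUB unlinks the page (under list_lock) and frees it back
	 * to the page allocator instead of caching it.
	 */
	if (!inuse_after_free && n->nr_partial > s->min_partial) {
		spin_lock_irqsave(&n->list_lock, flags);
		remove_partial(n, page);
		spin_unlock_irqrestore(&n->list_lock, flags);
		discard_slab(s, page);
	}
}

So an explicit kmem_cache_shrink() from a shrinker mostly re-walks partial lists that SLUB is already trimming on the free path, while holding the same list_lock those frees need.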

@alexanderhaensch
Author

@tuxoko Is it better to use SLAB instead of SLUB? I had heard that SLUB performs better than SLAB.

@tuxoko
Contributor

tuxoko commented Oct 22, 2015

@alexanderhaensch
I use slab as a general term for sl[auo]b.

tuxoko pushed a commit to tuxoko/spl that referenced this issue Oct 24, 2015
Linux slab will automatically free empty slab when number of partial slab is
over min_partial, so we don't need to explicitly shrink it. In fact, calling
kmem_cache_shrink from shrinker will cause heavy contention on
kmem_cache_node->list_lock, to the point that it might cause __slab_free to
livelock (see openzfs/zfs#3936)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
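
The change amounts to no longer calling kmem_cache_shrink() on the backing Linux cache from SPL's reap path. A minimal sketch of the idea, not the actual openzfs/spl#487 diff (skc_linux_cache and KMC_SLAB are names from SPL's kmem cache structure; the real function is spl_kmem_cache_reap_now(), visible in the stack trace above):

/* Sketch only, not the exact SPL source. */
static void spl_kmem_cache_reap_sketch(spl_kmem_cache_t *skc)
{
	if (skc->skc_flags & KMC_SLAB) {
		/*
		 * Before: explicitly compact the backing Linux slab cache.
		 * kmem_cache_shrink() walks the per-node partial lists under
		 * kmem_cache_node->list_lock, so calling it from a memory
		 * shrinker contends with every concurrent __slab_free().
		 */
		/* kmem_cache_shrink(skc->skc_linux_cache); */

		/*
		 * After: do nothing here. SLUB already discards an empty slab
		 * from __slab_free() once nr_partial exceeds min_partial.
		 */
		return;
	}

	/* SPL's own (non-KMC_SLAB) caches keep their existing reclaim path. */
}
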
@tuxoko
Contributor

tuxoko commented Oct 24, 2015

@alexanderhaensch
Could you try openzfs/spl#487 and see if it works better?
Thanks.

@alexanderhaensch
Author

I installed the patch with 0.6.5.3 and started a new resilver. Let's wait...

@alexanderhaensch
Author

This issue is solved by openzfs/spl#487.

behlendorf pushed a commit to openzfs/spl that referenced this issue Nov 11, 2015
Linux slab will automatically free empty slab when number of partial slab is
over min_partial, so we don't need to explicitly shrink it. In fact, calling
kmem_cache_shrink from shrinker will cause heavy contention on
kmem_cache_node->list_lock, to the point that it might cause __slab_free to
livelock (see openzfs/zfs#3936)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs/zfs#3936
Closes #487
ryao pushed a commit to ryao/spl that referenced this issue Dec 3, 2015
Linux slab will automatically free empty slab when number of partial slab is
over min_partial, so we don't need to explicitly shrink it. In fact, calling
kmem_cache_shrink from shrinker will cause heavy contention on
kmem_cache_node->list_lock, to the point that it might cause __slab_free to
livelock (see openzfs/zfs#3936)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs/zfs#3936
Closes openzfs#487
tuxoko pushed a commit to tuxoko/spl that referenced this issue Dec 14, 2015
Linux slab will automatically free empty slab when number of partial slab is
over min_partial, so we don't need to explicitly shrink it. In fact, calling
kmem_cache_shrink from shrinker will cause heavy contention on
kmem_cache_node->list_lock, to the point that it might cause __slab_free to
livelock (see openzfs/zfs#3936)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs/zfs#3936
Closes openzfs#487
tuxoko pushed a commit to tuxoko/spl that referenced this issue Jan 12, 2016
Linux slab will automatically free empty slab when number of partial slab is
over min_partial, so we don't need to explicitly shrink it. In fact, calling
kmem_cache_shrink from shrinker will cause heavy contention on
kmem_cache_node->list_lock, to the point that it might cause __slab_free to
livelock (see openzfs/zfs#3936)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs/zfs#3936
Closes openzfs#487
behlendorf pushed a commit to openzfs/spl that referenced this issue Jan 13, 2016
Linux slab will automatically free empty slab when number of partial slab is
over min_partial, so we don't need to explicitly shrink it. In fact, calling
kmem_cache_shrink from shrinker will cause heavy contention on
kmem_cache_node->list_lock, to the point that it might cause __slab_free to
livelock (see openzfs/zfs#3936)

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs/zfs#3936
Closes #487
behlendorf modified the milestones: 0.6.5.5, 0.7.0 Mar 23, 2016