BUG: soft lockup - CPU#5 stuck for 67s! [kswapd0:260] #1482
I experienced something similar.
Thanks for the information - I have added a link to your bug from my bug report, #1515. Good that you have a stack trace - I wasn't able to get one, as my system was headless and completely locked. Perhaps I need to hook up a screen and keyboard, but the lockup isn't even making it to the logs on my system. Matthew
I started experiencing these hangs after I updated the Ubuntu kernel to 3.2.0-45. I have now updated to 3.2.0-48 and so far the issues have not reappeared. It might be unrelated, just a possible hint.
@lukasz99 Your issue is just contention on the virtual address space lock, which is a known issue. The stacks dumped are just informational and, as long as they only occur once in a while, not harmful. We're looking into a proper long-term fix to address this.
@zrav Your stack suggests a possible deadlock in zfs_zinactive() on the zp->z_lock during memory reclaim. Unfortunately, from the stacks you posted it's not clear what task is holding that lock. It's possible there was a kernel bug here, but I can't say for sure. If it occurs again, please post the stacks.
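For anyone wanting to check whether the reports really do occur only once in a while, the soft lockup headers in a saved dmesg dump can be tallied. The following is a minimal sketch, assuming header lines in the format shown in this issue; the function name `count_soft_lockups` is hypothetical and not part of ZFS, SPL, or any kernel tooling.

```python
import re
from collections import Counter

# Matches headers such as:
# [181067.234355] BUG: soft lockup - CPU#5 stuck for 67s! [kswapd0:260]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def count_soft_lockups(log_text):
    """Return (reports per task name, list of report timestamps in seconds)."""
    per_task = Counter()
    stamps = []
    for line in log_text.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            per_task[m.group("comm")] += 1
            stamps.append(float(m.group("ts")))
    return per_task, stamps


if __name__ == "__main__":
    # Two header lines copied from the report in this issue.
    sample = (
        "[181067.234355] BUG: soft lockup - CPU#5 stuck for 67s! [kswapd0:260]\n"
        "[181151.111272] BUG: soft lockup - CPU#5 stuck for 67s! [kswapd0:260]\n"
    )
    per_task, stamps = count_soft_lockups(sample)
    print(per_task)
    # Seconds between consecutive reports.
    print([round(b - a, 1) for a, b in zip(stamps, stamps[1:])])
```

Feeding it the output of `dmesg` saved to a file gives a rough picture of how often each task is reported and how closely the reports are spaced.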
After upgrading to kernel 3.8.0-30, the hangs returned (now on ZoL 0.6.2):
I think I have the same issue (first post):
And a dead kswapd process using 100% CPU (one core). Here is an earlier one:
and again a dead kswapd. These issues came up after we added SSDs for the L2ARC cache. System information:
CentOS release 6.4 (Final), with ZFS 0.6.3. We do lots of zfs sync to our mirror. We did not have these issues without the L2ARC cache SSDs. If you need more information, I will do my very best.
Running Debian Wheezy/sid hybrid here with zfsonlinux 2
I have had 4 soft lockups over about a month of the server running, all affecting kswapd, e.g.:
The other traces are very similar.
My problem was solved by updating to git-head. There has been a huge amount
Matthew
On 9 March 2014 03:16, zeigerpuppy notifications@github.com wrote:
All things considered, the server is bleeding edge enough!
@lukasz99 @zrav @raketentimo @zeigerpuppy according to @mattaw, this is fixed in latest HEAD. Could you confirm?
This issue is understood and was significantly improved in 0.6.3 and newer. Additional improvements are planned, which should improve things further.
Hello,
I've noticed just two of these after running the system for a couple of days. The
box seems to be relatively happy - no other strange symptoms so far other than
system load on what otherwise should be an idle system staying at 3.0 since the
time the bugs got reported.
Judging by the contents of the stack traces, it looks like some sort of problem
with SPL/ARC...
Below is what I've recovered from dmesg - it looks like the second kswapd0 lockup
got followed by:
INFO: task arc_adapt:2143 blocked for more than 120 seconds.
In case it matters, I'm running the most recent version of CentOS 6.4 modified to
run OpenVZ (2.6.32-042stab076.8 kernel).
lukasz
[181067.234355] BUG: soft lockup - CPU#5 stuck for 67s! [kswapd0:260]
[181067.234428] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vzevent autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf vznetdev vzmon vzdev ipv6 ext3 jbd zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode sb_edac edac_core i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma igb dca shpchp ext4 mbcache jbd2 raid1 isci libsas scsi_transport_sas sr_mod cdrom sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[181067.234481] CPU 5
[181067.234482] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vzevent autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf vznetdev vzmon vzdev ipv6 ext3 jbd zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode sb_edac edac_core i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma igb dca shpchp ext4 mbcache jbd2 raid1 isci libsas scsi_transport_sas sr_mod cdrom sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[181067.234516]
[181067.234519] Pid: 260, comm: kswapd0 veid: 0 Tainted: P C --------------- 2.6.32-042stab076.8 #1 042stab076_8 Supermicro X9DR3-F/X9DR3-F
[181067.234523] RIP: 0010:[] [] remove_vm_area+0x86/0xa0
[181067.234529] RSP: 0018:ffff881076e61ac0 EFLAGS: 00000287
[181067.234531] RAX: ffff88007b8be3c0 RBX: ffff881076e61ae0 RCX: ffff8810393d2e00
[181067.234533] RDX: ffff8808d9249700 RSI: 0000000000000001 RDI: ffffffff81ac37d0
[181067.234535] RBP: ffffffff8100bc4e R08: 0000000000000000 R09: ffff88007baff800
[181067.234536] R10: ffff88007baff800 R11: ffff88007baff600 R12: ffff88107fc45000
[181067.234538] R13: ffff88107fc02500 R14: ffff881079f36e18 R15: ffff881079f36e00
[181067.234540] FS: 0000000000000000(0000) GS:ffff880069b40000(0000) knlGS:0000000000000000
[181067.234542] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[181067.234544] CR2: 00007fe41568a000 CR3: 0000000001a85000 CR4: 00000000000406e0
[181067.234546] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[181067.234548] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[181067.234550] Process kswapd0 (pid: 260, veid: 0, threadinfo ffff881076e60000, task ffff881076e3d060)
[181067.234552] Stack:
[181067.234553] ffff88007b9dacc0 ffff88007b8be3c0 ffffc90ec6457000 ffffc913b2cf2000
[181067.234558] ffff881076e61b10 ffffffff8116b5de 0000000000000000 ffff881078070000
[181067.234562] ffffc913b2cb0000 0000000000000000 ffff881076e61b20 ffffffff8116b74f
[181067.234566] Call Trace:
[181067.234569] [] ? __vunmap+0x2e/0x120
[181067.234572] [] ? vfree+0x2f/0x40
[181067.234582] [] ? kv_free+0x65/0x70 [spl]
[181067.234587] [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
[181067.234592] [] ? __switch_to+0xd0/0x320
[181067.234598] [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl]
[181067.234603] [] ? spl_kmem_cache_reap_now+0x157/0x230 [spl]
[181067.234623] [] ? arc_kmem_reap_now+0x67/0xc0 [zfs]
[181067.234637] [] ? arc_shrinker_func+0xdf/0x1e0 [zfs]
[181067.234642] [] ? shrink_slab+0x168/0x1e0
[181067.234645] [] ? balance_pgdat+0x7fd/0xb40
[181067.234648] [] ? kswapd+0x181/0x3f0
[181067.234652] [] ? autoremove_wake_function+0x0/0x40
[181067.234655] [] ? kswapd+0x0/0x3f0
[181067.234658] [] ? kthread+0x96/0xa0
[181067.234660] [] ? child_rip+0xa/0x20
[181067.234663] [] ? kthread+0x0/0xa0
[181067.234665] [] ? child_rip+0x0/0x20
[181067.234667] Code: d0 37 ac 81 48 89 45 e8 e8 48 92 38 00 48 8b 15 a9 f6 25 01 48 c7 c1 98 ab 3c 82 48 8b 45 e8 48 39 d3 74 0c 90 48 89 d1 48 8b 12 <48> 39 d3 75 f5 48 8b 13 48 89 11 f0 81 05 b4 82 95 00 00 00 00
[181067.234686] Call Trace:
[181067.234688] [] ? __vunmap+0x2e/0x120
[181067.234691] [] ? vfree+0x2f/0x40
[181067.234696] [] ? kv_free+0x65/0x70 [spl]
[181067.234701] [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
[181067.234705] [] ? __switch_to+0xd0/0x320
[181067.234710] [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl]
[181067.234716] [] ? spl_kmem_cache_reap_now+0x157/0x230 [spl]
[181067.234728] [] ? arc_kmem_reap_now+0x67/0xc0 [zfs]
[181067.234741] [] ? arc_shrinker_func+0xdf/0x1e0 [zfs]
[181067.234745] [] ? shrink_slab+0x168/0x1e0
[181067.234748] [] ? balance_pgdat+0x7fd/0xb40
[181067.234751] [] ? kswapd+0x181/0x3f0
[181067.234754] [] ? autoremove_wake_function+0x0/0x40
[181067.234757] [] ? kswapd+0x0/0x3f0
[181067.234759] [] ? kthread+0x96/0xa0
[181067.234761] [] ? child_rip+0xa/0x20
[181067.234764] [] ? kthread+0x0/0xa0
[181067.234766] [] ? child_rip+0x0/0x20
[181151.111272] BUG: soft lockup - CPU#5 stuck for 67s! [kswapd0:260]
[181151.111343] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vzevent autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf vznetdev vzmon vzdev ipv6 ext3 jbd zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode sb_edac edac_core i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma igb dca shpchp ext4 mbcache jbd2 raid1 isci libsas scsi_transport_sas sr_mod cdrom sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[181151.111396] CPU 5
[181151.111397] Modules linked in: vzethdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vzdquota ip6t_REJECT ip6table_mangle ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables vzevent autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf vznetdev vzmon vzdev ipv6 ext3 jbd zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate microcode sb_edac edac_core i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma igb dca shpchp ext4 mbcache jbd2 raid1 isci libsas scsi_transport_sas sr_mod cdrom sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[181151.111431]
[181151.111433] Pid: 260, comm: kswapd0 veid: 0 Tainted: P C --------------- 2.6.32-042stab076.8 #1 042stab076_8 Supermicro X9DR3-F/X9DR3-F
[181151.111437] RIP: 0010:[] [] remove_vm_area+0x86/0xa0
[181151.111444] RSP: 0018:ffff881076e61ac0 EFLAGS: 00000283
[181151.111446] RAX: ffff88007da6ce40 RBX: ffff881076e61ae0 RCX: ffff88062ccb5d00
[181151.111448] RDX: ffff8806678a2a80 RSI: 0000000000000001 RDI: ffffffff81ac37d0
[181151.111450] RBP: ffffffff8100bc4e R08: 0000000000000000 R09: 0000000000000000
[181151.111451] R10: 0000000000000000 R11: 0000000000000000 R12: ffff881076e61a90
[181151.111453] R13: ffff881079f0b640 R14: ffff881080011640 R15: ffffea0074dbff40
[181151.111455] FS: 0000000000000000(0000) GS:ffff880069b40000(0000) knlGS:0000000000000000
[181151.111458] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[181151.111459] CR2: 00007fe41568a000 CR3: 0000000001a85000 CR4: 00000000000406e0
[181151.111461] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[181151.111463] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[181151.111465] Process kswapd0 (pid: 260, veid: 0, threadinfo ffff881076e60000, task ffff881076e3d060)
[181151.111467] Stack:
[181151.111468] ffff88007df6a8c0 ffff88007da6ce40 ffffc90ec6457000 ffffc911b2542000
[181151.111472] ffff881076e61b10 ffffffff8116b5de 0000000000000000 ffff881078070000
[181151.111476] ffffc911b2500000 0000000000000000 ffff881076e61b20 ffffffff8116b74f
[181151.111480] Call Trace:
[181151.111484] [] ? __vunmap+0x2e/0x120
[181151.111486] [] ? vfree+0x2f/0x40
[181151.111495] [] ? kv_free+0x65/0x70 [spl]
[181151.111501] [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
[181151.111505] [] ? __switch_to+0xd0/0x320
[181151.111511] [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl]
[181151.111516] [] ? spl_kmem_cache_reap_now+0x157/0x230 [spl]
[181151.111536] [] ? arc_kmem_reap_now+0x67/0xc0 [zfs]
[181151.111550] [] ? arc_shrinker_func+0xdf/0x1e0 [zfs]
[181151.111555] [] ? shrink_slab+0x168/0x1e0
[181151.111558] [] ? balance_pgdat+0x7fd/0xb40
[181151.111561] [] ? kswapd+0x181/0x3f0
[181151.111565] [] ? autoremove_wake_function+0x0/0x40
[181151.111567] [] ? kswapd+0x0/0x3f0
[181151.111570] [] ? kthread+0x96/0xa0
[181151.111573] [] ? child_rip+0xa/0x20
[181151.111575] [] ? kthread+0x0/0xa0
[181151.111577] [] ? child_rip+0x0/0x20
[181151.111579] Code: d0 37 ac 81 48 89 45 e8 e8 48 92 38 00 48 8b 15 a9 f6 25 01 48 c7 c1 98 ab 3c 82 48 8b 45 e8 48 39 d3 74 0c 90 48 89 d1 48 8b 12 <48> 39 d3 75 f5 48 8b 13 48 89 11 f0 81 05 b4 82 95 00 00 00 00
[181151.111598] Call Trace:
[181151.111600] [] ? __vunmap+0x2e/0x120
[181151.111603] [] ? vfree+0x2f/0x40
[181151.111608] [] ? kv_free+0x65/0x70 [spl]
[181151.111614] [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]
[181151.111617] [] ? __switch_to+0xd0/0x320
[181151.111622] [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl]
[181151.111628] [] ? spl_kmem_cache_reap_now+0x157/0x230 [spl]
[181151.111641] [] ? arc_kmem_reap_now+0x67/0xc0 [zfs]
[181151.111653] [] ? arc_shrinker_func+0xdf/0x1e0 [zfs]
[181151.111657] [] ? shrink_slab+0x168/0x1e0
[181151.111660] [] ? balance_pgdat+0x7fd/0xb40
[181151.111663] [] ? kswapd+0x181/0x3f0
[181151.111666] [] ? autoremove_wake_function+0x0/0x40
[181151.111669] [] ? kswapd+0x0/0x3f0
[181151.111671] [] ? kthread+0x96/0xa0
[181151.111674] [] ? child_rip+0xa/0x20
[181151.111676] [] ? kthread+0x0/0xa0
[181151.111679] [] ? child_rip+0x0/0x20
[181175.794555] INFO: task arc_adapt:2143 blocked for more than 120 seconds.
[181175.794644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[181175.794792] arc_adapt D ffff882075fbc6c0 0 2143 2 0 0x00000000
[181175.794801] ffff8820780a3d60 0000000000000046 ffff8820780a3ce0 ffffffff8108033c
[181175.794811] ffff8820780a3d60 00000000ffffffff ffff88207946c000 ffffffff8100bc4e
[181175.794819] ffff8820780a3d60 0000000000000000 ffff882075fbcc88 000000000001e9c0
[181175.794828] Call Trace:
[181175.794842] [] ? lock_timer_base+0x3c/0x70
[181175.794851] [] ? apic_timer_interrupt+0xe/0x20
[181175.794877] [] __mutex_lock_slowpath+0x13e/0x180
[181175.794887] [] mutex_lock+0x2b/0x50
[181175.794910] [] __cv_timedwait_common+0xc7/0x250 [spl]
[181175.794917] [] ? apic_timer_interrupt+0xe/0x20
[181175.794925] [] ? autoremove_wake_function+0x0/0x40
[181175.794968] [] ? arc_adapt_thread+0x0/0xd0 [zfs]
[181175.794984] [] __cv_timedwait_interruptible+0x13/0x20 [spl]
[181175.795014] [] arc_adapt_thread+0x9f/0xd0 [zfs]
[181175.795028] [] thread_generic_wrapper+0x68/0x80 [spl]
[181175.795042] [] ? thread_generic_wrapper+0x0/0x80 [spl]
[181175.795048] [] kthread+0x96/0xa0
[181175.795056] [] child_rip+0xa/0x20
[181175.795062] [] ? kthread+0x0/0xa0
[181175.795067] [] ? child_rip+0x0/0x20
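The traces above are easier to compare once reduced to their function names. Below is a minimal sketch that extracts the frames from trace lines in the format shown in this report; the helper name `parse_frames` and the `"kernel"` label for frames without a module tag are assumptions made for illustration. Frames prefixed with `?` are ones the kernel found on the stack without being able to verify them, so they are flagged as not reliable.

```python
import re

# Matches trace entries such as:
# [181067.234582] [] ? kv_free+0x65/0x70 [spl]
# [181175.794887] [] mutex_lock+0x2b/0x50
FRAME = re.compile(
    r"\]\s+(?P<unreliable>\?\s+)?(?P<func>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_frames(trace_text):
    """Return a list of (function, module, reliable) tuples, innermost first."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME.search(line)
        if m:
            frames.append((
                m.group("func"),
                m.group("module") or "kernel",  # label is an assumption
                m.group("unreliable") is None,
            ))
    return frames


if __name__ == "__main__":
    # A few lines copied from the first kswapd0 trace in this report.
    sample = "\n".join([
        "[181067.234566] Call Trace:",
        "[181067.234569] [] ? __vunmap+0x2e/0x120",
        "[181067.234572] [] ? vfree+0x2f/0x40",
        "[181067.234582] [] ? kv_free+0x65/0x70 [spl]",
        "[181067.234587] [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl]",
        "[181067.234623] [] ? arc_kmem_reap_now+0x67/0xc0 [zfs]",
        "[181067.234642] [] ? shrink_slab+0x168/0x1e0",
    ])
    # Print outermost caller first, so the reclaim path reads top to bottom.
    for func, module, reliable in reversed(parse_frames(sample)):
        print(f"{func} [{module}]" + ("" if reliable else " (unverified)"))
```

Run over both kswapd0 reports, this makes it quick to confirm that they share the same path from `shrink_slab` through `arc_shrinker_func` and `spl_slab_reclaim` down to `vfree`.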