CPU soft lockup on zfs destroy, umount #3743

Closed
aither64 opened this issue Sep 6, 2015 · 1 comment

aither64 commented Sep 6, 2015

Hi,

Running the command zfs destroy vz/private/4511 resulted in a CPU soft lockup:

[702562.514300] BUG: soft lockup - CPU#3 stuck for 67s! [umount:158045]
[702562.514380] Modules linked in: ts_bm xt_string xt_time xt_connlimit xt_realm xt_NFQUEUE xt_pkttype xt_TPROXY nf_tproxy_core xt_CLASSIFY xt_CONNMARK xt_MARK xt_hashlimit xt_comment xt_connmark xt_owner xt_recent xt_iprange xt_physdev xt_policy ipt_MASQUERADE iptable_raw ipt_addrtype sch_sfq cls_u32 sch_htb vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota veth bridge xfrm6_mode_tunnel xfrm4_mode_tunnel esp6 esp4 af_key ip6table_mangle ip6t_REJECT vzrst vzcpt nfs fscache ip6_queue nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6t_ah ip6t_frag ip6t_hbh ip6t_ipv6header ip6t_LOG ip6t_rt ip6_tables ipt_REDIRECT nf_nat_irc nf_nat_ftp iptable_nat nf_nat xt_helper xt_state xt_conntrack nf_conntrack_irc nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xt_length ipt_LOG xt_hl xt_tcpmss xt_TCPMSS ipt_REJECT xt_DSCP xt_dscp xt_multiport xt_limit iptable_mangle iptable_filter ip_tables fuse tun arc4 ecb ppp_mppe ppp_deflate ppp_async ppp_generic slhc crc_ccitt vzevent nfsd lockd nfs_acl auth_rpcgss sunrpc coretemp vznetdev vzmon vzdev bonding 8021q garp stp llc ipv6 microcode iTCO_wdt iTCO_vendor_support ipmi_si ipmi_msghandler acpi_pad zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core ptp pps_core sg ioatdma dca ext4 jbd2 mbcache raid1 sd_mod crc_t10dif ahci isci libsas scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[702562.514511] CPU 3 
[702562.514512] Modules linked in: ts_bm xt_string xt_time xt_connlimit xt_realm xt_NFQUEUE xt_pkttype xt_TPROXY nf_tproxy_core xt_CLASSIFY xt_CONNMARK xt_MARK xt_hashlimit xt_comment xt_connmark xt_owner xt_recent xt_iprange xt_physdev xt_policy ipt_MASQUERADE iptable_raw ipt_addrtype sch_sfq cls_u32 sch_htb vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota veth bridge xfrm6_mode_tunnel xfrm4_mode_tunnel esp6 esp4 af_key ip6table_mangle ip6t_REJECT vzrst vzcpt nfs fscache ip6_queue nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6t_ah ip6t_frag ip6t_hbh ip6t_ipv6header ip6t_LOG ip6t_rt ip6_tables ipt_REDIRECT nf_nat_irc nf_nat_ftp iptable_nat nf_nat xt_helper xt_state xt_conntrack nf_conntrack_irc nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xt_length ipt_LOG xt_hl xt_tcpmss xt_TCPMSS ipt_REJECT xt_DSCP xt_dscp xt_multiport xt_limit iptable_mangle iptable_filter ip_tables fuse tun arc4 ecb ppp_mppe ppp_deflate ppp_async ppp_generic slhc crc_ccitt vzevent nfsd lockd nfs_acl auth_rpcgss sunrpc coretemp vznetdev vzmon vzdev bonding 8021q garp stp llc ipv6 microcode iTCO_wdt iTCO_vendor_support ipmi_si ipmi_msghandler acpi_pad zfs(P)(U) zcommon(P)(U) znvpair(P)(U) spl(U) zlib_deflate zavl(P)(U) zunicode(P)(U) sb_edac edac_core i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core ptp pps_core sg ioatdma dca ext4 jbd2 mbcache raid1 sd_mod crc_t10dif ahci isci libsas scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[702562.514586] 
[702562.514589] Pid: 158045, comm: umount veid: 0 Tainted: P           ---------------    2.6.32-042stab108.8 #1 042stab108_8 Supermicro X9DRFR/X9DRFR
[702562.514594] RIP: 0010:[<ffffffff81539857>]  [<ffffffff81539857>] _spin_unlock_irqrestore+0x17/0x20
[702562.514603] RSP: 0018:ffff881be3417d08  EFLAGS: 00000286
[702562.514605] RAX: 0000000000000286 RBX: ffff881be3417d08 RCX: 0000000000003459
[702562.514607] RDX: ffff88207873f180 RSI: 0000000000000286 RDI: 0000000000000286
[702562.514610] RBP: ffffffff8100bc4e R08: ffff8820388f56d8 R09: 0000000000000000
[702562.514612] R10: 0000000000000000 R11: 00000000ffffffff R12: ffff88105ad1e298
[702562.514614] R13: 0000000001b655b7 R14: ffff8803cfb505c0 R15: ffff8803ba5fe300
[702562.514617] FS:  00007f745f26c740(0000) GS:ffff880069cc0000(0000) knlGS:0000000000000000
[702562.514620] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[702562.514623] CR2: 00000000011ea000 CR3: 000000045ddf5000 CR4: 00000000000407e0
[702562.514627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[702562.514630] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[702562.514634] Process umount (pid: 158045, veid: 0, threadinfo ffff881be3416000, task ffff8803cfb505c0)
[702562.514637] Stack:
[702562.514638]  ffff881be3417d28 ffffffffa02255bc 0000000000000000 ffff88207873f180
[702562.514641] <d> ffff881be3417d88 ffffffffa02256ef ffff88207873f180 0000000000000286
[702562.514644] <d> ffff88103a298800 ffffffffffffff10 ffffffffa0296afd 0000000000000010
[702562.514648] Call Trace:
[702562.514660]  [<ffffffffa02255bc>] ? taskq_wait_outstanding_check+0x3c/0x50 [spl]
[702562.514669]  [<ffffffffa02256ef>] ? taskq_wait_outstanding+0x2f/0xc0 [spl]
[702562.514706]  [<ffffffffa0296afd>] ? dmu_objset_pool+0x1d/0x40 [zfs]
[702562.514743]  [<ffffffffa031b81c>] ? zfs_sb_teardown+0x5c/0x350 [zfs]
[702562.514775]  [<ffffffffa031bb73>] ? zfs_umount+0x43/0x110 [zfs]
[702562.514806]  [<ffffffffa033a796>] ? zpl_put_super+0x36/0x50 [zfs]
[702562.514813]  [<ffffffff811b7d5b>] ? generic_shutdown_super+0x7b/0x100
[702562.514817]  [<ffffffff811b7e70>] ? kill_anon_super+0x40/0x80
[702562.514846]  [<ffffffffa033a49e>] ? zpl_kill_sb+0x1e/0x30 [zfs]
[702562.514849]  [<ffffffff811b8339>] ? deactivate_super+0x79/0xa0
[702562.514853]  [<ffffffff811d9d7f>] ? mntput_no_expire+0xbf/0x110
[702562.514856]  [<ffffffff811daa02>] ? sys_umount+0x82/0x3d0
[702562.514860]  [<ffffffff8100b122>] ? system_call_fastpath+0x16/0x1b
[702562.514862] Code: 00 00 00 01 74 05 e8 09 fe d6 ff c9 c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 48 89 fa 66 ff 02 66 66 90 48 89 f7 57 9d <0f> 1f 44 00 00 c9 c3 66 90 55 48 89 e5 0f 1f 44 00 00 f0 ff 07 
[702562.514885] Call Trace:
[702562.514894]  [<ffffffffa02255bc>] ? taskq_wait_outstanding_check+0x3c/0x50 [spl]
[702562.514902]  [<ffffffffa02256ef>] ? taskq_wait_outstanding+0x2f/0xc0 [spl]
[702562.514925]  [<ffffffffa0296afd>] ? dmu_objset_pool+0x1d/0x40 [zfs]
[702562.514954]  [<ffffffffa031b81c>] ? zfs_sb_teardown+0x5c/0x350 [zfs]
[702562.514983]  [<ffffffffa031bb73>] ? zfs_umount+0x43/0x110 [zfs]
[702562.515012]  [<ffffffffa033a796>] ? zpl_put_super+0x36/0x50 [zfs]
[702562.515017]  [<ffffffff811b7d5b>] ? generic_shutdown_super+0x7b/0x100
[702562.515021]  [<ffffffff811b7e70>] ? kill_anon_super+0x40/0x80
[702562.515049]  [<ffffffffa033a49e>] ? zpl_kill_sb+0x1e/0x30 [zfs]
[702562.515052]  [<ffffffff811b8339>] ? deactivate_super+0x79/0xa0
[702562.515055]  [<ffffffff811d9d7f>] ? mntput_no_expire+0xbf/0x110
[702562.515057]  [<ffffffff811daa02>] ? sys_umount+0x82/0x3d0
[702562.515060]  [<ffffffff8100b122>] ? system_call_fastpath+0x16/0x1b

The processes look like this:

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      157994  0.0  0.0 127668  1704 ?        S    10:39   0:00 zfs destroy vz/private/4511
root      158045  0.0  0.0 105184   804 ?        R    10:39   0:00 /bin/umount -t zfs /vz/private/4511
[root@node1.pgnd.vpsfree.cz]
 ~ # cat /proc/157994/stack
[<ffffffff81081b64>] do_wait+0x1e4/0x240
[<ffffffff81081c48>] sys_wait4+0x88/0xd0
[<ffffffff8100b122>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

[root@node1.pgnd.vpsfree.cz]
 ~ # cat /proc/158045/stack 
[<ffffffffffffffff>] 0xffffffffffffffff

We're running either version 0.6.4 or a build from https://github.com/vpsfreecz/zfs; I'm not sure which at the moment and will try to confirm later.

We found no solution other than resetting the machine, because more and more processes become stuck in an uninterruptible state. So far it has happened multiple times over several weeks or months, and only on this one server out of many (though on different datasets).
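
For anyone wanting to capture the same kind of information before a reset, here is a minimal sketch that automates the manual cat /proc/PID/stack steps above. It is an illustration only, not part of ZFS or SPL: the function name stuck_processes and its parameters proc_root and states are made up for this example, and it assumes a Linux /proc layout where /proc/PID/stat holds the state letter and /proc/PID/stack is readable (usually only by root).

```python
import os


def stuck_processes(proc_root="/proc", states=("D",)):
    """Return (pid, comm, state, stack) for every process whose state is in `states`.

    `proc_root` and `states` are hypothetical parameters for this sketch;
    "D" is the uninterruptible-sleep state letter reported by ps.
    """
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name is wrapped in parentheses and may itself contain
        # spaces or ")", so split on the last ")" rather than on whitespace.
        left, _, rest = stat.rpartition(")")
        comm = left.split("(", 1)[1]
        state = rest.split()[0]
        if state not in states:
            continue
        try:
            with open(os.path.join(base, "stack")) as f:
                stack = f.read()
        except OSError:
            stack = "(stack unavailable; reading it usually requires root)"
        found.append((int(entry), comm, state, stack))
    return found


if __name__ == "__main__":
    for pid, comm, state, stack in stuck_processes():
        print(f"{pid} {comm} [{state}]")
        print(stack)
```

Note that the umount process in the listing above is in state R (it is spinning on a CPU), not D, so calling stuck_processes(states=("D", "R")) would be needed to include it, at the cost of also listing every normally running process.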

@behlendorf
Contributor

@aither64 thanks for filing this. I'm marking this as a duplicate of #3508, which has become less frequent with the latest code, although the root cause hasn't been precisely identified.
