
zfs send hangs (kernel panic BUG) #2339

Closed

SenH opened this issue May 16, 2014 · 2 comments

SenH (Contributor) commented May 16, 2014

I'm copying my degraded raidz1 pool to a mirror via a screen session.

root@kubrick:~$ zfs send -R tanker@copier1 | mbuffer -s 128k -m 2G -o - | zfs recv -u -v -F -d backup
receiving full stream of tanker@copier1 into backup@copier1
in @  139 MB/s, out @  0.0 kB/s,  256 kB total, buffer   7% full
received 225KB stream in 1 seconds (225KB/sec)
receiving full stream of tanker/www@copier1 into backup/www@copier1
in @  123 MB/s, out @  0.0 kB/s,  384 kB total, buffer  16% full
received 168KB stream in 2 seconds (84.2KB/sec)
receiving full stream of tanker/pictures@copier1 into backup/pictures@copier1
in @  0.0 kB/s, out @  0.0 kB/s, 36.2 GB total, buffer 100% full
received 36.2GB stream in 4195 seconds (8.83MB/sec)
receiving full stream of tanker/downloads@copier1 into backup/downloads@copier1
in @  0.0 kB/s, out @  0.0 kB/s,  149 GB total, buffer 100% full

zfs send is now hanging:

sen@kubrick:~$ ps ax | grep -P "\s+D"
  382 ?        D<     2:02 [spl_system_task]
 5178 pts/1    D+     1:11 zfs send -R tanker copier1

dmesg

[22728.153141] ------------[ cut here ]------------
[22728.153188] kernel BUG at /build/buildd/linux-lts-quantal-3.5.0/mm/vmalloc.c:1484!
[22728.153251] invalid opcode: 0000 [#1] SMP 
[22728.153291] CPU 1 
[22728.153299] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler ctr ccm nf_conntrack_netlink nfnetlink sit tunnel4 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter arc4 xt_TCPMSS ip6table_mangle ip6_tables xt_pkttype xt_LOG xt_limit ipt_REJECT xt_state iptable_filter ipt_MASQUERADE xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw ip_tables x_tables ath9k_htc(O) kvm_amd kvm microcode 8021q garp stp mac80211(O) llc ath9k_common(O) ath9k_hw(O) ath(O) amd64_edac_mod cfg80211(O) sp5100_tco edac_core i2c_piix4 edac_mce_amd k10temp joydev compat(O) mac_hid shpchp lp ext2 parport zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) zlib_deflate ses enclosure hid_generic usbhid tg3 hid ahci libahci arcsas(O)
[22728.153932] 
[22728.153961] Pid: 26, comm: kswapd0 Tainted: P           O 3.5.0-46-generic #70~precise1-Ubuntu HP ProLiant MicroServer
[22728.154034] RIP: 0010:[<ffffffff8115f657>]  [<ffffffff8115f657>] __vunmap.part.16+0x97/0xc0
[22728.154106] RSP: 0018:ffff8802100cfb10  EFLAGS: 00010246
[22728.154144] RAX: ffff8801da591800 RBX: ffffc90026765000 RCX: ffff880000000000
[22728.154186] RDX: 0000000000000001 RSI: 0000000000000100 RDI: 0000000000000000
[22728.154227] RBP: ffff8802100cfb30 R08: 0000000000000002 R09: 0000000000000004
[22728.154267] R10: 0000000000000004 R11: 0000000000000002 R12: ffff8801a7cc9200
[22728.154308] R13: 0000000000000100 R14: ffff88020fd1c840 R15: ffffc90026855000
[22728.154351] FS:  00007fedd282eb80(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
[22728.155361] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[22728.155399] CR2: 0000000001913000 CR3: 0000000004123000 CR4: 00000000000007e0
[22728.155441] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[22728.155482] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[22728.155525] Process kswapd0 (pid: 26, threadinfo ffff8802100ce000, task ffff8802100d1700)
[22728.155585] Stack:
[22728.155615]  ffff8802103f7000 ffff8802100cfbb0 ffff88020fd1c840 ffff88020fd1c000
[22728.155685]  ffff8802100cfb40 ffffffff8115f69b ffff8802100cfb50 ffffffff8115f7ca
[22728.155755]  ffff8802100cfb60 ffffffffa00c7c68 ffff8802100cfc10 ffffffffa00c95d5
[22728.155825] Call Trace:
[22728.155862]  [<ffffffff8115f69b>] __vunmap+0x1b/0x40
[22728.155903]  [<ffffffff8115f7ca>] vfree+0x2a/0x30
[22728.155960]  [<ffffffffa00c7c68>] kv_free.isra.5+0x68/0x70 [spl]
[22728.156018]  [<ffffffffa00c95d5>] spl_slab_reclaim+0x335/0x440 [spl]
[22728.156075]  [<ffffffffa00c97ed>] spl_kmem_cache_reap_now+0x10d/0x230 [spl]
[22728.156133]  [<ffffffffa00c997f>] __spl_kmem_cache_generic_shrinker.isra.6+0x6f/0xc0 [spl]
[22728.156213]  [<ffffffffa00c99e2>] spl_kmem_cache_generic_shrinker+0x12/0x20 [spl]
[22728.156279]  [<ffffffff8113a654>] shrink_slab+0x154/0x300
[22728.156321]  [<ffffffff811800f8>] ? mem_cgroup_iter+0xe8/0x200
[22728.156367]  [<ffffffff8113d764>] balance_pgdat+0x5a4/0x720
[22728.156410]  [<ffffffff8113da03>] kswapd+0x123/0x240
[22728.156450]  [<ffffffff8113d8e0>] ? balance_pgdat+0x720/0x720
[22728.156492]  [<ffffffff81077e93>] kthread+0x93/0xa0
[22728.156532]  [<ffffffff816a9724>] kernel_thread_helper+0x4/0x10
[22728.156574]  [<ffffffff81077e00>] ? flush_kthread_worker+0xb0/0xb0
[22728.156616]  [<ffffffff816a9720>] ? gs_change+0x13/0x13
[22728.156652] Code: 44 24 18 10 49 8b 7c 24 20 74 19 e8 64 01 00 00 4c 89 e7 e8 1c 63 01 00 48 83 c4 08 5b 41 5c 41 5d 5d c3 90 e8 0b 63 01 00 eb e5 <0f> 0b 48 89 d9 48 c7 c2 d8 95 a2 81 be bf 05 00 00 48 c7 c7 50 
[22728.156992] RIP  [<ffffffff8115f657>] __vunmap.part.16+0x97/0xc0
[22728.157036]  RSP <ffff8802100cfb10>
[22728.157501] ---[ end trace f781e1bbd345e8c6 ]---
[22963.400568] bad magic number for tty struct (5:2) in tty_poll
[22963.400695] bad magic number for tty struct (5:2) in tty_poll
[22963.615843] bad magic number for tty struct (5:2) in tty_poll
[22963.616323] bad magic number for tty struct (5:2) in tty_poll
[22966.838729] bad magic number for tty struct (5:2) in tty_poll
[22966.838833] bad magic number for tty struct (5:2) in tty_poll
[22966.841988] bad magic number for tty struct (5:2) in tty_poll
[22966.842091] bad magic number for tty struct (5:2) in tty_poll
[22966.842187] bad magic number for tty struct (5:2) in tty_poll
[22966.842282] bad magic number for tty struct (5:2) in tty_poll
[22966.842376] bad magic number for tty struct (5:2) in tty_poll
[22966.842470] bad magic number for tty struct (5:2) in tty_poll
[22966.842565] bad magic number for tty struct (5:2) in tty_poll
[22966.842659] bad magic number for tty struct (5:2) in tty_poll
[22966.842754] bad magic number for tty struct (5:2) in tty_poll
[22966.842848] bad magic number for tty struct (5:2) in tty_poll
[22966.842942] bad magic number for tty struct (5:2) in tty_poll
[22966.843037] bad magic number for tty struct (5:2) in tty_poll
[22966.843158] bad magic number for tty struct (5:2) in tty_poll
[22966.843254] bad magic number for tty struct (5:2) in tty_poll

System specs

HP Microserver N40L
AMD Turion(tm) II Neo N40L Dual-Core Processor, 8GB ECC RAM
Ubuntu 12.04.4 LTS - kernel 3.5.0-46-generic #70~precise1
zfs 0.6.2-1~precise

Stack traces, zdb output, slab info, etc. are available via
http://temp.senhaerens.be/zfs

Please advise on how to proceed.

@SenH SenH changed the title zfs send hangs zfs send hangs (kernel panic BUG) May 16, 2014
@behlendorf behlendorf added this to the 0.7.0 milestone May 17, 2014
@behlendorf behlendorf added the Bug label May 17, 2014
behlendorf (Contributor) commented
@SenH Thanks for including all the debugging. There's not much more to do other than reboot and re-import the pool. If you're able to reliably reproduce this please let us know.

SenH (Contributor, Author) commented May 19, 2014

I tried rebooting and sending the datasets again (sadly there is no resume for zfs send -R), but the transfer was very slow (8 MB/sec) and ZFS was "breathing" a lot; txg_sync showed 100% I/O wait in iotop. I then resilvered the raidz1 pool, which went much faster, and my pool is now healthy again.
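(Editor's note: at the time of this report, ZFS 0.6.2 had no resumable send; resume tokens for `zfs send -t` / `zfs recv -s` only landed in later releases. A common workaround was to replicate one dataset at a time so an interruption loses only that dataset's progress, not the whole recursive stream. Below is a hedged sketch using the pool and snapshot names from this issue; the dataset list, the `replicate` function, and the `DRY_RUN` guard are illustrative, not part of the original report.)

```shell
#!/bin/sh
# Sketch: per-dataset replication instead of one recursive "zfs send -R",
# so an interrupted transfer only has to redo a single dataset.
# DRY_RUN=1 (the default here) prints each pipeline instead of running it,
# so the loop logic can be checked without a real pool. Set DRY_RUN=0 to
# actually execute against live pools.
SNAP=copier1
SRC=tanker
DST=backup
: "${DRY_RUN:=1}"

replicate() {
    for ds in "$SRC" "$SRC/www" "$SRC/pictures" "$SRC/downloads"; do
        # Map tanker/foo -> backup/foo by stripping the source pool prefix.
        dest="$DST${ds#"$SRC"}"
        cmd="zfs send $ds@$SNAP | zfs recv -u -F $dest"
        if [ "$DRY_RUN" = 1 ]; then
            printf '%s\n' "$cmd"
        else
            sh -c "$cmd"
        fi
    done
}

replicate
```

On failure, only the dataset whose send was interrupted needs to be destroyed on the destination and re-sent; the datasets that already completed stay received.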

@SenH SenH closed this as completed May 19, 2014