
Illumos 3875 panic after failed rollback #1609

Closed
edillmann opened this issue Jul 24, 2013 · 9 comments

@edillmann
Contributor

Hi,

I am getting several kernel oopses while running a zpool scrub.
The system remains responsive.

Regards,
Eric

[171052.370029] init: zabbix-agent main process ended, respawning
[171065.689697] BUG: unable to handle kernel NULL pointer dereference at           (null)
[171065.689985] IP: [<ffffffffa01cc8b6>] dmu_objset_space+0x6/0x20 [zfs]
[171065.690175] PGD 323c59067 PUD 15c264067 PMD 0 
[171065.690414] Oops: 0000 [#802] SMP 
[171065.690601] Modules linked in: ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) veth(F) lru_cache(F) libcrc32c(F) ipmi_devintf(F) xt_state(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ipt_MASQUERADE(F) iptable_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack(F) ip_tables(F) x_tables(F) parport_pc(F) ppdev(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) lockd(F) sunrpc(F) fscache(F) dm_crypt(F) bridge(F) stp(F) llc(F) gpio_ich(F) adm1021(F) i2c_i801(F) dm_multipath(F) scsi_dh(F) microcode(F) coretemp(F) joydev(F) lpc_ich(F) ioatdma(F) i7core_edac(F) edac_core(F) dca(F) ipmi_si(F) ipmi_msghandler(F) lp(F) parport(F) kvm_intel(F) kvm(F) ext2(F) zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) zlib_deflate(F) raid10(F) raid456(F) async_memcpy(F) async_raid6_recov(F) async_pq(F) async_xor(F) xor(F) async_tx(F) raid6_pq(F) raid0(F) multipath(F) linear(F) hid_generic(F) usbhid(F) hid(F) raid1(F) ahci(F) libahci(F) e1000e(F) ptp(F) pps_core(F)
[171065.695639] CPU 1 
[171065.695700] Pid: 6092, comm: zabbix_agentd Tainted: PF     D    O 3.9.2-lxc2 #2 Intel Corporation S5500BC/S5500BC
[171065.695974] RIP: 0010:[<ffffffffa01cc8b6>]  [<ffffffffa01cc8b6>] dmu_objset_space+0x6/0x20 [zfs]
[171065.696181] RSP: 0018:ffff880365411e10  EFLAGS: 00010246
[171065.696276] RAX: 0000000000000000 RBX: ffff880365411ef8 RCX: ffff880365411e30
[171065.696453] RDX: ffff880365411e28 RSI: ffff880365411e20 RDI: 0000000000000000
[171065.696632] RBP: ffff880365411e58 R08: ffff880365411e38 R09: ffffffff811a18a2
[171065.696758] R10: 0000000000000000 R11: ffff9c96939d8a8f R12: ffff880856560000
[171065.696884] R13: ffff880856560378 R14: ffff880365411ef8 R15: 00007fff55d3a8f0
[171065.697011] FS:  00007f559c99d740(0000) GS:ffff88086fc00000(0000) knlGS:0000000000000000
[171065.697140] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[171065.697236] CR2: 0000000000000000 CR3: 0000000323c58000 CR4: 00000000000007e0
[171065.697362] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[171065.697488] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[171065.697615] Process zabbix_agentd (pid: 6092, threadinfo ffff880365410000, task ffff8803c340dcc0)
[171065.697808] Stack:
[171065.697893]  ffff880365411e58 ffffffffa02441ad 0000000000000000 00007f559b60a5a4
[171065.698217]  ffff880365411f58 ffff8801450cc600 ffff880365411ef8 ffff880855aa4c00
[171065.698599]  00007fff55d39850 ffff880365411e68 ffffffffa0261cee ffff880365411e88
[171065.698922] Call Trace:
[171065.699051]  [<ffffffffa02441ad>] ? zfs_statvfs+0x9d/0x170 [zfs]
[171065.699184]  [<ffffffffa0261cee>] zpl_statfs+0xe/0x20 [zfs]
[171065.699284]  [<ffffffff811c73c1>] statfs_by_dentry+0xa1/0x140
[171065.699382]  [<ffffffff811c747b>] vfs_statfs+0x1b/0xb0
[171065.699477]  [<ffffffff811c7556>] user_statfs+0x46/0x90
[171065.699573]  [<ffffffff811c762a>] sys_statfs+0x1a/0x40
[171065.699709]  [<ffffffff816c8f5d>] system_call_fastpath+0x1a/0x1f
[171065.699805] Code: 00 00 00 00 00 66 66 66 66 90 55 48 8b 3f 48 89 e5 e8 6f 08 01 00 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 <48> 8b 3f 48 89 e5 e8 6f 12 01 00 5d c3 66 66 66 66 2e 0f 1f 84 
[171065.703240] RIP  [<ffffffffa01cc8b6>] dmu_objset_space+0x6/0x20 [zfs]
[171065.703410]  RSP <ffff880365411e10>
[171065.703498] CR2: 0000000000000000
@edillmann
Contributor Author

The scrub has ended, but the oopses are still there :-(

@dweeezil
Contributor

@edillmann It would be interesting to know what argument is being passed to statfs by zabbix_agentd and how it relates to your ZFS configuration. A cursory glance at its source code makes me think the argument corresponds fairly directly to your zabbix configuration. You ought to be able to reproduce this problem simply by running df on the same argument.
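The call trace above enters the kernel through sys_statfs, so the same path can be exercised without zabbix by issuing statfs(2) directly, which is what df does. Below is a minimal sketch, assuming Linux and glibc's `<sys/vfs.h>`; the helper name `probe_statfs` is hypothetical, and the path you pass should be the suspect mountpoint. On an affected system the call may oops the calling process instead of returning.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/vfs.h> /* Linux statfs(2) */

/*
 * Hypothetical helper: call statfs(2) on `path` (the same system call
 * df and zabbix_agentd issue) and print a few of the returned fields.
 * Returns 0 on success or -errno on failure.
 */
int probe_statfs(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0) {
        int err = errno; /* save errno before fprintf can change it */
        fprintf(stderr, "statfs(%s) failed: errno %d\n", path, err);
        return -err;
    }

    /* f_type identifies the filesystem type; the rest is the space summary df shows. */
    printf("%s: f_type=0x%lx blocks=%llu bfree=%llu bsize=%ld\n",
           path,
           (unsigned long)sfs.f_type,
           (unsigned long long)sfs.f_blocks,
           (unsigned long long)sfs.f_bfree,
           (long)sfs.f_bsize);
    return 0;
}
```

For example, calling `probe_statfs("/tank/backup")` (a hypothetical mountpoint) from a small `main` while the receive is running exercises the same kernel path as `df /tank/backup`.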

@edillmann
Contributor Author

Running strace on df -h identified a dataset that is the target of regular zfs receive operations. This dataset was mounted (which it should not have been). After I unmounted the dataset, the problem disappeared.

@behlendorf
Contributor

@edillmann Still, you shouldn't have been able to cause a BUG. Can you clearly describe the incorrect configuration that was able to cause this problem?

@edillmann
Contributor Author

@behlendorf the BUG appears in the following situation:

  • on the receiving side of a zfs incremental send, the dataset is mounted
  • while the receive is in progress, running df on the mountpoint triggers the BUG

@behlendorf
Contributor

@edillmann In zfs_statvfs() the variable zsb->z_os is NULL because we're doing an online receive and encountered an error during rollback. That's causing your crash. For the moment, don't do that.

It appears the Illumos folks just fixed a variant of this exact bug under issue https://www.illumos.org/issues/3875. We'll want to port and verify this fix illumos/illumos-gate@91948b51.
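The failure mode can be illustrated with a simplified userspace model. This is not the ZFS on Linux source and it does not reproduce the illumos patch; the names `model_objset`, `model_sb`, `model_statfs` and `model_statvfs` are hypothetical stand-ins for objset_t, the per-mount zsb, the statfs result and zfs_statvfs(). It shows why reading through a cleared `z_os` faults (the oops above has RDI, the first argument to dmu_objset_space(), equal to 0) and how a guard returning an error such as EIO would turn the oops into a failed system call.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the objset whose space usage is reported. */
struct model_objset {
    uint64_t used_bytes;
    uint64_t avail_bytes;
};

/*
 * Hypothetical stand-in for the per-mount state. `os` plays the role of
 * zsb->z_os, which can be left NULL after a failed rollback.
 */
struct model_sb {
    struct model_objset *os;
};

/* Hypothetical result structure filled in by model_statvfs(). */
struct model_statfs {
    uint64_t used_bytes;
    uint64_t avail_bytes;
};

/*
 * Guarded lookup: instead of dereferencing a NULL objset pointer (what
 * happened inside dmu_objset_space() in the oops), report an error.
 * Returns 0 on success or a positive errno value.
 */
int model_statvfs(const struct model_sb *sb, struct model_statfs *out)
{
    const struct model_objset *os = sb->os;

    if (os == NULL)
        return EIO; /* objset is gone; the mount is unusable */

    out->used_bytes = os->used_bytes;
    out->avail_bytes = os->avail_bytes;
    return 0;
}
```

With a valid objset the call copies the space figures; with `os == NULL` it returns EIO rather than faulting.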

@ryao
Copy link
Contributor

ryao commented Oct 11, 2013

#1775 includes Illumos 3875.

@behlendorf
Copy link
Contributor

The illumos fix in https://www.illumos.org/issues/3875 has been merged as commit 831baf0. That is expected to resolve this issue.

@edillmann
Copy link
Contributor Author

Thanks a lot, I confirm that the issue is resolved :-)


4 participants