Kernel Oops when rsyncing #1822
It's likely this is related to the shrinker changes you needed to make. We'll certainly get this sorted once we add 3.12 support.
Thanks for the fast reply. I didn't think it was related to my change, since I just reverted the commit where the compat shrinker interface was removed (torvalds/linux@a0b0213); hence I'm using the shrinker API of 3.11. Could you tell from the call trace that it has to do with the shrinker?
@rfehren If that was the only change then perhaps this is unrelated. The stack just happens to be down a similar path which can be reached via the shrinker, although in this case I see it was reached via arc_adapt.
I switched to our old 2.6.32 kernel (same spl/zfs version) on this machine. Things are running stable now. So it's either a general problem with 3.12 or indeed related to my shrinker change. Will test again once you've ported to the new shrinker API.
@rfehren Official 3.12 support has been merged. Can you verify you're not able to reproduce this issue using the latest code from master and a 3.12 kernel?
Yup. Works without a crash now (stock kernel 3.12 with old shrinker API removed). Thanks a lot for the fix. There still seems to be a problem, though, with the size of the arc_meta cache: it sucks up all available memory, way beyond the arc_meta_limit:
$ cat /proc/spl/kstat/zfs/arcstats |grep c_
In case it makes any difference: this was with compression=lz4 on the filesystem being rsynced. Otherwise no special settings.
torvalds/linux@24f7c6 introduced a new shrinker API while torvalds/linux@a0b021 dropped support for the old shrinker API. This patch adds support for the new shrinker API by wrapping the old one with the new one.
This change also reorganizes the autotools checks on the shrinker API such that the configure script will fail early if an unknown API is encountered in the future.
Support for the set_shrinker() API which was used by Linux 2.6.22 and older has been dropped. As a general rule compatibility is only maintained back to Linux 2.6.26.
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs/zfs#1732
Closes openzfs/zfs#1822
Closes #293
Closes #307
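To illustrate what "wrapping the old one with the new one" can mean, here is a minimal userspace model in Python. It is not the patch itself: `ToyCache`, `old_shrink`, and the `SHRINK_STOP` value below are hypothetical stand-ins, and the sketch only assumes the general convention that an old-style callback reports the object count when asked to scan zero objects, while the new API splits counting and scanning into two callbacks.

```python
# Stand-in sentinel for this model only; the kernel defines its own SHRINK_STOP.
SHRINK_STOP = -1


class ToyCache:
    """Hypothetical cache that only offers an old-style, single-callback shrinker."""

    def __init__(self, objects):
        self.objects = objects

    def old_shrink(self, nr_to_scan):
        # Old-style convention assumed here: nr_to_scan == 0 means
        # "just report how many objects could be freed"; otherwise free
        # up to nr_to_scan objects and report how many remain.
        if nr_to_scan > 0:
            self.objects -= min(nr_to_scan, self.objects)
        return self.objects


def count_objects(cache):
    """New-style 'count' callback built on top of the old callback."""
    remaining = cache.old_shrink(0)
    # The count callback must not report an error, so clamp negatives to zero.
    return max(remaining, 0)


def scan_objects(cache, nr_to_scan):
    """New-style 'scan' callback: returns the number of objects freed."""
    before = cache.old_shrink(0)
    after = cache.old_shrink(nr_to_scan)
    if after < 0:
        return SHRINK_STOP
    return before - after


cache = ToyCache(10)
print(count_objects(cache))
print(scan_objects(cache, 4))
print(count_objects(cache))
```

The point of the design is that callers written against the two-callback interface never see the old calling convention; the wrapper translates between the two.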
Hi,
I have the following configuration:
options zfs zfs_arc_min=536870912
options zfs zfs_arc_max=4294967296
$ zpool list -v
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
ssd 6.94T 379G 6.57T 5% 1.00x ONLINE -
raidz1 6.94T 379G 6.57T -
scsi-SATA_Crucial_CT960M5_13250940E411 - - - -
scsi-SATA_Crucial_CT960M5_13250940E45D - - - -
scsi-SATA_Crucial_CT960M5_13250940E431 - - - -
scsi-SATA_Crucial_CT960M5_132609429A2E - - - -
scsi-SATA_Crucial_CT960M5_132609429D6B - - - -
scsi-SATA_Crucial_CT960M5_132909464FF2 - - - -
scsi-SATA_Crucial_CT960M5_13310947CE48 - - - -
scsi-SATA_Crucial_CT960M5_13260942678B
While rsyncing, after a while I get the following Oops (it's reproducible, although the time until it occurs varies):
[14587.648348] BUG: unable to handle kernel NULL pointer dereference at (null)
[14587.656885] IP: < (null)>
[14587.662285] PGD 0
[14587.664696] Oops: 0010 [#1] SMP
[14587.668373] Modules linked in: zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) nfsd exportfs ipv6 mptspi scsi_transport_spi mptscsih mptbase igb i2c_algo_bit dm_mod hid_generic usbhid x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 acpi_cpufreq lpc_ich mfd_core ahci libahci mpt2sas microcode scsi_transport_sas ehci_pci xhci_hcd ehci_hcd usbcore usb_common ixgbe mdio ipmi_si thermal ipmi_msghandler fan processor
[14587.717108] CPU: 2 PID: 2459 Comm: arc_adapt Tainted: P O 3.12.0-rc5-ql-generic-8 #1
[14587.726372] Hardware name: Supermicro X10SL7-F/X10SL7-F, BIOS 1.1 07/19/2013
[14587.733756] task: ffff88007e3663c0 ti: ffff880402baa000 task.ti: ffff880402baa000
[14587.741872] RIP: 0010:[<0000000000000000>] < (null)>
[14587.750017] RSP: 0018:ffff880402babd50 EFLAGS: 00010246
[14587.755647] RAX: 0000000000000000 RBX: ffff880402ecc368 RCX: 0000000000000000
[14587.763121] RDX: ffffffffa056bd4e RSI: ffff880402babd58 RDI: ffff8804056c3b58
[14587.770748] RBP: ffff880402babd98 R08: 0000000000000000 R09: 0000000000049959
[14587.778219] R10: 0000000000000000 R11: 0140000000000000 R12: ffff880402babdb4
[14587.785683] R13: ffff880402ecc000 R14: ffff8804056c3800 R15: ffff880402babe10
[14587.793143] FS: 0000000000000000(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
[14587.801883] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14587.807962] CR2: 0000000000000000 CR3: 000000000169c000 CR4: 00000000001407e0
[14587.815423] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14587.822932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[14587.830393] Stack:
[14587.832730] ffffffffa0541a62 00000000000000d0 0000000000000400 0000000000000000
[14587.840863] 0000000000000000 ffffffffa055f4d0 ffff8804056c3800 ffff8804056c3868
[14587.848985] 0000000000000000 ffff880402babdb8 ffffffffa055f4eb 0000000000000064
[14587.857106] Call Trace:
[14587.859908] [] ? zfs_sb_prune+0xb2/0xd0 [zfs]
[14587.866248] [] ? zpl_inode_alloc+0x70/0x70 [zfs]
[14587.872852] [] zpl_prune_sb+0x1b/0x20 [zfs]
[14587.879023] [] iterate_supers_type+0xae/0xd0
[14587.885277] [] ? zpl_prune_sb+0x20/0x20 [zfs]
[14587.891626] [] zpl_prune_sbs+0x27/0x30 [zfs]
[14587.897878] [] arc_adjust_meta+0x119/0x1e0 [zfs]
[14587.904477] [] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
[14587.911255] [] ? arc_adjust_meta+0x1e0/0x1e0 [zfs]
[14587.918035] [] arc_adapt_thread+0x5f/0x160 [zfs]
[14587.924642] [] thread_generic_wrapper+0x73/0x90 [spl]
[14587.931679] [] ? __thread_create+0x300/0x300 [spl]
[14587.938522] [] kthread+0xbb/0xc0
[14587.943729] [] ? kthread_freezable_should_stop+0x70/0x70
[14587.951021] [] ret_from_fork+0x7c/0xb0
[14587.956747] [] ? kthread_freezable_should_stop+0x70/0x70
[14587.964043] Code: Bad RIP value.
[14587.967729] RIP < (null)>
[14587.973217] RSP
[14587.977028] CR2: 0000000000000000
[14587.981248] ---[ end trace 9342161bfa5076eb ]---
After the Oops occurs, memory usage keeps increasing until at some point the OOM killer kills system processes and the machine becomes unusable. Here are the arcstats from after the Oops, with memory pressure starting to appear:
$ cat /proc/spl/kstat/zfs/arcstats |grep c_
c_min 4 536870912
c_max 4 4294967296
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 7581447024
arc_meta_limit 4 1073741824
arc_meta_max 4 7685258800
Thanks,
Roland
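For anyone comparing their own numbers, the following is a small sketch of how the arcstats lines above can be checked programmatically. The helper names `parse_arcstats` and `meta_overshoot` are made up for this example, and the parser only assumes the three-column "name type value" layout shown in the output above; the sample text is copied from that output.

```python
def parse_arcstats(text):
    """Parse 'name type value' lines into a dict of name -> integer value."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers or anything that is not a three-column numeric row.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def meta_overshoot(stats):
    """Bytes by which arc_meta_used exceeds arc_meta_limit (negative if under)."""
    return stats["arc_meta_used"] - stats["arc_meta_limit"]


# Values copied from the arcstats output in this report.
SAMPLE = """\
c_min 4 536870912
c_max 4 4294967296
arc_meta_used 4 7581447024
arc_meta_limit 4 1073741824
arc_meta_max 4 7685258800
"""

stats = parse_arcstats(SAMPLE)
over = meta_overshoot(stats)
print(f"arc_meta_used exceeds arc_meta_limit by {over / 2**30:.2f} GiB")
print(f"arc_meta_used exceeds c_max: {stats['arc_meta_used'] > stats['c_max']}")
```

On a live system the same functions could be fed the contents of /proc/spl/kstat/zfs/arcstats instead of the sample string.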