Can't load or import zpool - arc_buf_remove_ref NULL pointer dereference #4055

Closed
liquidhorse opened this issue Nov 30, 2015 · 3 comments

@liquidhorse

Hi! One of my ZFS pools started acting strangely the other day, and after a reboot I was no longer able to bring the pool back online. I have updated to the latest stable release of ZOL and even tried a different kernel (original was in the 3.13 series, new kernel is 4.2.0-18). After moving the drives to a different system, I was able to reproduce the issue and capture the BUG:

Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255318] BUG: unable to handle kernel NULL pointer dereference at           (null)
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255357] IP: [<ffffffffa083b34b>] arc_buf_remove_ref+0x1b/0x150 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255424] PGD 0 
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255434] Oops: 0000 [#1] SMP 
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255449] Modules linked in: zfs(POF) zunicode(POF) zcommon(POF) znvpair(POF) spl(OF) zavl(POF) usb_storage ctr ccm rfcomm bnep bluetooth gpio_ich dell_wmi sparse_keymap arc4 dell_laptop dcdbas binfmt_misc dm_multipath scsi_dh rtl8192cu coretemp rtl_usb rtlwifi kvm_intel rtl8192c_common pcmcia kvm b43 bcma joydev yenta_socket serio_raw mac80211 pcmcia_rsrc r852 sm_common nand nand_ecc nand_bch r592 bch memstick nand_ids lpc_ich mtd cfg80211 pcmcia_core mac_hid snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore parport_pc ppdev lp parport raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log nouveau mxm_wmi i2c_algo_bit ttm drm_kms_helper drm psmouse firewire_ohci tg3 ahci sdhci_pci firewire_core sdhci libahci ptp crc_itu_t pps_core ssb video wmi
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255847] CPU: 0 PID: 9028 Comm: txg_sync Tainted: PF          O 3.13.0-24-generic #46-Ubuntu
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255862] Hardware name: Dell Inc. Precision M6400                 /      , BIOS A13 06/05/2013
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255876] task: ffff8801f585c7d0 ti: ffff8801888be000 task.ti: ffff8801888be000
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255888] RIP: 0010:[<ffffffffa083b34b>]  [<ffffffffa083b34b>] arc_buf_remove_ref+0x1b/0x150 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255939] RSP: 0018:ffff8801888bf2d8  EFLAGS: 00010246
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255951] RAX: 0000000000000000 RBX: ffff8801888bf448 RCX: 0000000000000000
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255963] RDX: ffffffffffffffff RSI: ffff8801888bf448 RDI: 0000000000000000
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255974] RBP: ffff8801888bf300 R08: 0000000000000000 R09: 0000000000000000
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255987] R10: ffffffffa07889a7 R11: ffffffffa078875a R12: ffff880006c76900
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.255999] R13: 0000000000000000 R14: ffffffffa083b480 R15: 0000000000000000
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256013] FS:  0000000000000000(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256030] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256041] CR2: 0000000000000000 CR3: 0000000001c0e000 CR4: 00000000000407f0
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256054] Stack:
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256060]  ffff8801888bf448 ffff880006c76900 0000000000000000 ffffffffa083b480
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256084]  0000000000000000 ffff8801888bf318 ffffffffa083b4b3 0000000000000034
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256105]  ffff8801888bf3b0 ffffffffa083b94c 0000000000000001 ffff8801888bf358
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256126] Call Trace:
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256171]  [<ffffffffa083b480>] ? arc_buf_remove_ref+0x150/0x150 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256217]  [<ffffffffa083b4b3>] arc_getbuf_func+0x33/0x70 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256263]  [<ffffffffa083b94c>] arc_read+0x45c/0xa80 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256279]  [<ffffffff810cc6ce>] ? getrawmonotonic+0x2e/0xb0
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256324]  [<ffffffffa083b480>] ? arc_buf_remove_ref+0x150/0x150 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256390]  [<ffffffffa08777a6>] dsl_scan_visitbp+0x266/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256454]  [<ffffffffa0877c3d>] dsl_scan_visitbp+0x6fd/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256517]  [<ffffffffa08778c1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256581]  [<ffffffffa08778c1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256643]  [<ffffffffa08778c1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa08778c1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa08778c1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa08778c1>] dsl_scan_visitbp+0x381/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa083b968>] ? arc_read+0x478/0xa80 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffff8119fbbc>] ? __kmalloc_node+0x5c/0x200
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa0877dbb>] dsl_scan_visitbp+0x87b/0xbf0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa0878203>] dsl_scan_visitds+0xd3/0x4b0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa0879688>] dsl_scan_sync+0x2a8/0xc50 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa08e49ad>] ? zio_destroy+0xcd/0xd0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa08e825a>] ? zio_wait+0x16a/0x210 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa088d312>] spa_sync+0x3c2/0xb20 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffff810aaea2>] ? autoremove_wake_function+0x12/0x40
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa089f639>] txg_sync_thread+0x3b9/0x620 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffff8109df24>] ? arch_vtime_task_switch+0x94/0xa0
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa089f280>] ? txg_quiesce_thread+0x3f0/0x3f0 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa078ae81>] thread_generic_wrapper+0x71/0x80 [spl]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffffa078ae10>] ? __thread_exit+0x20/0x20 [spl]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffff8108b312>] kthread+0xd2/0xf0
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffff8172637c>] ret_from_fork+0x7c/0xb0
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680] Code: 3c cc ff ff e9 47 ff ff ff 0f 1f 80 00 00 00 00 66 66 66 66 90 55 31 c0 48 c7 c2 ff ff ff ff 48 89 e5 41 57 41 56 41 55 41 54 53 <4c> 8b 07 48 89 fb 4d 8b 48 30 49 8b 78 10 0f 1f 80 00 00 00 00 
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680] RIP  [<ffffffffa083b34b>] arc_buf_remove_ref+0x1b/0x150 [zfs]
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680]  RSP <ffff8801888bf2d8>
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680] CR2: 0000000000000000
Nov 30 18:10:34 ecs-ltm6400-14 kernel: [523087.256680] ---[ end trace 0f1ef4beb4e1bd26 ]---

Right now the server running this pool is hard down. Is there anything I can do or provide to help determine the root cause of this issue?

I have attached the BUG dump as well:
zfs-arc-fail.txt

Thanks!
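
For readers triaging a trace like the one above, the following is a minimal sketch (not part of the original report) of how the faulting function and the confirmed call chain could be pulled out of an oops log. The function name `summarize_oops` and the regular expressions are illustrative assumptions based only on the line format visible in this log; other kernel versions print frames differently.

```python
import re

# Matches trace entries such as
#   [<ffffffffa083b4b3>] arc_getbuf_func+0x33/0x70 [zfs]
# An optional "? " marks a frame the kernel's stack unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\?\s+)?(?P<func>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)

# Matches the "RIP: 0010:[<addr>]  [<addr>] func+0x.." line of this log format.
RIP_RE = re.compile(r"\bRIP\b.*?\]\s+(?P<func>[\w.]+)\+0x")


def summarize_oops(lines):
    """Return (faulting_function, frames) for one oops.

    frames is a list of (function, module) tuples, innermost first.
    Frames marked "?" are skipped and consecutive repeats (recursion)
    are collapsed into a single entry.
    """
    fault = None
    frames = []
    in_trace = False
    for line in lines:
        if fault is None:
            rip = RIP_RE.search(line)
            if rip:
                fault = rip.group("func")
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        frame = FRAME_RE.search(line)
        if frame is None:
            break  # the first non-frame line (e.g. "Code:") ends the trace
        if frame.group("unsure"):
            continue
        entry = (frame.group("func"), frame.group("module"))
        if not frames or frames[-1] != entry:
            frames.append(entry)
    return fault, frames
```

Run against the log above, this sketch would name `arc_buf_remove_ref` as the faulting function and list a chain that runs from `txg_sync_thread` through `dsl_scan_sync` and `dsl_scan_visitbp` to `arc_read` and `arc_getbuf_func`.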

@liquidhorse
Author

By the way, this happens while attempting to import the zpool. I've tried a variety of import options (including -FX and -N), but every attempt ends with the same BUG. All the devices appear to be intact. The pool is a stripe of two mirrors, so I even tried pulling drives out of the system and importing the pool in a degraded state.

@behlendorf
Contributor

@liquidhorse please try the patch in pull request #4080. It should resolve the panic and allow you to import the pool with -F.

@liquidhorse
Author

@behlendorf Looks like that fixed it! I was able to import the pool successfully and it appears to be intact. I am performing a full scrub on it now.

THANK YOU SO VERY MUCH!!!!!! :)
