
Kernel panic on zpool create #1647

Closed · geoffdavis opened this issue Aug 12, 2013 · 2 comments

@geoffdavis

I get a consistent kernel panic on a CentOS 6.4 system when I run zpool create. I'm running ZoL 0.6.1 and the latest CentOS kernel (2.6.32-358.14.1.el6.x86_64). This system is a guest running under VMware ESXi 5.1 and has two SAN LUNs passed through to it.
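
For reference, the loaded SPL/ZFS builds can be confirmed with modinfo (a quick sanity check; this is not output captured from the box):

# Show which SPL/ZFS builds are loaded; 'version' is the ZoL release
# and 'srcversion' identifies the exact build of each module.
modinfo spl zfs | grep -E '^(filename|version|srcversion)'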

One of the LUNs is 500 GB in size. zpool create worked for that LUN under 0.6.0, and the resulting pool has run without trouble for the last few months.

I recently added a larger LUN to the box, this time 20 TB in size.

The new LUN shows up in the OS as /dev/sdc, or as /dev/disk/by-id/wwn-0x6000d310001b4e00000000000000005b.

Command to create the pool:

zpool create -m /export/CEUSN -o version=28 CEUSN sdc

I also see crashes when I leave out the -o version=28 option. Either way, the system panics with the following:

<6> sdc: sdc1 sdc9
<1>BUG: unable to handle kernel paging request at 0000000000008076
<1>IP: [<ffffffffa01acfde>] spl_kmem_cache_alloc+0x4e/0xf90 [spl]
<4>PGD 0
<4>Oops: 0002 [#1] SMP
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:02.0/host3/target3:0:1/3:0:1:0/block/sdc/dev
<4>CPU 0
<4>Modules linked in: nfs fscache nfsd nfs_acl auth_rpcgss exportfs autofs4 lockd sunrpc ipt_REJECT xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack xt_comment ip6table_filter ip6_tables ipv6 vsock(U) zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ppdev parport_pc parport e1000 vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 2842, comm: vdev_open/0 Tainted: P           ---------------    2.6.32-358.14.1.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
<4>RIP: 0010:[<ffffffffa01acfde>]  [<ffffffffa01acfde>] spl_kmem_cache_alloc+0x4e/0xf90 [spl]
<4>RSP: 0018:ffff8804381c7b60  EFLAGS: 00010246
<4>RAX: 0000000000008076 RBX: 0000000000000016 RCX: 0000000000000015
<4>RDX: 00000000001fffff RSI: 0000000000000230 RDI: 0000000000000016
<4>RBP: ffff8804381c7c70 R08: ffff880424808e20 R09: 0000000000002000
<4>R10: ffff8804381c7b70 R11: ffff8804386580a8 R12: ffff8804385b5000
<4>R13: 0000000000200000 R14: 0000000000000230 R15: ffff8804381be040
<4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000008076 CR3: 0000000001a85000 CR4: 00000000000007f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process vdev_open/0 (pid: 2842, threadinfo ffff8804381c6000, task ffff8804381be040)
<4>Stack:
<4> ffff880438658040 ffff880438658090 00000000381be040 ffff8804381c7bc8
<4><d> ffff880435c68060 ffff880435c680a0 ffff880435c68041 ffff880435c680b0
<4><d> ffff880435c68040 ffff880424808d30 ffff8804381c7bd0 ffff880424808d30
<4>Call Trace:
<4> [<ffffffffa032062f>] ? zio_add_child+0xef/0x110 [zfs]
<4> [<ffffffffa01b19b4>] ? taskq_init_ent+0x34/0x80 [spl]
<4> [<ffffffff8150f26e>] ? mutex_lock+0x1e/0x50
<4> [<ffffffffa031e243>] ? zio_wait_for_children+0x63/0x80 [zfs]
<4> [<ffffffffa031fad3>] zio_buf_alloc+0x23/0x30 [zfs]
<4> [<ffffffffa031fca4>] zio_vdev_io_start+0x144/0x2e0 [zfs]
<4> [<ffffffffa0320703>] zio_nowait+0xb3/0x150 [zfs]
<4> [<ffffffffa02dde0a>] vdev_probe+0x12a/0x210 [zfs]
<4> [<ffffffffa02deed0>] ? vdev_probe_done+0x0/0x250 [zfs]
<4> [<ffffffffa02f9e55>] ? zfs_post_state_change+0x15/0x20 [zfs]
<4> [<ffffffffa02de192>] vdev_open+0x2a2/0x450 [zfs]
<4> [<ffffffffa02deeb6>] vdev_open_child+0x26/0x40 [zfs]
<4> [<ffffffffa01b16e8>] taskq_thread+0x218/0x4b0 [spl]
<4> [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
<4> [<ffffffff81063330>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa01b14d0>] ? taskq_thread+0x0/0x4b0 [spl]
<4> [<ffffffff81096956>] kthread+0x96/0xa0
<4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
<4> [<ffffffff810968c0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
<4>Code: 00 f6 05 0d 35 01 00 01 48 89 fb 41 89 f6 74 0d f6 05 f7 34 01 00 08 0f 85 70 01 00 00 48 8d 83 60 80 00 00 48 89 85 70 ff ff ff <f0> ff 83 60 80 00 00 9c 58 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f
<1>RIP  [<ffffffffa01acfde>] spl_kmem_cache_alloc+0x4e/0xf90 [spl]
<4> RSP <ffff8804381c7b60>
<4>CR2: 0000000000008076
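
For what it's worth, the faulting instruction can be pulled out of the Code: line above with the kernel's scripts/decodecode helper (assuming a matching kernel source or kernel-devel tree is installed; the path below is where CentOS puts it):

# decodecode reads an oops from stdin, disassembles the Code: bytes,
# and marks the faulting instruction.
cd /usr/src/kernels/$(uname -r)
./scripts/decodecode < /tmp/oops.txt   # /tmp/oops.txt holds the panic text above

Decoded by hand, the marked bytes f0 ff 83 60 80 00 00 are lock incl 0x8060(%rbx). With RBX = 0x16, that addresses 0x16 + 0x8060 = 0x8076, which matches CR2, so spl_kmem_cache_alloc appears to be incrementing a counter through a garbage cache pointer (0x16).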
@geoffdavis (Author)

For giggles, I tried this with a 2 TB LUN rather than the 20 TB LUN, just to see if it was a LUN sizing issue. No dice.

root@anfnfsl ~ # sudo zpool create -m /export/test2TB test2TB /dev/disk/by-id/wwn-0x6000d310001b4e00000000000000005c
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/wwn-0x6000d310001b4e00000000000000005c does not contain an EFI label but it may contain partition
information in the MBR.
root@anfnfsl ~ # sudo zpool create -fm /export/test2TB test2TB /dev/disk/by-id/wwn-0x6000d310001b4e00000000000000005c

This yields a similar crash dump:

<6> sdd: sdd1 sdd9
<1>BUG: unable to handle kernel paging request at 0000000000008076
<1>IP: [<ffffffffa0194fde>] spl_kmem_cache_alloc+0x4e/0xf90 [spl]
<4>PGD 434088067 PUD 434aca067 PMD 0
<4>Oops: 0002 [#1] SMP
<4>last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:02.0/host3/target3:0:2/3:0:2:0/block/sdd/dev
<4>CPU 0
<4>Modules linked in: nfs fscache nfsd nfs_acl auth_rpcgss exportfs autofs4 lockd sunrpc ipt_REJECT xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack xt_comment ip6table_filter ip6_tables ipv6 vsock(U) ppdev parport_pc parport zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate e1000 vmware_balloon sg vmci(U) i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 3461, comm: vdev_open/0 Tainted: P           ---------------    2.6.32-358.14.1.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
<4>RIP: 0010:[<ffffffffa0194fde>]  [<ffffffffa0194fde>] spl_kmem_cache_alloc+0x4e/0xf90 [spl]
<4>RSP: 0018:ffff8804381b1b60  EFLAGS: 00010246
<4>RAX: 0000000000008076 RBX: 0000000000000016 RCX: 0000000000000015
<4>RDX: 00000000001fffff RSI: 0000000000000230 RDI: 0000000000000016
<4>RBP: ffff8804381b1c70 R08: ffff8804217a34f0 R09: 0000000000002000
<4>R10: ffff8804381b1b70 R11: ffff8804387680a8 R12: ffff8804340a7000
<4>R13: 0000000000200000 R14: 0000000000000230 R15: ffff8804381af540
<4>FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 0000000000008076 CR3: 0000000437f9c000 CR4: 00000000000007f0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process vdev_open/0 (pid: 3461, threadinfo ffff8804381b0000, task ffff8804381af540)
<4>Stack:
<4> ffff880438768040 ffff880438768090 00000000381af540 ffff8804381b1bc8
<4><d> ffff880438758060 ffff8804387580a0 ffff880438758041 ffff8804387580b0
<4><d> ffff880438758040 ffff8804217a3400 ffff8804381b1bd0 ffff8804217a3400
<4>Call Trace:
<4> [<ffffffffa030862f>] ? zio_add_child+0xef/0x110 [zfs]
<4> [<ffffffffa01999b4>] ? taskq_init_ent+0x34/0x80 [spl]
<4> [<ffffffff8150f26e>] ? mutex_lock+0x1e/0x50
<4> [<ffffffffa0306243>] ? zio_wait_for_children+0x63/0x80 [zfs]
<4> [<ffffffffa0307ad3>] zio_buf_alloc+0x23/0x30 [zfs]
<4> [<ffffffffa0307ca4>] zio_vdev_io_start+0x144/0x2e0 [zfs]
<4> [<ffffffffa0308703>] zio_nowait+0xb3/0x150 [zfs]
<4> [<ffffffffa02c5e0a>] vdev_probe+0x12a/0x210 [zfs]
<4> [<ffffffffa02c6ed0>] ? vdev_probe_done+0x0/0x250 [zfs]
<4> [<ffffffffa02e1e55>] ? zfs_post_state_change+0x15/0x20 [zfs]
<4> [<ffffffffa02c6192>] vdev_open+0x2a2/0x450 [zfs]
<4> [<ffffffffa02c6eb6>] vdev_open_child+0x26/0x40 [zfs]
<4> [<ffffffffa01996e8>] taskq_thread+0x218/0x4b0 [spl]
<4> [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
<4> [<ffffffff81063330>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa01994d0>] ? taskq_thread+0x0/0x4b0 [spl]
<4> [<ffffffff81096956>] kthread+0x96/0xa0
<4> [<ffffffff8100c0ca>] child_rip+0xa/0x20
<4> [<ffffffff810968c0>] ? kthread+0x0/0xa0
<4> [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
<4>Code: 00 f6 05 0d 35 01 00 01 48 89 fb 41 89 f6 74 0d f6 05 f7 34 01 00 08 0f 85 70 01 00 00 48 8d 83 60 80 00 00 48 89 85 70 ff ff ff <f0> ff 83 60 80 00 00 9c 58 0f 1f 44 00 00 49 89 c7 fa 66 0f 1f
<1>RIP  [<ffffffffa0194fde>] spl_kmem_cache_alloc+0x4e/0xf90 [spl]
<4> RSP <ffff8804381b1b60>
<4>CR2: 0000000000008076
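
One thing I haven't been able to try, since the box panics before I can react: clearing the stale MBR that zpool warned about, so the -f override isn't needed on the retry. Something like this would do it (destructive; the device is the 2 TB LUN from above):

# Zero the first sector (the MBR) so zpool create no longer sees stale
# partition information. This destroys anything on the disk.
dd if=/dev/zero of=/dev/disk/by-id/wwn-0x6000d310001b4e00000000000000005c bs=512 count=1
# Ask the kernel to re-read the (now empty) partition table.
blockdev --rereadpt /dev/disk/by-id/wwn-0x6000d310001b4e00000000000000005c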

@geoffdavis (Author)

I also tested with a 500GB LUN using VMware virtual device pass-through (which is how the single working LUN on the system is configured), and I get the same crash.

When I create a VMware virtual disk rather than doing a raw device pass-through, no such crashes occur. I was only able to create a 32 GB test volume for comparison, due to limited free space on my VMFS volumes, so it's significantly smaller than the 500 GB, 2 TB, and 20 TB volumes used in my other tests.
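
In case it helps narrow this down, the raw-mapped LUNs could be compared against the working virtual disk at the block layer; a mismatch in the queue limits the SAN reports would be a lead. (Device names here are placeholders: sdb for the virtual disk, sdc for a raw-mapped LUN.)

# Dump the block-queue limits for both devices; grep prints each value
# prefixed with its file name for easy side-by-side comparison.
for d in sdb sdc; do
  grep . /sys/block/$d/queue/logical_block_size \
         /sys/block/$d/queue/physical_block_size \
         /sys/block/$d/queue/max_sectors_kb
done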
