Zpool import fails #1784

Closed
ghost opened this issue Oct 12, 2013 · 7 comments

@ghost

ghost commented Oct 12, 2013

I recently deleted about 1 TB of files from my pool.
After the system finished its disk I/O, I rebooted.
Since then I have not been able to mount my pool.

System:
CPU: Intel Xeon E3-1265L v2
Mainboard: Intel S1200KPR
Storage-Controller: IBM M1015 (Flashed to LSI 9211-8 it)
RAM: 2x8gb ECC
OS: Ubuntu 12.10.3

ZFS Info:
ZFS-Version: 0.6.2-1~precise
4x WD Red in RAID-Z

The following happens when I try to import the pool:

Oct 12 19:27:25 henrik-server kernel: [ 1747.859353] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Oct 12 19:27:25 henrik-server kernel: [ 1747.859412] IP: [] prepare_to_wait+0x74/0x90
Oct 12 19:27:25 henrik-server kernel: [ 1747.859453] PGD 0
Oct 12 19:27:25 henrik-server kernel: [ 1747.859469] Oops: 0002 [#1] SMP
Oct 12 19:27:25 henrik-server kernel: [ 1747.859494] Modules linked in: ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) ipt_MASQUERADE(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_state(F) nf_conntrack(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ip_tables(F) x_tables(F) bridge(F) stp(F) llc(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) gpio_ich(F) hid_generic(F) psmouse(F) usbhid(F) serio_raw(F) hid(F) microcode(F) i915(F) drm_kms_helper(F) drm(F) i2c_algo_bit(F) video(F) mei(F) lpc_ich(F) mac_hid(F) w83627ehf(F) hwmon_vid(F) coretemp(F) lp(F) parport(F) nls_iso8859_1(F) zfs(POF) zcommon(POF) znvpair(POF) zavl(POF) zunicode(POF) spl(OF) zlib_deflate(F) ahci(F) libahci(F) mpt2sas(F) scsi_transport_sas(F) raid_class(F) e1000e(F)
Oct 12 19:27:25 henrik-server kernel: [ 1747.860031] CPU 6
Oct 12 19:27:25 henrik-server kernel: [ 1747.860047] Pid: 3137, comm: txg_sync Tainted: PF O 3.8.0-31-generic #46~precise1-Ubuntu /S1200KP
Oct 12 19:27:25 henrik-server kernel: [ 1747.860107] RIP: 0010:[] [] prepare_to_wait+0x74/0x90
Oct 12 19:27:25 henrik-server kernel: [ 1747.860155] RSP: 0018:ffff8803f2859ae8 EFLAGS: 00010046
Oct 12 19:27:25 henrik-server kernel: [ 1747.860185] RAX: 0000000000000282 RBX: ffff8803ade73f80 RCX: 0000000000000000
Oct 12 19:27:25 henrik-server kernel: [ 1747.860223] RDX: ffff8803f2859b58 RSI: 0000000000000282 RDI: ffff8803ade73f80
Oct 12 19:27:25 henrik-server kernel: [ 1747.860261] RBP: ffff8803f2859b18 R08: 0000000000016d00 R09: ffffea001013d100
Oct 12 19:27:25 henrik-server kernel: [ 1747.860298] R10: ffffffffa00858cb R11: ffffc9003cef918e R12: ffff8803f2859b40
Oct 12 19:27:25 henrik-server kernel: [ 1747.860336] R13: 0000000000000002 R14: ffff8803f3f52e80 R15: 0000000000000001
Oct 12 19:27:25 henrik-server kernel: [ 1747.860375] FS: 0000000000000000(0000) GS:ffff88041f380000(0000) knlGS:0000000000000000
Oct 12 19:27:25 henrik-server kernel: [ 1747.860418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 12 19:27:25 henrik-server kernel: [ 1747.860449] CR2: 0000000000000008 CR3: 0000000001c0d000 CR4: 00000000001407e0
Oct 12 19:27:25 henrik-server kernel: [ 1747.860487] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 12 19:27:25 henrik-server kernel: [ 1747.860525] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 12 19:27:25 henrik-server kernel: [ 1747.860563] Process txg_sync (pid: 3137, threadinfo ffff8803f2858000, task ffff8803f3f52e80)
Oct 12 19:27:25 henrik-server kernel: [ 1747.860607] Stack:
Oct 12 19:27:25 henrik-server kernel: [ 1747.860620] 00000000000001c0 ffff8803ade73e00 ffff8803f2859bb8 ffff8803ade73f60
Oct 12 19:27:25 henrik-server kernel: [ 1747.860668] ffff8803f2859b58 ffff8803ade73f80 ffff8803f2859b98 ffffffffa0091551
Oct 12 19:27:25 henrik-server kernel: [ 1747.860716] ffff8803ade73f28 0000000000020000 ffff880404d1a000 0000000000000000
Oct 12 19:27:25 henrik-server kernel: [ 1747.860764] Call Trace:
Oct 12 19:27:25 henrik-server kernel: [ 1747.860794] [] __cv_destroy+0xd1/0x190 [spl]
Oct 12 19:27:25 henrik-server kernel: [ 1747.860829] [] ? add_wait_queue+0x60/0x60
Oct 12 19:27:25 henrik-server kernel: [ 1747.860886] [] ddt_free+0x3a/0x50 [zfs]
Oct 12 19:27:25 henrik-server kernel: [ 1747.860938] [] ddt_sync+0x278/0x6b0 [zfs]
Oct 12 19:27:25 henrik-server kernel: [ 1747.860972] [] ? __wake_up+0x53/0x70
Oct 12 19:27:25 henrik-server kernel: [ 1747.861033] [] spa_sync+0x46e/0xae0 [zfs]
Oct 12 19:27:25 henrik-server kernel: [ 1747.861067] [] ? ktime_get_ts+0x4c/0xe0
Oct 12 19:27:25 henrik-server kernel: [ 1747.861130] [] txg_sync_thread+0x2df/0x540 [zfs]
Oct 12 19:27:25 henrik-server kernel: [ 1747.861195] [] ? txg_init+0x250/0x250 [zfs]
Oct 12 19:27:25 henrik-server kernel: [ 1747.861235] [] thread_generic_wrapper+0x78/0x90 [spl]
Oct 12 19:27:25 henrik-server kernel: [ 1747.861279] [] ? __thread_create+0x310/0x310 [spl]
Oct 12 19:27:25 henrik-server kernel: [ 1747.861316] [] kthread+0xc0/0xd0
Oct 12 19:27:25 henrik-server kernel: [ 1747.861345] [] ? flush_kthread_worker+0xb0/0xb0
Oct 12 19:27:25 henrik-server kernel: [ 1747.861382] [] ret_from_fork+0x7c/0xb0
Oct 12 19:27:25 henrik-server kernel: [ 1747.861413] [] ? flush_kthread_worker+0xb0/0xb0
Oct 12 19:27:25 henrik-server kernel: [ 1747.861446] Code: 00 48 87 11 48 89 df 48 89 c6 48 89 55 d8 48 8b 55 d8 e8 60 41 67 00 48 8b 5d e8 4c 8b 65 f0 4c 8b 6d f8 c9 c3 66 90 48 8b 4b 08 <48> 89 51 08 49 89 4c 24 18 48 8d 4b 08 49 89 4c 24 20 48 89 53
Oct 12 19:27:25 henrik-server kernel: [ 1747.861707] RIP [] prepare_to_wait+0x74/0x90
Oct 12 19:27:25 henrik-server kernel: [ 1747.861744] RSP
Oct 12 19:27:25 henrik-server kernel: [ 1747.861763] CR2: 0000000000000008
Oct 12 19:27:25 henrik-server kernel: [ 1748.230774] ---[ end trace 0695373f7596f76b ]---

@dweeezil
Contributor

@supergerdmeier It sounds like the file system's unlinked set might have a lot of entries in it. Obviously the crash you're seeing shouldn't be happening, but it would be interesting to know if there are a lot of pending deletes. You should be able to do something like zdb -dddd pool/filesys 3 to see the list of objects pending deletion. The unlinked set is typically stored in object 3, but if it's not, you can omit the "3" and look for the object with a type of "ZFS delete queue". If there's a lot of work to do, it'll look like:

# zdb -dddd tank/a 3 | head -20
Dataset tank/a [ZPL], ID 40, cr_txg 42, 940K, 955 objects, rootbp DVA[0]=<0:62fdc00:200> DVA[1]=<0:3c13cc00:200> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=96L/96P fill=955 cksum=123eb4b172:6c177d758a1:14dfe9b909caf:2ca179ef3cbf6a

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         3    1    16K  43.0K  9.00K  43.0K  100.00  ZFS delete queue
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 0
    microzap: 44032 bytes, 680 entries

        4e0 = 1248 
        56d = 1389 
        51c = 1308 
        538 = 1336 
        435 = 1077 
        41d = 1053
... lots more entries

Finding out whether your file system has a large unlinked set would be a good starting point. BTW, I'd expect its unlinked set to be large if most/all of the files you deleted had xattrs (unless you're using xattr=sa).
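
To get just the entry count rather than the full listing, the zdb output can be filtered. This is a minimal sketch, assuming the delete queue is a microzap whose summary line has the form shown in the sample output above (microzap: N bytes, M entries). The function name unlinked_entries is made up for illustration, and a larger queue that prints a different summary line is not handled:

```shell
# Print the entry count from the "microzap: ... entries" summary line
# of `zdb -dddd` output read on stdin. Prints nothing if no such line exists.
unlinked_entries() {
    # The count is the second-to-last field: "microzap: 44032 bytes, 680 entries"
    awk '/microzap:/ { print $(NF-1); exit }'
}

# Usage (pool/filesys and object 3 as in the comment above):
#   zdb -dddd pool/filesys 3 | unlinked_entries
```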

@ghost
Author

ghost commented Oct 13, 2013

I ran zdb -dddd storage and it produced a huge amount of output,
but it got stuck at an object related to the DDT.
I had dedup enabled for a few weeks and then disabled it. It seems the dedup table is far too big.
I created a new filesystem and made a snapshot of my old one. I'm currently copying the data from the snapshot to the new filesystem; maybe I can get rid of the dedup tables this way.
I'm using send and receive because I can't mount my old filesystem.
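
The snapshot plus send/receive approach described above can be sketched roughly as follows. The dataset and snapshot names (storage/old, migrate, storage/new) are hypothetical and not taken from this thread, and the helper migration_commands is made up for illustration. It only prints the commands so they can be reviewed first. Whether the copied blocks stay out of the dedup table depends on the dedup property in effect on the receiving dataset, which is assumed here to be off:

```shell
# Print (do not run) the commands for copying a dataset via a snapshot.
# $1 = source dataset, $2 = snapshot name, $3 = destination dataset
migration_commands() {
    printf 'zfs snapshot %s@%s\n' "$1" "$2"
    printf 'zfs send %s@%s | zfs receive %s\n' "$1" "$2" "$3"
}

# Example with hypothetical names; review the output before running it:
#   migration_commands storage/old migrate storage/new
```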

@dweeezil
Contributor

I had not thought to ask about the DDT. It sounds like you're on the right track now.

@ghost
Author

ghost commented Oct 13, 2013

Since I completed the send/receive, I can import both filesystems without problems... Strange.
But thank you very much for your support :)

zfsonlinux rules!

@dweeezil
Contributor

@supergerdmeier If you still have your old deduped filesystem around, could you please run zdb -DDD storage on it? That will give us an idea of how large the dedup tables are. Also, I'd still be interested to know whether you see lots of entries when dumping the unlinked set with zdb -dddd storage/<filesystem> 3 (replace "3" with the object number of the unlinked set, determined if necessary by examining the MOS).
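
A rough way to boil the zdb -DDD output down to a single size figure is sketched below. It assumes the per-table summary lines have the form "DDT-<checksum>-zap-<class>: N entries, size X on disk, Y in core", where Y is an average per-entry size in bytes; that format is an assumption and may differ between versions. The function name ddt_summary is made up for illustration:

```shell
# Sum DDT entries and estimate total in-core bytes from zdb -DDD output on stdin.
# Assumed line format (field positions matter):
#   DDT-sha256-zap-unique: N entries, size X on disk, Y in core
ddt_summary() {
    awk '/^DDT-.*entries,/ {
             entries += $2
             core    += $2 * $8   # entries x assumed average in-core bytes per entry
         }
         END { printf "%.0f entries, ~%.0f bytes in core\n", entries, core }'
}

# Usage:
#   zdb -DDD storage | ddt_summary
```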

@ghost
Author

ghost commented Oct 15, 2013

zdb -dddd storage
http://pastebin.com/8hHgMac5

zdb -DDD storage
http://pastebin.com/sQR4DfCY

I had already deleted files from my old pool (I forgot about this issue).
zdb gets stuck at the end... and I won't wait until it finishes, sorry.
My first (old) filesystem was the root one (not a good idea, as I now know).

behlendorf removed this from the 0.6.4 milestone Oct 29, 2014
@behlendorf
Contributor

This has gotten a bit stale but there appears to be a legitimate dedup bug here. I'm leaving it open for now.
