Assert VERIFY(0 == sa_buf_hold(zfsvfs->z_os, obj, NULL, &db)) fails on zfs_znode.c:766:zfs_mknode() #6385

Closed
tuomari opened this issue Jul 21, 2017 · 7 comments

tuomari commented Jul 21, 2017

System information

Type | Version/Name
--- | ---
Distribution Name | Debian
Distribution Version | 8.7
Linux Kernel | 4.9.36
Architecture | amd64
ZFS Version | 0.7.0-rc4_103_ge19572e4c
SPL Version | 0.7.0-rc5

Describe the problem you're observing

After only a few days of uptime, I hit a kernel PANIC. The system is the same one as in #6220.

Describe how to reproduce the problem

If it reproduces, I will report back.

Include any warning/errors/backtraces from the system logs

Jul 21 13:20:40 helvi kernel: [ 1095.346261] VERIFY(0 == sa_buf_hold(zfsvfs->z_os, obj, NULL, &db)) failed
Jul 21 13:20:40 helvi kernel: [ 1095.346291] PANIC at zfs_znode.c:766:zfs_mknode()
Jul 21 13:20:40 helvi kernel: [ 1095.346309] Showing stack for process 23046
Jul 21 13:20:40 helvi kernel: [ 1095.346311] CPU: 3 PID: 23046 Comm: zma Tainted: P           O    4.9.36.iudex.kvm.ovs.1 #2
Jul 21 13:20:40 helvi kernel: [ 1095.346312] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
Jul 21 13:20:40 helvi kernel: [ 1095.346314]  ffffc9003361b8f0 ffffffff8132765b ffffffffa13697ad 00000000000002fe
Jul 21 13:20:40 helvi kernel: [ 1095.346317]  ffffc9003361b900 ffffffffa03beb2f ffffc9003361ba80 ffffffffa03bebe7
Jul 21 13:20:40 helvi kernel: [ 1095.346319]  2222222222222222 2222222200000028 ffffc9003361ba90 ffffc9003361ba30
Jul 21 13:20:40 helvi kernel: [ 1095.346321] Call Trace:
Jul 21 13:20:40 helvi kernel: [ 1095.346327]  [<ffffffff8132765b>] dump_stack+0x4d/0x72
Jul 21 13:20:40 helvi kernel: [ 1095.346343]  [<ffffffffa03beb2f>] spl_dumpstack+0x3f/0x50 [spl]
Jul 21 13:20:40 helvi kernel: [ 1095.346348]  [<ffffffffa03bebe7>] spl_panic+0xa7/0xd0 [spl]
Jul 21 13:20:40 helvi kernel: [ 1095.346433]  [<ffffffffa11a3526>] ? dbuf_rele+0x36/0x70 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346470]  [<ffffffffa11cb7e7>] ? dnode_hold_impl+0x477/0x9e0 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346507]  [<ffffffffa11cbf89>] ? dnode_rele+0x39/0x80 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346545]  [<ffffffffa11cbd66>] ? dnode_hold+0x16/0x20 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346579]  [<ffffffffa11b1823>] ? dmu_bonus_hold+0x23/0x220 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346624]  [<ffffffffa127274c>] zfs_mknode+0xe1c/0xe60 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346627]  [<ffffffff816dcfb9>] ? _raw_spin_unlock+0x9/0x10
Jul 21 13:20:40 helvi kernel: [ 1095.346671]  [<ffffffffa121eb0b>] ? txg_rele_to_quiesce+0x3b/0x70 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346708]  [<ffffffffa11c76fb>] ? dmu_tx_assign+0x45b/0x660 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346755]  [<ffffffffa1265a58>] zfs_create+0x498/0x760 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346760]  [<ffffffff810e4c2f>] ? __wake_up+0x3f/0x50
Jul 21 13:20:40 helvi kernel: [ 1095.346827]  [<ffffffffa128ff9d>] zpl_create+0xbd/0x210 [zfs]
Jul 21 13:20:40 helvi kernel: [ 1095.346831]  [<ffffffff811f7737>] path_openat+0x1437/0x1560
Jul 21 13:20:40 helvi kernel: [ 1095.346834]  [<ffffffff811f8959>] do_filp_open+0x79/0xd0
Jul 21 13:20:40 helvi kernel: [ 1095.346839]  [<ffffffff811cb4ce>] ? kmem_cache_alloc+0x14e/0x1a0
Jul 21 13:20:40 helvi kernel: [ 1095.346841]  [<ffffffff816dcfb9>] ? _raw_spin_unlock+0x9/0x10
Jul 21 13:20:40 helvi kernel: [ 1095.346844]  [<ffffffff812065c9>] ? __alloc_fd+0xa9/0x160
Jul 21 13:20:40 helvi kernel: [ 1095.346848]  [<ffffffff811e6c21>] do_sys_open+0x111/0x1f0
Jul 21 13:20:40 helvi kernel: [ 1095.346850]  [<ffffffff811e6d19>] SyS_open+0x19/0x20
Jul 21 13:20:40 helvi kernel: [ 1095.346852]  [<ffffffff816dd260>] entry_SYSCALL_64_fastpath+0x13/0x94

tuomari commented Jul 21, 2017

I have a script running which forcefully reboots the system if the string PANIC is found in dmesg. It seems that I have been hitting this trace hourly for the last 10 hours. The stack traces are identical on every crash.

A scrub is currently running at ~50%, the system receives snapshots every night, and ZoneMinder is thrashing the filesystem constantly. The system is not in any critical use, so if I can help track this down, I will gladly do so.
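
For anyone who wants a similar watchdog, here is a minimal Python sketch of the idea. It is not the actual script used on this system. The names `find_panic` and `check_and_reboot` are hypothetical, and the forced reboot through `/proc/sysrq-trigger` is an assumption about how "forcefully reboots" could be done. It needs root, and it only reports unless `dry_run=False` is passed.

```python
import re
import subprocess

# Matches SPL panic lines such as:
#   PANIC at zfs_znode.c:766:zfs_mknode()
PANIC_RE = re.compile(
    r"\bPANIC at (?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)"
)


def find_panic(log_text):
    """Return (file, line, function) for the first panic line, or None."""
    match = PANIC_RE.search(log_text)
    if match is None:
        return None
    return match.group("file"), int(match.group("line")), match.group("func")


def check_and_reboot(dry_run=True):
    """Scan dmesg once; optionally force a reboot if a panic is found."""
    log_text = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    hit = find_panic(log_text)
    if hit is None:
        return None
    if not dry_run:
        # Assumption: magic SysRq 'b' reboots immediately, without syncing
        # or unmounting. Requires root.
        with open("/proc/sysrq-trigger", "w") as trigger:
            trigger.write("b")
    return hit
```

Calling `check_and_reboot()` from cron or a systemd timer every minute would approximate the behaviour described above. Returning the parsed location also makes it easy to log which assertion fired before the reboot.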

tuomari commented Jul 21, 2017

I updated zfs to 0.7.0-rc5_1_g4265a9293 and I'm still hitting it every few hours.

tuomari changed the title from "kernel panic on sa_buf_hold" to "Assert VERIFY(0 == sa_buf_hold(zfsvfs->z_os, obj, NULL, &db)) fails on zfs_znode.c:766:zfs_mknode()" on Jul 21, 2017

dweeezil (Contributor) commented

@tuomari Are you able to run zpool events and/or zpool events -v after this happens? If so, does it show anything interesting?

tuomari commented Jul 22, 2017

@dweeezil I can run zpool events -v, but it showed nothing interesting. The only events were related to snapshot creation and deletion, and they occurred ~30 min before the panic.

tuomari commented Jul 23, 2017

The system is still crashing every few hours with the previous stack trace, but I noticed one trace which is a bit different:

Jul 23 17:50:00 helvi kernel: [ 7750.141474] VERIFY(0 == dmu_buf_hold(os, obj, 0, FTAG, &db, DMU_READ_NO_PREFETCH)) failed
Jul 23 17:50:00 helvi kernel: [ 7750.141509] PANIC at zap_micro.c:678:mzap_create_impl()
Jul 23 17:50:00 helvi kernel: [ 7750.141529] Showing stack for process 5914
Jul 23 17:50:00 helvi kernel: [ 7750.141532] CPU: 8 PID: 5914 Comm: zma Tainted: P           O    4.9.36.iudex.kvm.ovs.1 #2
Jul 23 17:50:00 helvi kernel: [ 7750.141533] Hardware name: System manufacturer System Product Name/Z8NA-D6(C), BIOS 1303    05/10/2012
Jul 23 17:50:00 helvi kernel: [ 7750.141535]  ffffc90055637a18 ffffffff8132765b ffffffffa1376f45 00000000000002a6
Jul 23 17:50:00 helvi kernel: [ 7750.141538]  ffffc90055637a28 ffffffffa014ab2f ffffc90055637ba8 ffffffffa014abe7
Jul 23 17:50:00 helvi kernel: [ 7750.141541]  ffff8802b7a3bbd8 ffffffff00000028 ffffc90055637bb8 ffffc90055637b58
Jul 23 17:50:00 helvi kernel: [ 7750.141544] Call Trace:
Jul 23 17:50:00 helvi kernel: [ 7750.141551]  [<ffffffff8132765b>] dump_stack+0x4d/0x72
Jul 23 17:50:00 helvi kernel: [ 7750.141569]  [<ffffffffa014ab2f>] spl_dumpstack+0x3f/0x50 [spl]
Jul 23 17:50:00 helvi kernel: [ 7750.141576]  [<ffffffffa014abe7>] spl_panic+0xa7/0xd0 [spl]
Jul 23 17:50:00 helvi kernel: [ 7750.141581]  [<ffffffff816da669>] ? __mutex_unlock_slowpath+0xa9/0x140
Jul 23 17:50:00 helvi kernel: [ 7750.141685]  [<ffffffffa11dfd66>] ? dnode_hold+0x16/0x20 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.141745]  [<ffffffffa11c8906>] ? dmu_buf_hold_noread+0x26/0x100 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.141809]  [<ffffffffa125a2f3>] mzap_create_impl+0x1d3/0x210 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.141876]  [<ffffffffa125a41c>] zap_create_norm_dnsize+0x3c/0x50 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.141946]  [<ffffffffa1288b30>] zfs_mknode+0xd50/0xe60 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.141950]  [<ffffffff816dcfb9>] ? _raw_spin_unlock+0x9/0x10
Jul 23 17:50:00 helvi kernel: [ 7750.142013]  [<ffffffffa1234eab>] ? txg_rele_to_quiesce+0x3b/0x70 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.142073]  [<ffffffffa11db6fb>] ? dmu_tx_assign+0x45b/0x660 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.142141]  [<ffffffffa127b940>] zfs_mkdir+0x4b0/0x5e0 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.142211]  [<ffffffffa12a6052>] zpl_mkdir+0xb2/0x210 [zfs]
Jul 23 17:50:00 helvi kernel: [ 7750.142216]  [<ffffffff811f461b>] vfs_mkdir+0xeb/0x1a0
Jul 23 17:50:00 helvi kernel: [ 7750.142219]  [<ffffffff811f904d>] SyS_mkdir+0xad/0xe0
Jul 23 17:50:00 helvi kernel: [ 7750.142221]  [<ffffffff816dd260>] entry_SYSCALL_64_fastpath+0x13/0x94
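
To compare this trace with the earlier one, it can help to look only at the frames the kernel considers reliable. Entries prefixed with `?` are addresses found on the stack that the unwinder could not confirm as part of the call chain. The following is a rough Python sketch; the helper names `reliable_frames` and `common_frames` are made up for illustration, and the regular expression assumes the `[<address>] symbol+0xoff/0xlen` layout shown in these logs.

```python
import re

# One call-trace entry, e.g.
#   [<ffffffffa127274c>] zfs_mknode+0xe1c/0xe60 [zfs]
#   [<ffffffffa11b1823>] ? dmu_bonus_hold+0x23/0x220 [zfs]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return function names of frames not marked with '?', in trace order."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames


def common_frames(first, second):
    """Return the frames of `first` that also appear in `second`."""
    seen = set(second)
    return [name for name in first if name in seen]
```

Feeding both traces from this issue through these helpers makes it easier to see which functions appear in both panics and which belong only to the file-creation or directory-creation path.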

tuomari commented Aug 8, 2017

My scrub finally finished, and it seems to have had no effect on this bug. I still hit it at least a few times a day.

tuomari commented Aug 12, 2017

It seems that the problem went away after updating to 0.7.0-16_g6a8ee4f. I suspect commit 9631681 fixed this.

tuomari closed this as completed on Aug 12, 2017