
Kernel Oops in 2.0rc4: VERIFY3(insert_inode_locked(ip) == 0) failed (-16 == 0) #11223

Closed
IvanVolosyuk opened this issue Nov 20, 2020 · 6 comments
Labels
Status: Triage Needed (new issue which needs to be triaged) · Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

@IvanVolosyuk
Contributor

IvanVolosyuk commented Nov 20, 2020

Type                 | Version/Name
Distribution Name    | Gentoo
Distribution Version | Latest
Linux Kernel         | 5.4.72-gentoo .. 5.4.88-gentoo
Architecture         | x86_64
ZFS Version          | zfs-2.0.0-rc4 .. zfs-2.0.1

Describe the problem you're observing

Kernel oops

Describe how to reproduce the problem

I actually rolled back a mounted filesystem, which is exactly the case mentioned in the comment above the assert.

        /*
         * The only way insert_inode_locked() can fail is if the ip->i_ino
         * number is already hashed for this super block.  This can never
         * happen because the inode numbers map 1:1 with the object numbers.
         *
         * The one exception is rolling back a mounted file system, but in
         * this case all the active inode are unhashed during the rollback.
         */
        VERIFY3S(insert_inode_locked(ip), ==, 0);
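
For context on the assertion text: the -16 is -EBUSY, which insert_inode_locked() returns when a live inode with the same i_ino is already hashed for this super block. A trivial userspace check of that errno value (plain C, nothing ZFS-specific):

    /* Decode the -16 from the VERIFY3 message; on Linux, EBUSY is 16. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            printf("-EBUSY = %d (%s)\n", -EBUSY, strerror(EBUSY));
            return (0);
    }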

I also have a custom netdata plugin which, according to netdata monitoring, issues ~10k pool metadata reads every second; see the first comment.

Include any warnings/errors/backtraces from the system logs

[1209798.747433] VERIFY3(insert_inode_locked(ip) == 0) failed (-16 == 0)
[1209798.747435] PANIC at zfs_znode.c:617:zfs_znode_alloc()
[1209798.747435] Showing stack for process 2635
[1209798.747437] CPU: 12 PID: 2635 Comm: z_unlinked_drai Not tainted 5.4.72-gentoo-x86_64 #4
[1209798.747437] Hardware name: System manufacturer System Product Name/ROG STRIX Z390-F GAMING, BIOS 1401 11/26/2019
[1209798.747438] Call Trace:
[1209798.747444]  dump_stack+0x50/0x70
[1209798.747446]  spl_panic+0xcf/0xf7
[1209798.747448]  ? sa_cache_constructor+0x1e/0x40
[1209798.747449]  ? spl_kmem_cache_alloc+0x94/0x770
[1209798.747451]  ? get_page_from_freelist+0xde4/0xfa0
[1209798.747452]  ? sa_lookup_locked+0x5f/0xb0
[1209798.747454]  ? iput+0x65/0x1a0
[1209798.747456]  zfs_znode_alloc+0x58e/0x6b0
[1209798.747458]  zfs_zget+0x1a1/0x250
[1209798.747461]  zfs_unlinked_drain_task+0x69/0xf0
[1209798.747463]  ? __switch_to_asm+0x34/0x70
[1209798.747463]  ? __switch_to_asm+0x40/0x70
[1209798.747464]  ? __switch_to_asm+0x34/0x70
[1209798.747464]  ? __switch_to_asm+0x40/0x70
[1209798.747465]  ? __switch_to_asm+0x34/0x70
[1209798.747465]  ? __switch_to_asm+0x40/0x70
[1209798.747466]  ? __switch_to_asm+0x34/0x70
[1209798.747466]  ? __switch_to_asm+0x40/0x70
[1209798.747467]  ? __switch_to_asm+0x34/0x70
[1209798.747468]  ? __switch_to_asm+0x40/0x70
[1209798.747468]  ? __switch_to_asm+0x34/0x70
[1209798.747469]  ? __switch_to_asm+0x40/0x70
[1209798.747469]  ? __switch_to_asm+0x34/0x70
[1209798.747470]  ? __switch_to_asm+0x40/0x70
[1209798.747470]  ? __switch_to_asm+0x34/0x70
[1209798.747471]  ? __schedule+0x1e0/0x520
[1209798.747473]  ? remove_wait_queue+0x1b/0x60
[1209798.747474]  taskq_thread+0x2db/0x520
[1209798.747476]  ? wake_up_q+0x60/0x60
[1209798.747477]  ? taskq_thread_spawn+0x50/0x50
[1209798.747479]  kthread+0x10d/0x130
[1209798.747480]  ? kthread_create_worker_on_cpu+0x60/0x60
[1209798.747481]  ret_from_fork+0x1f/0x40
@IvanVolosyuk added the Status: Triage Needed and Type: Defect labels Nov 20, 2020
@IvanVolosyuk
Contributor Author

IvanVolosyuk commented Nov 20, 2020

This may be caused by my custom ZFS netdata plugin:
zfs-netdata.cc.txt
It might have exposed some kind of race condition in ZFS: it opens a ZFS handle and, every second in a loop, queries ZFS_PROP_USEDDS and ZFS_PROP_USEDSNAP for all datasets and ZPOOL_PROP_SIZE and ZPOOL_PROP_ALLOCATED for all pools. This causes about 10k ARC-cached requests to pool metadata.
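
For reference, a minimal sketch of that kind of per-second polling loop using the public libzfs API. This is not the actual plugin; the pool and dataset names ("tank", "tank/home") are placeholders:

    /*
     * Hedged sketch only -- not the actual netdata plugin.  Polls the same
     * properties once per second via libzfs; "tank" and "tank/home" are
     * placeholder names.  Link against libzfs (e.g. -lzfs -lnvpair).
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <libzfs.h>

    int
    main(void)
    {
            libzfs_handle_t *g_zfs = libzfs_init();
            if (g_zfs == NULL)
                    return (1);

            zfs_handle_t *ds = zfs_open(g_zfs, "tank/home", ZFS_TYPE_FILESYSTEM);
            zpool_handle_t *pool = zpool_open(g_zfs, "tank");
            if (ds == NULL || pool == NULL)
                    return (1);

            for (;;) {
                    /* Dataset space usage, in bytes. */
                    uint64_t usedds = zfs_prop_get_int(ds, ZFS_PROP_USEDDS);
                    uint64_t usedsnap = zfs_prop_get_int(ds, ZFS_PROP_USEDSNAP);

                    /* Pool size and allocated space, in bytes. */
                    uint64_t size = zpool_get_prop_int(pool, ZPOOL_PROP_SIZE, NULL);
                    uint64_t alloc = zpool_get_prop_int(pool, ZPOOL_PROP_ALLOCATED, NULL);

                    printf("usedds=%llu usedsnap=%llu size=%llu alloc=%llu\n",
                        (unsigned long long)usedds, (unsigned long long)usedsnap,
                        (unsigned long long)size, (unsigned long long)alloc);
                    sleep(1);
            }
    }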

@maxximino
Contributor

This just happened to me as well, and I don't have the netdata plugin.
Running ZFS 2.0.0-1~exp1 from Debian, on kernel 5.9.0-5-amd64 #1 Debian 5.9.15-1.
At the time of the issue there was no particular load on the machine, but it happened ~15s after some cron jobs started (zfs-auto-snapshot + send/recv between local pools).
No ECC errors reported.

@IvanVolosyuk
Contributor Author

IvanVolosyuk commented Jan 16, 2021

Got another crash (ZFS 2.0.1, Linux 5.4.88-gentoo) when doing a zfs rollback of the live root FS after creating a new mountpoint and mounting a filesystem there. So far every rollback has given me the kernel oops.

[334706.450990] VERIFY3(insert_inode_locked(ip) == 0) failed (-16 == 0)
[334706.450992] PANIC at zfs_znode.c:616:zfs_znode_alloc()
[334706.450992] Showing stack for process 2683
[334706.450994] CPU: 7 PID: 2683 Comm: z_unlinked_drai Not tainted 5.4.88-gentoo-x86_64 #1
[334706.450994] Hardware name: System manufacturer System Product Name/ROG STRIX Z390-F GAMING, BIOS 1401 11/26/2019
[334706.450995] Call Trace:
[334706.451001]  dump_stack+0x50/0x70
[334706.451003]  spl_panic+0xcf/0xf7
[334706.451006]  ? sa_cache_constructor+0x1e/0x40
[334706.451007]  ? spl_kmem_cache_alloc+0x96/0x770
[334706.451008]  ? __switch_to_asm+0x34/0x70
[334706.451009]  ? __switch_to_asm+0x40/0x70
[334706.451009]  ? __switch_to_asm+0x34/0x70
[334706.451010]  ? __switch_to_asm+0x40/0x70
[334706.451011]  ? sa_lookup_locked+0x5f/0xb0
[334706.451013]  ? iput+0x66/0x1a0
[334706.451015]  zfs_znode_alloc+0x589/0x6b0
[334706.451017]  zfs_zget+0x231/0x270
[334706.451020]  zfs_unlinked_drain_task+0x69/0xf0
[334706.451021]  ? __switch_to_asm+0x34/0x70
[334706.451023]  ? __switch_to_xtra+0x3d/0x610
[334706.451023]  ? __switch_to_asm+0x34/0x70
[334706.451024]  ? __switch_to_asm+0x40/0x70
[334706.451025]  ? __switch_to_asm+0x34/0x70
[334706.451025]  ? __switch_to_asm+0x40/0x70
[334706.451026]  ? __switch_to_asm+0x34/0x70
[334706.451027]  ? _raw_spin_unlock_irq+0xe/0x20
[334706.451029]  ? finish_task_switch+0x7b/0x240
[334706.451030]  ? __schedule+0x1e1/0x530
[334706.451031]  ? _raw_spin_lock_irqsave+0x14/0x40
[334706.451032]  ? remove_wait_queue+0x1b/0x60
[334706.451033]  taskq_thread+0x2db/0x520
[334706.451035]  ? wake_up_q+0x60/0x60
[334706.451036]  ? taskq_thread_spawn+0x50/0x50
[334706.451038]  kthread+0x134/0x160
[334706.451039]  ? kthread_create_worker_on_cpu+0x60/0x60
[334706.451040]  ret_from_fork+0x1f/0x40

@maxximino
Contributor

Happened a second time.
[ 1580.653410] VERIFY3(insert_inode_locked(ip) == 0) failed (-16 == 0)
[ 1580.653414] PANIC at zfs_znode.c:616:zfs_znode_alloc()
[ 1580.653415] Showing stack for process 2778
[ 1580.653417] CPU: 21 PID: 2778 Comm: z_unlinked_drai Tainted: P OE 5.10.0-1-amd64 #1 Debian 5.10.5-1
[ 1580.653418] Hardware name: System manufacturer System Product Name/ROG STRIX X570-E GAMING, BIOS 3001 12/04/2020
[ 1580.653419] Call Trace:
[ 1580.653427] dump_stack+0x6b/0x83
[ 1580.653433] spl_panic+0xd4/0xfc [spl]
[ 1580.653473] ? sa_cache_constructor+0x23/0x40 [zfs]
[ 1580.653476] ? spl_kmem_cache_alloc+0xaf/0x7c0 [spl]
[ 1580.653479] ? spl_kmem_cache_alloc+0xaf/0x7c0 [spl]
[ 1580.653482] ? insert_inode_locked+0x1ec/0x200
[ 1580.653518] zfs_znode_alloc+0x5c2/0x6f0 [zfs]
[ 1580.653551] zfs_zget+0x1ae/0x260 [zfs]
[ 1580.653580] zfs_unlinked_drain_task+0x6e/0xf0 [zfs]
[ 1580.653584] ? psi_group_change+0x41/0x210
[ 1580.653586] ? __switch_to_asm+0x42/0x70
[ 1580.653588] ? __switch_to+0x187/0x450
[ 1580.653591] ? __schedule+0x28a/0x870
[ 1580.653593] ? remove_wait_queue+0x20/0x60
[ 1580.653597] taskq_thread+0x2da/0x520 [spl]
[ 1580.653599] ? wake_up_q+0xa0/0xa0
[ 1580.653602] ? taskq_thread_spawn+0x50/0x50 [spl]
[ 1580.653605] kthread+0x11b/0x140
[ 1580.653606] ? __kthread_bind_mask+0x60/0x60
[ 1580.653608] ret_from_fork+0x22/0x30

Additional information: the operation going on in the background was a non-raw send/receive between two local pools (zfs send -v $STARTSNAPCMD ${SRC}@$ENDSNAP | zfs receive -u -F $DST).
The recv process is stuck in D state; I can't gdb or strace it, but from /proc:

cat /proc/502006/stack

[<0>] taskq_wait_id+0x8a/0xe0 [spl]
[<0>] taskq_cancel_id+0xe7/0x120 [spl]
[<0>] zfs_unlinked_drain_stop_wait+0x42/0x60 [zfs]
[<0>] zfsvfs_teardown+0x1f/0x340 [zfs]
[<0>] zfs_ioc_recv_impl+0x7fb/0xf50 [zfs]
[<0>] zfs_ioc_recv+0x174/0x310 [zfs]
[<0>] zfsdev_ioctl_common+0x697/0x870 [zfs]
[<0>] zfsdev_ioctl+0x53/0xe0 [zfs]
[<0>] __x64_sys_ioctl+0x83/0xb0
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

@pequalsmp

Got a similar (same?) issue when I tried to force a rollback on a dataset:

System

Linux 5.4.78-2-pve 
zfs-0.8.5-pve1
zfs-kmod-0.8.5-pve1

Log

VERIFY3(insert_inode_locked(ip) == 0) failed (-16 == 0)
PANIC at zfs_znode.c:611:zfs_znode_alloc()
Showing stack for process 4028
CPU: 14 PID: 4028 Comm: z_unlinked_drai Tainted: P           O      5.4.78-2-pve #1
Call Trace:
 dump_stack+0x6d/0x9a
 spl_dumpstack+0x29/0x2b [spl]
 spl_panic+0xd3/0xfb [spl]
 ? sa_cache_constructor+0x27/0x50 [zfs]
 ? sa_lookup_locked+0x66/0xb0 [zfs]
 ? iput+0x7d/0x210
 ? insert_inode_locked+0x1d8/0x1e0
 zfs_znode_alloc+0x63d/0x6a0 [zfs]
 zfs_zget+0x1a7/0x250 [zfs]
 zfs_unlinked_drain_task+0x74/0x100 [zfs]
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? finish_task_switch+0xd5/0x250
 ? __schedule+0x2ee/0x6f0
 ? remove_wait_queue+0x4d/0x60
 taskq_thread+0x2ec/0x4d0 [spl]
 ? wake_up_q+0x80/0x80
 kthread+0x120/0x140
 ? task_done+0xb0/0xb0 [spl]
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40

Afterwards there was a hung task:

INFO: task z_unlinked_drai:4028 blocked for more than 120 seconds.
      Tainted: P           O      5.4.78-2-pve #1
z_unlinked_drai D    0  4028      2 0x80004000
Call Trace:
 __schedule+0x2e6/0x6f0
 schedule+0x33/0xa0
 spl_panic+0xf9/0xfb [spl]
 ? sa_cache_constructor+0x27/0x50 [zfs]
 ? sa_lookup_locked+0x66/0xb0 [zfs]
 ? iput+0x7d/0x210
 ? insert_inode_locked+0x1d8/0x1e0
 zfs_znode_alloc+0x63d/0x6a0 [zfs]
 zfs_zget+0x1a7/0x250 [zfs]
 zfs_unlinked_drain_task+0x74/0x100 [zfs]
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? finish_task_switch+0xd5/0x250
 ? __schedule+0x2ee/0x6f0
 ? remove_wait_queue+0x4d/0x60
 taskq_thread+0x2ec/0x4d0 [spl]
 ? wake_up_q+0x80/0x80
 kthread+0x120/0x140
 ? task_done+0xb0/0xb0 [spl]
 ? kthread_park+0x90/0x90
 ret_from_fork+0x35/0x40

@PaulZ-98
Contributor

PaulZ-98 commented Jun 1, 2021

Provided root cause information and a reproducer in #9741.

PaulZ-98 added a commit to datto/zfs that referenced this issue Jun 9, 2021
In zfs_znode_alloc we always hash inodes.  If the
znode is unlinked, we do not need to hash it.  This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Fixes: openzfs#9741
Fixes: openzfs#11223
Fixes: openzfs#11648

Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
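
For illustration, the change described in the commit message (only hash znodes that are still linked, so unlinked-drain processing cannot collide with an inode that an asynchronous zrele/iput has not yet unhashed) can be modeled in a few lines of plain C. This is a hedged sketch based on the commit text above, not the actual upstream patch; insert_inode_locked_model() and znode_alloc_model() are made-up names for the toy:

    /*
     * Toy userspace model -- not ZFS or kernel code.  It mimics the logic
     * described in the commit message: znodes with a zero link count
     * (unlinked-drain work items) are simply not inserted into the inode
     * hash, so a stale entry left behind by an asynchronous iput() can no
     * longer produce the -EBUSY collision that tripped the VERIFY3.
     */
    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define HASH_SIZE 64

    /* Models the per-super-block inode hash. */
    static bool hashed[HASH_SIZE];

    /* Models insert_inode_locked(): -EBUSY when the i_ino is already hashed. */
    static int
    insert_inode_locked_model(unsigned long ino)
    {
            if (hashed[ino % HASH_SIZE])
                    return (-EBUSY);
            hashed[ino % HASH_SIZE] = true;
            return (0);
    }

    /* Models zfs_znode_alloc() after the fix: only hash linked znodes. */
    static int
    znode_alloc_model(unsigned long ino, unsigned int links)
    {
            if (links > 0)
                    return (insert_inode_locked_model(ino));
            return (0);     /* unlinked znode: no need to hash it */
    }

    int
    main(void)
    {
            /* Object 42's old inode is still hashed; its iput() was async. */
            hashed[42 % HASH_SIZE] = true;

            /* Unlinked-drain processing of object 42 now succeeds ... */
            printf("unlinked znode 42: %d\n", znode_alloc_model(42, 0));

            /* ... while a linked znode with a colliding number still fails. */
            printf("linked znode 42: %d\n", znode_alloc_model(42, 1));
            return (0);
    }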
behlendorf pushed a commit that referenced this issue Jun 16, 2021
In zfs_znode_alloc we always hash inodes.  If the
znode is unlinked, we do not need to hash it.  This
fixes the problem where zfs_suspend_fs is doing zrele
(iput) in an async fashion, and zfs_resume_fs unlinked
drain processing will try to hash an inode that could
still be hashed, resulting in a panic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
tonyhutter pushed a commit that referenced this issue Jun 23, 2021