VERIFY3 PANIC during zfs receive #9741
Comments
We have used rollback incidentally in the past, but I don't believe that it has been run on this system since the VMs and pools were built. On another system I use ZFS zvols to provide backing for a WSE2016 VM. In that case I use zfs send/recv over a WAN, and I have had a few opportunities to fix WSE2016 blunders via zfs rollback. I should have noted that ZFS (and especially ZFS on Linux) is an awesome product, and I am so grateful for the excellent work that you all have done for the community. THANKS SO MUCH!
We have reports about a similar or the same issue at https://bugzilla.proxmox.com/show_bug.cgi?id=2546 (contains a stacktrace of the panic), observed both with ZFS 0.8.2 on kernel 5.0 (based on Ubuntu disco) and ZFS 0.8.3 on kernel 5.3 (eoan). The issue does not happen deterministically, but it recurs (I could not yet get a small reproducer). Glad to provide more information if needed, and also to test ideas for reproducers and/or potential fixes.
I am able to reproduce.
The migration between the hosts doesn't always trigger the failure, but I noticed that I get a syslog message before a later replication, initiated via the "Schedule now" button, never finishes. The symptom is always the same: "zfs recv -F" hangs in state "D" and will not finish even after days. Output:
The syslog message appeared on the active terminal; this is from before "zfs recv" hangs on the next run:
Same type of incident, but at another time; it is always on the receiving host:
Possible failing hardware?
This seems to be a prevalent issue on 0.8.3. I have seen it with zfs recv -sF; there is no usage of rollback in my environment.
This happened to us last month, so we did a full apt upgrade, but whatever it is, it's still present in 0.8.4. As others have noted, it always occurs on the receiving side of a zfs send operation; the receive process hangs in 'D' state and only a reboot will get things moving again. We do occasionally use rollback in our environments, but I don't think any were performed in the last month.
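For anyone else hitting this, a quick way to confirm the receive is stuck in uninterruptible sleep (the grep patterns here are just examples):

    # Show state and kernel wait channel of any zfs receive processes;
    # "D" in STAT plus a wchan like taskq_wait_id matches what we see.
    ps -eo pid,stat,wchan:32,args | grep '[z]fs recv'
    # The kernel's hung-task warnings usually show up here as well.
    dmesg | grep -i 'blocked for more than'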
Does it also occur when you delete the target first (NOT the volumes; actually there can be files for them even if they are normally volumes)?
We only use ZFS filesystems (i.e. no volumes / block devices), and it has only happened on incremental zfs send operations. Those operations always include -F on the receiving side to undo any local changes that might have happened since the last snapshot. I wasn't thinking of that in the initial report, so there is technically a rollback with every receive, though it would just be for attributes like atime that might have been automatically updated as tape backup software worked through the filesystem.
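For reference, the pipeline is the usual incremental send into a forced receive, roughly like this (pool, dataset, and snapshot names here are made up):

    # Incremental send from the previous snapshot to the new one; -F on the
    # receiving side rolls the target back to the last common snapshot
    # before the stream is applied.
    zfs send -i tank/data@2019-12-14 tank/data@2019-12-15 | \
        ssh backuphost zfs recv -F backup/data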
Just have a look at whether there is something there which shouldn't be; I remember having this once.
We also had this occur. We have a primary server (mirrored) that we back up every 3 hours using syncoid to a second machine (raidz3). The secondary server, which does nothing but serve as the backup, shows syncoid and the receive process hung, with a crash in dmesg. Details:
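For context, the job looks roughly like this (host names, dataset names, and the syncoid install path are assumptions, not our exact setup):

    # Cron entry on the backup host: pull from the primary every 3 hours.
    # syncoid drives incremental zfs send/recv under the hood.
    0 */3 * * * root /usr/sbin/syncoid --recursive root@primary:tank/data backuppool/data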
We experienced the same issue, replicating with sanoid/syncoid (both 2.0.3) between two Debian hosts (Stretch and Buster). Stretch host (srv1, panicking here): Target host: We last replicated from the Stretch host to the Buster host on Monday evening, then started taking snapshots on the Buster host, which we later replicated back (a new snapshot that only existed on the Buster host) to the Stretch host. Before replicating, we rolled back (zfs rollback -R ....) the Stretch host to the latest snapshot (the one from Monday evening). While replicating we got kernel panics:
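A rough sketch of the rollback-and-replicate-back sequence described above (dataset and snapshot names are hypothetical):

    # On srv1 (the Stretch host): discard everything after Monday's
    # snapshot, including any later snapshots, then pull the newer
    # snapshot back from the Buster host.
    zfs rollback -R tank/data@monday-evening
    ssh buster-host zfs send -i tank/data@monday-evening tank/data@tuesday | \
        zfs recv -F tank/data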
This also happens randomly with zfs 2.0.3 (from debian buster-backports), replicating a local SSD-based pool to a local HDD-based pool with rollbacks:
I was able to create a reproducer for this. The reason for the panic is that when you have a zfs recv or rollback, the code path zfs_suspend_fs -> zfsvfs_teardown is taken. In this path, we asynchronously release inodes via iput. If this async releasing happens to be slow, then inodes will still be extant while the zfs recv proceeds with processing the unlinked set. In this unlinked set processing, each znode is reconstituted so it can be freed (zfs_zget). If the znode / inode is present in the unlinked set AND still hanging around from zfsvfs_teardown -> iput, we can get this hash collision and panic.
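To make the sequence above concrete, here is a rough illustration of the kind of workload involved; this is not the actual reproducer (which, as noted below, also needs a code change to widen the race window), and all names are hypothetical:

    # Hold a deleted file open so the object sits in the dataset's unlinked
    # set, then apply a forced receive: zfs_suspend_fs queues the final iput
    # asynchronously, and the unlinked drain after zfs_resume_fs may
    # zfs_zget the same object while the old inode is still hashed.
    exec 3< /tank/fs/victim        # keep the file open via fd 3
    rm /tank/fs/victim             # now only the open fd references it
    zfs recv -F tank/fs < /tmp/incr.zstream
    exec 3<&-                      # drop the reference afterwards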
A code change is needed for the reproducer:
And the reproducer script:
In zfs_znode_alloc we always hash inodes. If the znode is unlinked, we do not need to hash it. This fixes the problem where zfs_suspend_fs is doing zrele (iput) in an async fashion, and zfs_resume_fs unlinked drain processing will try to hash an inode that could still be hashed, resulting in a panic.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alan Somers <asomers@gmail.com>
Signed-off-by: Paul Zuchowski <pzuchowski@datto.com>
Closes #9741
Closes #11223
Closes #11648
Closes #12210
This might be related to issue #2335
System information
Describe the problem you're observing
We have two VMs, each running on a separate OpenSuSE 42.3 KVM/libvirt hypervisor on enterprise-class servers. There are two raidz3 zpools on the first VM and one raidz3 pool on the second. The vdevs for all of them are supplied by iSCSI targets (each an 8TB or 10TB enterprise drive) over 10G Ethernet (10 vdevs per pool).
The first VM is running two zfs incremental send processes on the first zpool, started 4 hours apart. One is piped to a local zfs recv process to "mirror" the first zpool to the second. The second zfs send process is piped over ssh (on a 1G link) to a zfs recv process on the other, remote VM to make a second "mirror" of the first pool. We've successfully used this process on multiple systems for a couple of years now without incident. We recently upgraded zfs from 0.7.13 to 0.8.1. In this system, the pools have on the order of 191 file systems.
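For clarity, the two replication streams look roughly like this (pool, dataset, and snapshot names are placeholders, and the exact flags may differ slightly):

    # Stream 1: local "mirror" of the first pool onto the second pool.
    zfs send -i pool1/fs@prev pool1/fs@new | zfs recv -F pool2/fs

    # Stream 2, started about 4 hours later: the same kind of incremental
    # stream, piped over ssh (1G link) to the recv process on the remote VM.
    zfs send -i pool1/fs@prev pool1/fs@new | ssh remote-vm zfs recv -F pool3/fs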
Over the weekend, the first VM reported a
VERIFY3(insert_inode_locked(ip) == 0) failed (-16 == 0)
followed by a PANIC at zfs_znode.c:611:zfs_znode_alloc(). The second VM reported the same error 6 hours later. The backtraces are 100% identical on both VMs. We found that the zfs recv processes remained present on both VMs and the first zfs send process was also present. Both recv processes are blocked on taskq_wait_id, [z_unlinked_drai] is blocked on spl_panic, and zfs send is blocked on pipe_wait. Cron-scheduled scrubs ran on all three zpools after these PANICs, on the following Sunday night, and reported no errors while the zfs send/recv processes were still present.
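If it helps, the kernel stacks of the stuck processes can be inspected along these lines (the match patterns are examples; requires root):

    # Dump the kernel stack of the hung receive and of the unlinked-drain
    # thread to confirm where they are blocked.
    cat /proc/$(pgrep -f 'zfs recv' | head -n1)/stack
    cat /proc/$(pgrep z_unlinked_drai | head -n1)/stack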
We use zfs-auto-snapshot to manage periodic snapshots. Two days before this panic, we increased the number of snapshots to retain (--keep) by factors of 2 to 12. For example, we increased the hourly snapshots from 24 to 48, the daily from 31 to 90, and the 15-minute snapshots from 8 to 96. The pool now has on the order of 173 snapshots per filesystem, for a pool total of about 50,000. The first pool is 72.5T in size and currently has only 9.45T allocated. The other two pools are 91TB in size. lz4 compression is in use, no dedup.
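For reference, the retention change amounts to something like the following in the zfs-auto-snapshot cron jobs (treat this as a sketch; the exact option spelling and labels may differ by packaging):

    # Before: hourly snapshots kept for 24 hours.
    zfs-auto-snapshot --quiet --syslog --label=hourly --keep=24 //
    # After: hourly retention doubled, 15-minute ("frequent") raised to 96.
    zfs-auto-snapshot --quiet --syslog --label=hourly --keep=48 //
    zfs-auto-snapshot --quiet --syslog --label=frequent --keep=96 //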
Describe how to reproduce the problem
So far this is a single incident. Our other systems have not encountered this problem although they have identical configurations with the exception that the --keep values on zfs-auto-snapshot have not been changed.
Include any warning/errors/backtraces from the system logs