PANIC: inode has invalid mode: 0x0 #11474
Comments
I did some more testing, and the file mode is only lost after a reboot. |
@gbsf I have a few questions, some of them very mundane, some of them might conceivably be dangerous to investigate:
|
I think I might have the same problem. The error message is the same, the stack trace is basically identical, and the problematic file is a log file written by a Windows application running via Wine. I am running Arch Linux too.
|
Hey @gbsf, is this issue reproducible? If yes, would it be possible to capture a kernel crash dump from the system? |
Hey all, I seem to be able to reproduce the issue 100% of the time, but I can't for the life of me figure out how to get it to generate a core dump. I get the panic and everything, but the dump seems to just disappear rather than be written. If someone could help me generate it, I'd be happy to upload it. |
Yes, it is reproducible. I don't exactly understand what you mean by kernel crash dump; the panic backtrace is in the issue already. If you mean a memory dump from an affected system, I'm not really set up to capture those. Either way, since the panic is only a symptom of the error, I don't think it'll be very useful. At that point, the inode is already corrupted. But I managed to create a test case based on the operations Wine performs.
|
@gbsf Hey! Thanks for the reproducer! I'm using Debian 5.10.9, and I'm not able to reproduce any issue with your program. Does this work on a new pool? I.e.,
and then run it on that one? Other facts: I'm on a virtual machine, using the "cloud image". I can try with 5.10.5 relatively easily, if you're able to reproduce on a clean pool. |
I still have the file/directory in question around, yeah I can still reproduce it |
This is all the info I have from the crash, I still can't get it to write the vmcore out
|
Is there a way, just to get things working again, that I could reach in and fix the value for that inode? Also, I have xattr=sa set, not sure if that's relevant |
I narrowed it down to a file in this directory somewhere, and ran zdb on them to try to get an idea of what the results were |
Ah, think I found out how to get some info on that inode specifically |
Searching that device, there's a few occurrences of modes that don't look right
|
Also worth mentioning is this is a mirror device, and scrub does not fix this error |
At a glance, I'm wondering if the user.wine.sd field is somehow overflowing into the mode field |
It seems my test case is incomplete, because I can reproduce using Wine. I tried to avoid this because it is way more involved than a simple C program. @aerusso, here are the new STR:
|
@gbsf Thanks again---I am going to try your reproducer over the weekend on ZFS 2.0.2 and Linux 5.10.12. |
@gbsf I have "good" news: your reproducer works on a VM I have. For anyone else interested, even if you get a wine crash calling MTGA.exe, it still produces the corruption (I ran MTGA.exe twice, but I don't know if that was necessary). |
(The ZTS patch in between is obviously innocent). 3d40b65 is a big, refactoring patch @mattmacy @freqlabs @behlendorf . This is a data corruption bug, as far as I can tell. I'm double-checking my work right now. |
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between Linux and BSD. After a successful write, the suid/sgid bits are reset, and the mode to be written is stored in newmode. On Linux, this was propagated to both the in-memory inode and znode, which is then updated with sa_update. 3d40b65 accidentally removed the initialization of newmode, which happened to occur on the same line as the inode update (which has been moved out of the function). The uninitialized newmode can be saved to disk, leading to a crash on stat() of that file, in addition to a merely incorrect file mode. Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes openzfs#11474
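For readers who want to see the intended behaviour in isolation, here is a minimal Python model of the step the commit message describes: after a successful write, the suid/sgid bits are cleared and the resulting value is what should be persisted. It is only a sketch, not the kernel code. The function name, the `privileged` flag, and the condition that at least one execute bit is set are assumptions made for illustration; the commit message itself only states that the bits are reset and the result stored in newmode.

```python
import stat

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
SETID_BITS = stat.S_ISUID | stat.S_ISGID


def mode_after_write(mode, privileged=False):
    """Return the mode that should be persisted after a successful write.

    Hypothetical model: `privileged` stands in for whatever policy check
    lets a caller keep the set-id bits.
    """
    if not privileged and (mode & EXEC_BITS) and (mode & SETID_BITS):
        # The regression amounted to skipping this assignment, so the
        # value handed to the on-disk update was never derived from the
        # real mode.
        newmode = mode & ~SETID_BITS
        return newmode
    return mode


print(oct(mode_after_write(0o104755)))
```

The point of the model is that newmode must always be computed from the current mode before it is written out; anything else leaves the on-disk mode unrelated to the file's actual type and permissions.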
I think I figured it out, but I'd recommend waiting until the patch gets some review before testing it. |
How do I fix or remove all the bad directories/files after this patch? |
Pending the fix, could overwriting the mode fix the error? |
3d40b65 refactored zfs_vnops.c, which shared much code verbatim between Linux and BSD. After a successful write, the suid/sgid bits are reset, and the mode to be written is stored in newmode. On Linux, this was propagated to both the in-memory inode and znode, which is then updated with sa_update. 3d40b65 accidentally removed the initialization of newmode, which happened to occur on the same line as the inode update (which has been moved out of the function). The uninitialized newmode can be saved to disk, leading to a crash on stat() of that file, in addition to a merely incorrect file mode. Reviewed-by: Ryan Moeller <ryan@ixsystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Antonio Russo <aerusso@aerusso.net> Closes #11474 Closes #11576
I've applied @aerusso's fix to the master branch and have added it to the staging branch for the next point release (zfs-2.0.3). For anyone who hit this issue you should be able to fix the panic by temporarily enabling the |
Without trying to sound alarmist, the (tiny) possibility of a file ending up with a semirandom mode from an uninitialized field - which might leave permissions open rather than just weird - elevates this from a stability or corruption issue to a security issue - IMHO. Just sayin' in case that warrants flushing out the next point release ASAP. |
@adamdmoss yes, I'd like to get this tagged sooner rather than later. |
Hello,
I have raidz2 with one bad disk:
P.S. Just upgraded to 2.0.3, problem is present. |
I just hit the same issue on driver 2.0.1, so it's not just isolated to 2.0.2 and may have been in the code base longer / have other code paths that trigger a similar issue (not sure if that's what dimez means by "P.S. Just upgraded to 2.0.3, problem is present", or he just hasn't repaired the on-disk structure). My case was also a directory created under Wine (pure coincidence with OP?); there were no apparent issues with the installer process that created the directory, but after a reboot it is impossible to ls, du, or rm -rf that directory (any of the three put the process into an unkillable D+ state that never returns). I'm on Linux 5.10.6 with zfs-2.0.1 built in-kernel; the affected partition comes from a zpool mirrored across two SATA drives, neither of which is reporting any hardware errors, so I don't think it's a coincidental physical failure. Of note, I've been running that zpool for years under the 0.8.x series and never had any problems, but I just upgraded from 0.8.6 (on an older kernel) to the kernel 5.10.6 / zfs-2.0.1 combination last month. The dmesg entry for the first hang (subsequent attempts to access the directory also hang, but don't generate new kernel log entries) is [703847.050679] PANIC: inode 2834363 has invalid mode: 0x0 [703847.050685] Showing stack for process 20511 |
Same here. Also with a directory made by Wine: (Screenshot from fsearch, a generic file search program, during reindexing)
|
Also via wine and lutris here. Any workaround for actually deleting the invalid files? |
#11474 (comment) works. |
How did you get this report? I would like to run the same on my machine that is exhibiting very similar behaviour (on completely different files, 5.13 kernel, ubuntu 21.04 zfs-2.0.2, zfs-kmod-2.0.3). |
Here's a ruby one liner that provides roughly the same output for all files in the current working directory:
|
100444 and 100555 are valid values; a mode like 1777777777761304335442 is what indicated there was a problem. To generate that report I used |
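As a rough illustration of that distinction, the sketch below classifies octal mode strings such as the ones quoted above. It assumes that a sane mode fits in 16 bits and that its file-type bits name one of the standard POSIX types; the helper name `is_plausible_mode` is made up for this example and is not part of any ZFS tool.

```python
import stat

S_IFMT_MASK = 0o170000  # file-type bits of a mode
VALID_TYPES = {
    stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK, stat.S_IFCHR,
    stat.S_IFBLK, stat.S_IFIFO, stat.S_IFSOCK,
}


def is_plausible_mode(mode):
    """True if `mode` fits in 16 bits and its type bits are a known type."""
    return 0 <= mode <= 0o177777 and (mode & S_IFMT_MASK) in VALID_TYPES


for text in ("100444", "100555", "1777777777761304335442", "0"):
    mode = int(text, 8)  # the reports print modes in octal
    print(text, "ok" if is_plausible_mode(mode) else "SUSPECT")
```

This only works on mode values that have already been read out by some other means. Calling stat() on an affected file is exactly what triggers the panic, so a checker like this should not touch the files themselves.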
Just been hit with the same issue again... on ZFS 2.1.0, is there anything I can provide to assist in getting this fixed? |
@Retrodynen To help you with the issue you're experiencing, you'll have to explain it precisely. Better still, if you can give steps that reproduce the problem on a newly created pool, we'll be able to resolve the issue with relative ease. By the way, you should probably do this in a new bug report because the underlying issue that caused these symptoms here has already been resolved. I'll add that, if you ran either of the two point releases affected by this issue, be sure you have already manually repaired the broken mode (by first setting ZFS_RECOVER and then manually adjusting the mode). Read here for details. For safe measure, you may also want to remove all snapshots that contain files with the damaged modes (i.e., corrupt metadata), since those snapshots will also induce the panic in this bug report. There are no plans that I am aware of to automate the repair. Moreover, any such tool would be severely limited by the fact that the (metadata) information was genuinely lost. |
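The two-step repair mentioned above (enable recovery, then adjust the mode) could be scripted roughly as follows. This is a sketch under assumptions, not a tested recovery tool: the sysfs path is the usual location of the zfs_recover module parameter on Linux, the helper names are hypothetical, and the script must run as root. Others in this thread report that chmod() can still hang on some affected files even with zfs_recover set to 1, so try it on a single file first.

```python
import contextlib
import os

# Assumption: usual sysfs location of the zfs_recover module parameter.
ZFS_RECOVER_PARAM = "/sys/module/zfs/parameters/zfs_recover"


@contextlib.contextmanager
def temporarily_set(param_path, value="1"):
    """Write `value` to a parameter file, restoring the old contents on exit."""
    with open(param_path) as f:
        previous = f.read().strip()
    with open(param_path, "w") as f:
        f.write(value)
    try:
        yield
    finally:
        with open(param_path, "w") as f:
            f.write(previous)


def repair_mode(path, mode, param_path=ZFS_RECOVER_PARAM):
    """Hypothetical helper: chmod `path` while recovery mode is enabled."""
    with temporarily_set(param_path):
        os.chmod(path, mode)
```

A call such as `repair_mode("/pool/dir/broken.log", 0o644)` (the path here is a placeholder) sets the permission bits and then restores the parameter to its previous value, even if the chmod raises.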
Hi Antonio,
Ahhh, okay. Yeah I'm 99.9% sure I had just stumbled upon another file
which was damaged when the bug was around (reinstalled a Steam game which
uses WINE), I had just assumed the repair was automated.
Sorry for the noise :)
|
I have encountered this error as well, but the
Then trying to
Any suggestions for how to resolve this issue besides what I've already tried? |
@GlenPickle You seem to have already identified the Ubuntu bug that is (probably) responsible for your stack trace. That corresponds to OpenZFS #10971. |
I just tried this on an encrypted pool/dataset and zdb seems to "hide" the necessary info (yes, the keys are loaded):
Is there a flag for zdb to make it show the info as it does for unencrypted datasets? |
Uhhh, good question. My system uses the drive's hardware encryption, sorry D: |
Is there a utility anywhere that can manually fix the ZPL_MODE field, or alternatively that will unlink a file with an invalid ZPL_MODE? As some other users have noted, even when zfs_recover is set to 1, attempts to stat(), unlink(), or chmod() an affected file will hang indefinitely. I understand that ultimately this can be done via sa_update() with the SA_ZPL_MODE() macro, but its uses in the kernel module source require a delicate setup that doesn't seem to translate easily to a userspace utility, and I can't find any uses of sa_update() in any of the utilities provided in openzfs. |
the patch fixes a potential panic on systems running ZFS > 2.0.0 and is already queued for inclusion in 2.0.3 - see [0] for a related github issue. [0] openzfs/zfs#11474 Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
System information
Describe the problem you're observing
A couple of days ago, a program (running under Wine, if it matters) either created a corrupted file or caused some system failure that made the file mode become 0.
Since those were only log files, I renamed the enclosing dir to stop the program from trying to access them.
Comparing with other log files created by the application, the correct mode for this file should be S_IFREG|S_ISVTX|0777. It is stored under a directory with mode S_IFDIR|S_ISUID|S_ISGID|S_ISVTX|0777 and everything is owned by my user.
The pool is backed by a single SATA SSD. I already ran a scrub but no errors were reported.
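For anyone comparing these symbolic modes with the octal values that tools print, the following short Python sketch composes the two modes from the standard `stat` constants and shows them in octal and in `ls -l` style. The variable names are illustrative only.

```python
import stat

# Expected mode of the log file and of its parent directory, as described above.
file_mode = stat.S_IFREG | stat.S_ISVTX | 0o777
dir_mode = stat.S_IFDIR | stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX | 0o777

for name, mode in (("file", file_mode), ("directory", dir_mode)):
    # oct() gives the raw value; stat.filemode() gives the ls-style string.
    print(name, oct(mode), stat.filemode(mode))
```

A mode of 0x0, as in the panic message, has no file-type bits at all, which is why it cannot be mapped to any of these forms.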
Describe how to reproduce the problem
I couldn't find a way to reproduce the creation of such corrupted files, but any attempt to access the existing ones will hang with a kernel panic.
Include any warning/errors/backtraces from the system logs