Missing mechanism to fix permanent errors/delete file over all snapshots #4732
Snapshots are STRICTLY read-only. That is the entire point of them. The entire architecture of ZFS is designed around this assumption and very special code would need to be written to handle that. This might even enter Block Pointer Rewrite territory.
Pool redundancy (RAID-Z and mirrors) is supposed to provide the restoration point. One issue I have with these ideas is that the default checksum (fletcher4) isn't cryptographically secure, making this a very risky process. As for the filesystem being consistent: ZFS writes all metadata at least twice by default, so automatic recovery is even more likely. The filesystem is consistent; your data is not.
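For context on the checksum discussion above, here is a minimal sketch of how the checksum algorithm can be inspected and switched per dataset. The dataset name tank/Shared is taken from the snapshot path reported later in this issue; any dataset works the same way, and changing the property only affects blocks written afterwards.

```sh
# Show which checksum algorithm the dataset currently uses (fletcher4 by default)
zfs get checksum tank/Shared

# Use a cryptographic checksum for newly written blocks
# (existing blocks are NOT rewritten; they keep the checksum they were written with)
zfs set checksum=sha256 tank/Shared

# Verify the property across the pool
zfs get -r checksum tank
```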
Yes, but at least allow the user to manually "accept" the corruption as the new status quo; updating the checksums to reflect the corruption should be possible. And besides that, I still think that making files deletable from snapshots is a worthwhile feature to implement.
What the heck is happening now?
Agree. A report of error isn't a fault.
The checksum really doesn't matter because you'll hit the birthday problem with any of them.
-- Richard.Elling@RichardElling.com
From the data presented, it appears as though the "sd?_crypt?" devices are corrupting data.
-- Richard.Elling@RichardElling.com
Well, this depends on the use case. I doubt that the "entire point" is it being read-only... More likely the most important property is to provide access to data from an earlier time. Making it read-only makes sense to protect it, but I don't see a reason to make it a religion if the user wants a different behaviour.
So ZFS does not consider the actual data it stores as being part of its filesystem? This is new. So ZFS's mindset basically is: "I don't care about your data, my metadata is consistent, live with it"? I highly doubt that.
This is an extremely unlikely event with sha256. With CRC or even md5 you might have been right, but with SHA256, trying a few million combinations to create a birthday paradox is nothing to sweat about; otherwise cryptography would have a big problem by now. The absolute minimum I would - and still do - expect is to "accept" the corrupted data and modify the checksums to reflect that, so the corrupted data becomes accessible again for investigation. Is that not a reasonable request? Or is there anything I'm missing that I can do now, with an appropriate amount of effort, for a few unimportant files being corrupt on a 9TB container?
Well, I doubt that. I admit I did not scrub regularly, so when the disk failed I guess those 40 files that are now corrupted are fully on me, as the resilvering never had a chance to recreate them from 3 disks. Otherwise this tank has run fine for nearly 3 years now.

Another thing that I now see, and that is strange, is that it is still resilvering. The stats it shows are totally wrong (3 hours ago vs. now). And no matter when I run zpool status -v tank, it shows it resilvering at over 100M/s. Just to be clear: when the resilvering is done, the status shows fine on all 4 devices until I reboot.

And I can absolutely live with an occasional file corruption if I at least know which file it is. My crucial core data is all additionally backed up; the rest is not so important, but I cannot afford a full backup. I fear that scrubbing too often (24 hours per scrub) will drastically shorten the life of my consumer drives, and my money is not so limitless that I can replace a drive every month. I never imagined I would end up in such a mess because of one file corruption, though.
Ok, solved this at least. There was another backup job rsyncing data from the tank, slowing down the resilvering considerably. That does not really explain why the status showed the wrong speed & time estimation, though; maybe it means "if I were allowed to resilver at full priority, this is the speed I could go".
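On the scrubbing-frequency worry above: periodic scrubs are usually scheduled rather than run by hand. A minimal sketch, assuming the pool is named tank, a standard /etc/crontab format, and a monthly interval chosen purely as an example (the binary path may differ per distribution):

```sh
# /etc/crontab -- scrub the pool "tank" at 03:00 on the first day of every month.
# A scrub reads and verifies every allocated block, so pick a quiet period.
0 3 1 * * root /sbin/zpool scrub tank

# Check progress or results afterwards:
#   zpool status -v tank
```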
As has been said, some of the parts of how ZFS is designed at its core make it impractical (to put it mildly) for the contents of a snapshot to be made read/write. It might be possible to manually rewrite the checksums to be "correct", but since one of ZFS's goals is not to allow silent data corruption, I would be mildly surprised if code to permit that landed anywhere except maybe in zhack.

If you're on a RAID-Z1, then losing an entire disk does mean you have no redundancy left in that RAID-Z1 until it finishes resilvering, and any errors found while resilvering are going to be uncorrectable. (This is why people often choose either higher RAID-Z levels for important data, or keep backups, or ideally both.) If you've not done any sort of periodic scrubs to detect errors of this nature, then the only time this will show up is on failure - leading to situations like this.

It should let you read the contents of those files that don't fail reconstruction, but in an N-disk wide RAID-ZX, any logical stretch of {block1 block2 ... block(N-X)} is going to be written out as {block1 block2 ... block(N-X) parity1 ... parityX}, and if you try to compute the (N-X) blocks from some combination of exactly enough blocks and parity data, and the checksums in the result don't match, you have no idea which disks/blocks are at fault, and so ZFS won't let you read any of those blocks. (Good luck trying to bruteforce [blocksize] * [stripesize] presuming a single-bit error, let alone double-bit.)

What you might have found useful (or still find useful) would have been using something like ddrescue to extract the blocks you could read from the failing disk onto a new disk, put the new disk in place of the failed disk, and see what you can recover. (Ideally, you'd have backup block-for-block copies of all the disks involved so you don't accidentally scribble over any data you want to recover.)

In general, you should scrub periodically, even if you don't have any disk failures, to avoid minor failure cases turning into catastrophic ones, particularly in RAID-Zx scenarios. (You might also want to look into whether your system has some other reason for checksum errors on disks at times; having run a bunch of different ZFS configs, I don't really expect to see any CKSUM errors outside of some failing piece of hardware.)
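A minimal sketch of the ddrescue approach described above, assuming the failing disk is /dev/sdb, the replacement is /dev/sdc, and the pool is named tank (all device names here are placeholders, not taken from this report):

```sh
# First pass: copy everything easily readable from the failing disk to the
# replacement, skipping the problem areas (-n), and keep a map file so the
# run can be resumed.
ddrescue -f -n /dev/sdb /dev/sdc /root/sdb.map

# Second pass: retry the bad areas a few times with direct disc access.
ddrescue -f -d -r3 /dev/sdb /dev/sdc /root/sdb.map

# With the failed disk physically removed and the copy attached in its place,
# let ZFS verify what survived.
zpool scrub tank
zpool status -v tank
```

Whether ZFS accepts the copied disk depends on how much of the labels and data survived; keeping block-for-block images of every disk before experimenting, as suggested above, is the safer path.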
Thank you, I understand most of what you said well and already knew that. But recalculating the checksums based on the corruption, to accept the data loss, should always be an option in any filesystem - see the "lost+found" approach. I understand that ZFS is very proud of not letting this happen in the first place, but as you pointed out yourself, in a RAID-Z1 this can very well still happen during reconstruction even when doing everything else right. So please implement a way to handle this, or else I will have to take a look at btrfs in the hope that it handles this in a better way; I think I saw that snapshots are writable in btrfs.

Also, can someone please confirm whether a permanent file corruption in the tank will, by design, lead to a resilver on each reboot? Or is there something else off in my tank?

PS:
errors: 35 data errors, use '-v' for a list
Is there a way to delete a single corrupted file, assuming there are no snapshots?
It would seem that the request to continue on the mailing list was simply a diversion to turn the question into a "support request" (which it wasn't). It's not easy to see this as done in really good faith.
@midjji normally you can use rm to delete the file when no snapshot still references it; the reported error should clear after a subsequent scrub.

@FlorianHeigl there has been recent work proposed in this area. I'd encourage you to take a look at PR #9372 and see if the functionality proposed there would meet your needs.
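A minimal sketch of that workflow, assuming the pool is named tank and the damaged file is not referenced by any snapshot (the file path below is a placeholder, not from this report):

```sh
# List the files currently flagged as having permanent errors
zpool status -v tank

# Remove the damaged file (only frees the bad blocks if no snapshot still references them)
rm "/tank/Shared/path/to/damaged-file"

# Re-check the pool; the error entry is typically cleared once a scrub
# has completed after the blocks were freed
zpool scrub tank
zpool status -v tank
```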
See, that's a really nice answer that will help every person here now and in the future. The recent improvements for dealing with inconsistent data are very much welcomed by *many* people.
This is also the top Google result for this question. Which, by the way, I still haven't found an answer to, or, like a non-idiot, I would have posted it here, because it is the top Google result for the question. Deleting this thread would also be acceptable.
Fixing the file isn't of interest, I'd just like to be able to rm it without a kernel hang.
So that's not supposed to be the case?

On Wed, 13 Nov 2019, 17:07 kpande wrote:
> then open a new bug.
Situation
tank/Shared@zfs-auto-snap_monthly-2016-01-08-1240:/Games/Steam/SteamApps/common/SatelliteReign/SatelliteReignLinux_Data/sharedassets1.assets
Expected behaviour
Being offered ways to fix the errors by one or more methods like:
Actual behaviour
This might be wrong, but based on my research over the past few days:
This state is only fixable by...
These options all seem ridiculous and totally unfit for an otherwise so well-written and well-thought-through filesystem. Every other filesystem, naturally, has ways to handle and correct even uncorrectable errors, in the sense that the filesystem itself at least becomes consistent again, and, as a second objective, to restore as much of the original data as possible (it could be a bit flip in a text file, which could be totally unproblematic).
A solution
I would highly recommend making it at least possible to delete a single file from all snapshots without deleting the snapshots.
This would also come in handy in other situations where you simply want to delete a file, like a virus, or if you want to free disk space by deleting a whole folder that was not supposed to be snapshotted or is not needed any more.
One could then also write a tool that opens all snapshots at once and presents a "merged file system" containing all files from all snapshots on top of each other, for the purpose of cleaning the zpool of unneeded files/file trees without losing the history for all other files.
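There is no supported way to do the deletion itself today, but the "merged view" part can be approximated read-only through the hidden .zfs/snapshot directory. A minimal sketch, assuming the dataset tank/Shared is mounted at /tank/Shared and using the file path from the Situation section above as an example:

```sh
# Optionally make the per-dataset snapshot directory visible
# (it is accessible by path even when hidden)
zfs set snapdir=visible tank/Shared

# List every snapshot that still contains a given file, i.e. every snapshot
# that would currently have to be destroyed to truly get rid of it.
FILE="Games/Steam/SteamApps/common/SatelliteReign/SatelliteReignLinux_Data/sharedassets1.assets"
for snap in /tank/Shared/.zfs/snapshot/*/; do
    if [ -e "${snap}${FILE}" ]; then
        echo "still referenced in: ${snap}"
    fi
done
```

Destroying every snapshot that the loop reports is currently the only way to actually remove the file or reclaim its space, which is exactly the limitation this issue is about.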