Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify pack files before upload #4529

Closed
MichaelEischer opened this issue Oct 21, 2023 · 6 comments · Fixed by #4681
Closed

Verify pack files before upload #4529

MichaelEischer opened this issue Oct 21, 2023 · 6 comments · Fixed by #4681
Labels
category: resilience preventing and recovering from repository problems type: feature suggestion suggesting a new feature

Comments

@MichaelEischer
Copy link
Member

Output of restic version

0.16.0

What should restic do differently? Which functionality do you think we should add?

restic should fully decode and verify all blobs in a pack file before uploading to make sure that no data corruption has happened.
This will ensure that the encryption and compression did not damage the blobs.

What are you trying to do? What problem would this solve?

Would have detected the data corruption in #4523 during the backup.
A lot of the data corruption issues listed in https://forum.restic.net/t/study-of-repository-damage-types/5202 would likely also be preventable by this check.

@MichaelEischer MichaelEischer added type: feature suggestion suggesting a new feature category: resilience preventing and recovering from repository problems labels Oct 21, 2023
@luca-molinaro
Copy link

This would be great. If the corruption is detected before the upload what action should restic take in your opinion ? Skip some files and warn the user ? Entirely abort the backup ?

The second option could prevent people from performing the backup in cases like the issue #4523, but is probably safer. In the first option it would be useful to print all the files that were not backed up, this would have allowed me to immediately check if those files were important to me or not without waiting for restic check --read-data, and if the files where important i could have backed them up without restic while waiting for a fix.

@MichaelEischer
Copy link
Member Author

I'm currently leaning towards calling panic and thereby aborting the backup. The check should only trigger in case of a serious bug in restic or hardware issues. Both can corrupt the internal state beyond repair. Thus, its safer to fail completely.

It might be possible to print which files restic is currently processing, although I'm not sure how hard it would be to implement that.

@luca-molinaro
Copy link

If there is a risk of damaging the entire repository i agree, it's better to abort. Thank you.

@MichaelEischer
Copy link
Member Author

While discussing with @rawtaz, we've arrived at a few interesting points.

  • Verifying all pack files likely has a significant performance impact. My estimate would be about 50% more CPU for the default compression level (lower with maximum compression). Whether this slows down the backup largely depends on how fast the network connection is. Thus, it's probably necessary to add a --no-verify-pack option or similar, which skips the pre-upload verification.
  • There's no easy way for restic to detect which files would be affected by a corrupted pack file.
  • The two main ways to handle corrupted pack files during uploads is to either continue but warn users about the problem or to immediately abort the backup.
  • Upload-but-warn would create a snapshot that is still partially useful (essentially all not corrupted parts are accessible). However, repairing such a repository requires quite a bit of manual work, although it would be possible to extend check to automatically remove corrupted data from a repository. In either case, it is necessary to check all data in a repository to properly repair the damaged.
  • Immediately failing before uploading corrupt data, does not upload corrupted data and therefore requires no steps to repair it. In most cases starting the backup again can fix spurious problem. Systematic problems would prevent backups from working, but --no-verify-pack could temporarily skip the verification at the cost of having to repair the corrupt data later on.
  • For prune the only reasonable option is to fail immediately.
  • That is, the tradeoff here is between always creating backups, but allowing some data corruption to sneak in or only creating a backup if the data integrity has been verified successfully.

As I've just noticed, it would be possible to verify a new blob immediately in Repository.SaveBlob and afterwards before an upload only check the pack header for consistency. That would provide better error reporting. (in theory, it would be possible to retry encoding a blob, but that is a very risky endeavor as some hidden data corruption might have occurred.)

@Dieterbe
Copy link

Dieterbe commented Dec 6, 2023

whatever the outcome here, as a user i appreciate your level of attention and care. thank you :)

@MichaelEischer
Copy link
Member Author

While debugging #4677 and #4523, it turned out that we need a simple way to determine the problematic file. As such we need good error reporting, such that #4681 implements the alternative that I've sketched in my previous comment. #4686 will extend the error reporting to include the required information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: resilience preventing and recovering from repository problems type: feature suggestion suggesting a new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants