
restic prune wipes repo if snapshot listing fails #4612

Closed
shibijm opened this issue Jan 4, 2024 · 5 comments · Fixed by #4618


shibijm commented Jan 4, 2024

Output of restic version

restic 0.16.0 compiled with go1.20.6 on windows/amd64

I'm aware that this is two releases behind the latest one (0.16.2), but there don't appear to be any fixes related to this issue in the newer releases.

What backend/service did you use to store the repository?

Storj (through rclone)

Problem description / Steps to reproduce

I ran restic prune. Snapshot listing failed, probably because of a connection issue, but restic assumed that there were 0 snapshots and started deleting everything from the repository. I stopped it with Ctrl+C as soon as I noticed.

Full Command: restic prune --max-repack-size 1G --max-unused 0
Environment Variables: RESTIC_REPOSITORY=rclone::storj,access_grant=redacted:restic, RESTIC_PASSWORD=redacted
Command Output:

repository 3f54a049 opened (version 2, compression level auto)
loading indexes...
loading all snapshots...
rclone: 2024/01/04 12:26:01 ERROR : snapshots: list failed: uplink: metainfo: metaclient: context canceled *errs.errorT
finding data that is still in use for 0 snapshots
[0:00]          0 snapshots
searching used packs...
collecting packs for deletion and repacking
[4:19] 100.00%  4762 / 4762 packs processed

to repack:             0 blobs / 0 B
this removes:          0 blobs / 0 B
to delete:        639278 blobs / 128.446 GiB
total prune:      639278 blobs / 128.446 GiB
remaining:             0 blobs / 0 B
unused size after prune: 0 B ( of remaining size)

rebuilding index
[0:00]          0 packs processed
deleting obsolete index files
[0:01] 100.00%  5 / 5 files deleted
removing 4762 old packs
signal interrupt received, cleaning up
^C
unable to remove <data/1246d2c7d1> from the repository
unable to remove <data/b5a212665e> from the repository
unable to remove <data/ca721f99c9> from the repository
unable to remove <data/fcf3086d01> from the repository
rclone: 2024/01/04 12:30:27 ERROR : data/12/1246d2c7d1c099fedac0e191e27635b9e4fd86a19cac8065bd06333fedb78fa0: Delete request remove error: uplink: metaclient: context canceled
rclone: 2024/01/04 12:30:27 ERROR : data/fc/fcf3086d01780ae947c446f8cedcb11246e89dd026c74cd47854edae0faa4b5f: Delete request remove error: uplink: metaclient: context canceled
rclone: 2024/01/04 12:30:27 ERROR : data/69/69d2da41216f3a00e04453cf2469561279f2be24d05aecf5053b53ea3ad3b332: Delete request remove error: uplink: metaclient: context canceled
rclone: 2024/01/04 12:30:27 ERROR : data/ca/ca721f99c998dbd7fe4ad9f0c596b0e9bc9660ed949431a73cc885fa476898fb: Delete request remove error: uplink: metaclient: context canceled
rclone: 2024/01/04 12:30:27 ERROR : data/b5/b5a212665eb60637e9e2a5c58ca41d55626ebc1807aa043f1ff10c25dfcd92e5: Delete request remove error: uplink: metaclient: context canceled
unable to remove <data/69d2da4121> from the repository
[0:04] 1.70%  81 / 4762 files deleted
done

I then ran the same command again; as expected given the deleted data, it failed with the following output:

repository 3f54a049 opened (version 2, compression level auto)
loading indexes...
loading all snapshots...
finding data that is still in use for 4135 snapshots
[0:00] 0.05%  2 / 4135 snapshots
id 85d917f0f883aa70b265bac1489072c6e752c5db7908f6acb9b94ba32b1e875e not found in repository
github.com/restic/restic/internal/repository.(*Repository).LoadBlob
        /restic/internal/repository/repository.go:272
github.com/restic/restic/internal/restic.LoadTree
        /restic/internal/restic/tree.go:115
github.com/restic/restic/internal/restic.loadTreeWorker
        /restic/internal/restic/tree_stream.go:36
github.com/restic/restic/internal/restic.StreamTrees.func1
        /restic/internal/restic/tree_stream.go:176
golang.org/x/sync/errgroup.(*Group).Go.func1
        /home/build/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1598

Expected behavior

Restic should exit with exit code 1 if snapshot listing fails during a prune operation.

Actual behavior

Restic deletes all data from the repository if snapshot listing fails during a prune operation.

Do you have any idea what may have caused this?

The snapshot listing probably failed because of a connection issue, but restic should have stopped after that.

Did restic help you today? Did it make you happy in any way?

Other than this issue I just ran into, restic has worked great for me. I have been using it on multiple systems to perform daily backups.

shibijm commented Jan 4, 2024

Upon further investigation, I have discovered the following:

Rclone's restic server's object list method responds with an inappropriate 404 if it encounters any error other than ErrorDirNotFound (snippet; see commits rclone/rclone@d073efd and rclone/rclone@f832433).

This is also inconsistent with restic/rest-server's behaviour where only ErrNotExist results in a 404 (snippet 1, snippet 2).

On restic's end, a 404 means that the directory doesn't exist and isn't treated as an error, as per commit 307aeb6:

if resp.StatusCode == http.StatusNotFound {
	// ignore missing directories
	return nil
}

So when rclone encounters a connectivity issue and responds with a 404, restic interprets that as there being no snapshots at all, and prunes everything.

I believe that this has to be fixed in rclone then, by responding with 500 instead of 404 in such cases.

MichaelEischer (Member) commented:

Oh, that's not good. I've opened rclone/rclone#7550 to fix the error handling in rclone. The PR contains an additional change to prepare for #4520 (which we might have to back out to give the rclone fix time to propagate).

MichaelEischer (Member) commented:

Btw, before restic 0.16.0, the rest backend did not ignore 404 errors while listing a directory. The change was introduced in #4400.

MichaelEischer (Member) commented:

You could give the steps from https://restic.readthedocs.io/en/stable/077_troubleshooting.html a try to see whether there's anything left in the repository that can be salvaged.


shibijm commented Jan 7, 2024

You could give the steps from https://restic.readthedocs.io/en/stable/077_troubleshooting.html a try to see whether there's anything left in the repository that can be salvaged.

I did that after the repo broke and managed to salvage everything except some of the deleted data, which wasn't healable because the source files had changed in the meantime. Some files in older snapshots will therefore be corrupted, but other than that, the repository is fully operational without any errors.
