Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.15.0 regression: Load(<lock/04804cba82>, 0, 0) returned error, retrying after 720.254544ms: load(<lock/04804cba82>): invalid data returned #4143

Closed
tbm opened this issue Jan 13, 2023 · 14 comments · Fixed by #4152
Labels
state: need feedback waiting for feedback, e.g. from the submitter type: bug

Comments

@tbm
Copy link
Contributor

tbm commented Jan 13, 2023

Upgrading to 0.15.0, restic fails to connect to my repo:

restic snapshots
found 11 old cache directories in /home/tbm/.cache/restic, run `restic cache --cleanup` to remove them
Load(<lock/04804cba82>, 0, 0) returned error, retrying after 720.254544ms: load(<lock/04804cba82>): invalid data returned
Load(<lock/e820fb5d61>, 0, 0) returned error, retrying after 552.330144ms: load(<lock/e820fb5d61>): invalid data returned
Load(<lock/04804cba82>, 0, 0) returned error, retrying after 582.280027ms: load(<lock/04804cba82>): invalid data returned
Load(<lock/e820fb5d61>, 0, 0) returned error, retrying after 468.857094ms: load(<lock/e820fb5d61>): invalid data returned
Load(<lock/04804cba82>, 0, 0) returned error, retrying after 462.318748ms: load(<lock/04804cba82>): invalid data returned
Load(<lock/e820fb5d61>, 0, 0) returned error, retrying after 593.411537ms: load(<lock/e820fb5d61>): invalid data returned
unable to create lock in backend: Fatal: context canceled

0.14.0 works fine:

./bin/restic-0.14.0 snapshots
repository 9fbc9e6f opened (repository version 1) successfully, password is correct
found 11 old cache directories in /home/tbm/.cache/restic, run `restic cache --cleanup` to remove them
ID        Time                 Host        Tags        Paths
-------------------------------------------------------------------------------------
...

The backend is SFTP.

@fd0
Copy link
Member

fd0 commented Jan 13, 2023

Oh, that's odd, thanks for the report! I'm wondering though: the first thing restic does is load the file config, and apparently that worked fine. Only afterwards it tries to load the lock files, which fails for you.

It works for me (with an OpenSSH server), what's the server software on the other end? Can you maybe do a git bisect between 0.14.0 and 0.15.0?

@fd0 fd0 added type: bug state: need feedback waiting for feedback, e.g. from the submitter labels Jan 13, 2023
@tbm
Copy link
Contributor Author

tbm commented Jan 13, 2023

The backup is on rsync.net. They appear to use:

debug1: Remote protocol version 2.0, remote software version OpenSSH_8.9-hpn14v15 FreeBSD-openssh-portable-8.9.p1_3,1
debug1: match: OpenSSH_8.9-hpn14v15 FreeBSD-openssh-portable-8.9.p1_3,1 pat OpenSSH* compat 0x04000000

I just tried to another backup using the B2 backend and that worked.

I'm not sure I can git bisect as I have no Go environment but I'll see what I can do.

@MichaelEischer
Copy link
Member

MichaelEischer commented Jan 13, 2023

That rather looks like the lock files in question are broken. Since #3569 invalid lock files are no longer ignored, and #3521 added the "invalid data returned" error. Can you delete the lock files whose name starts with 04804cba82 / e820fb5d61 manually? Or use restic unlock --remove-all but make sure before that no other restic instance uses the repository.

@tbm
Copy link
Contributor Author

tbm commented Jan 13, 2023

The lock files are empty files:

sftp> ls -la
drwxr-xr-x    2 XX   XX          8 Jan 13 01:45 .
drwxr-xr-x    7 XX   XX          8 Aug 13  2021 ..
-rw-r--r--    1 XX   XX          0 Nov 24  2021 04804cba82b48830607ec23d248f96ae09c6f3f1ee238a1d4163a2d6b4ed06d9
-rw-r--r--    1 XX   XX          0 Oct 20 13:19 c33dad22612727bc12565878c48daa4fcd3227f2271ca55e2e22f380c863817f-restic-temp-30d68247f45e9760d745fe537c7f9271
-rw-r--r--    1 XX   XX          0 Nov 29 15:13 c58d50e3effdb24e8dc7f6259888fdd881a13c2ccab9edd2e2d53fae94c1d188-restic-temp-458278bd766414cd14cd00c19b907206
-rw-r--r--    1 XX   XX          0 Jul 27 08:06 c60a97a6c25e20a1b6344fad0fc3facd5238424a4c0f85914bb6069173de8928-restic-temp-6946471c03a62c0486d36fc203b4450a
-rw-------    1 XX   XX        168 Jan  1 01:12 c8253df8e02205b4227435814a9383cad185a71b96a2d716acc9096fe98ae855
-rw-r--r--    1 XX   XX          0 Nov 20  2021 e820fb5d616b780ef23debfa8589c957fd7f05e0c61d7f0d2202018289e6b10c
sftp>

After removing all empty lock files, restic works again.

But the user experience is not great. Maybe ignore empty lock files? Or give a hint about removing broken lock files?

@tbm
Copy link
Contributor Author

tbm commented Jan 13, 2023

Oh I just saw your edit about restic unlock --remove-all: maybe you can print that info when there's a broken lock file.

@tbm
Copy link
Contributor Author

tbm commented Jan 13, 2023

btw, this is probably normal behaviour but I wanted to mention it just in case. As you can see above,there's an old lock file from Jan 1.

restic backup didn't complain about this now that I've donerestic forget --prune it tells me:

unable to create lock in backend: repository is already locked by PID 871624 on jirafa by tbm (UID 1000, GID 1000)
lock was created at 2023-01-01 09:12:24 (294h51m16.719418867s ago)
storage ID c8XXXXX
the `unlock` command can be used to remove stale locks

(The error message also tells me what to do, which is good; I'm not sure what storage ID does there.)

@MichaelEischer
Copy link
Member

But the user experience is not great. Maybe ignore empty lock files? Or give a hint about removing broken lock files?

I agree that the warning should be more helpful. For now, I'm in favor of printing a more helpful error message. The problem with invalid lock files, is that it can be unclear whether the file is just an old failed upload or a recent lock file that cannot be read for some reason. Just ignoring empty lock files might be possible though. But I'd like to avoid that if possible.

If you want you can manually delete all *-restic-temp-* files in the repository.

restic backup didn't complain about this now that I've donerestic forget --prune it tells me:

backup only requires a non-exclusive lock (that is multiple backups can run in parallel). prune must lock the repo exclusively.

@tbm
Copy link
Contributor Author

tbm commented Jan 14, 2023

If you want you can manually delete all -restic-temp- files in the repository.

I've done that yesterday and everything is working fine.

backup only requires a non-exclusive lock (that is multiple backups can run in parallel). prune must lock the repo exclusively.

Ok, that's what I thought. Thanks for confirming.

@MichaelEischer thanks for your exceptional support and maintenance! I really appreciate all the work you and @fd0 put into restic!

@msmafra
Copy link

msmafra commented Jan 15, 2023

Mine are sent to a USB 3 external HDD:

with 0.15.0

❯ restic list snapshots
repository 6d9f087d opened (version 1)
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 552.330144ms: load(<lock/4b57d0b713>): invalid data returned
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 720.254544ms: load(<lock/4b57d0b713>): invalid data returned
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 582.280027ms: load(<lock/4b57d0b713>): invalid data returned
unable to create lock in backend: Fatal: context canceled

❯ restic unlock
repository 6d9f087d opened (version 1)
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 552.330144ms: load(<lock/4b57d0b713>): invalid data returned

❯ restic prune --dry-run
repository 6d9f087d opened (version 1)
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 552.330144ms: load(<lock/4b57d0b713>): invalid data returned
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 720.254544ms: load(<lock/4b57d0b713>): invalid data returned
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 582.280027ms: load(<lock/4b57d0b713>): invalid data returned
unable to create lock in backend: Fatal: context canceled

❯ restic rebuild-index --verbose --cleanup-cache  --no-lock
repository 6d9f087d opened (version 1)
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 552.330144ms: load(<lock/4b57d0b713>): invalid data returned
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 720.254544ms: load(<lock/4b57d0b713>): invalid data returned
Load(<lock/4b57d0b713>, 0, 0) returned error, retrying after 582.280027ms: load(<lock/4b57d0b713>): invalid data returned
unable to create lock in backend: Fatal: context canceled

With 0.14.0

❯ restic rebuild-index --verbose --cleanup-cache  --no-lock
repository 6d9f087d opened (repository version 1) successfully, password is correct
loading indexes...
getting pack files to read...
rebuilding index
[3:07] 100.00%  51130 / 51130 packs processed
deleting obsolete index files
[0:00] 100.00%  63 / 63 files deleted
done

❯ restic prune --dry-run
repository 6d9f087d opened (repository version 1) successfully, password is correct
loading indexes...
loading all snapshots...
finding data that is still in use for 37 snapshots
[1:59] 100.00%  37 / 37 snapshots
searching used packs...
collecting packs for deletion and repacking
[0:11] 100.00%  51130 / 51130 packs processed

to repack:             0 blobs / 0 B
this removes:          0 blobs / 0 B
to delete:             0 blobs / 0 B
total prune:           0 blobs / 0 B
remaining:       3140159 blobs / 272.294 GiB
unused size after prune: 4.289 GiB (1.58% of remaining size)

@MichaelEischer
Copy link
Member

You have to use restic unlock --remove-all. A plain restic unlock won't help, as that only remove stale, but not invalid lock files. (Without being able to read a lock file, it is not possible to decide whether it is stale) Before you remove the locks, could you provide us the output of ls -la path/to/repo/locks?

@msmafra
Copy link

msmafra commented Jan 16, 2023

[Edit confirmed without posting a message]
Thanks, that way it worked! I just did not understand why 0.14 doesn't need it and 0.15 needs.

❯ ls /run/media/marcelo/THEGREY/BACKUPS/restic_tars_linux/locks      
4b57d0b713a54f2dff227d3e25925451ed76db3062cde7ee7712f95000241177

❯ restic unlock --remove-all
repository 6d9f087d opened (version 1)
successfully removed 1 locks

❯ restic version            
restic 0.15.0 compiled with go1.19.5 on linux/amd64

❯ restic prune --dry-run --verbose
repository 6d9f087d opened (version 1)
loading indexes...
loading all snapshots...
finding data that is still in use for 37 snapshots
[0:51] 100.00%  37 / 37 snapshots
searching used packs...
collecting packs for deletion and repacking

Would have made the following changes:
used:            3086898 blobs / 268.005 GiB
unused:            53261 blobs / 4.289 GiB
total:           3140159 blobs / 272.294 GiB
unused size: 1.58% of total size

to repack:             0 blobs / 0 B
this removes:          0 blobs / 0 B
to delete:             0 blobs / 0 B
total prune:           0 blobs / 0 B
remaining:       3140159 blobs / 272.294 GiB
unused size after prune: 4.289 GiB (1.58% of remaining size)

totally used packs:      47733
partly used packs:        3397
unused packs:                0

to keep:           51130 packs
to repack:             0 packs
to delete:             0 packs
Repeated prune dry-runs can report slightly different amounts of data to keep or repack. This is expected behavior.

Would have repacked and removed the following packs:
{}

Would have removed the following no longer used packs:
{}


@tbm
Copy link
Contributor Author

tbm commented Jan 16, 2023

@MichaelEischer what do you think of having restic unlock (without --remove-all) remove empty lock files (0 bytes), or do you think that's a bad idea?

@hraban
Copy link

hraban commented Jan 16, 2023

I'm having the same issue since 0.15. Just ftr is the recommended solution to remove all locks, or is this a regression and should I just go back to .14, and wait for .15.1?

@MichaelEischer
Copy link
Member

I'm having the same issue since 0.15. Just ftr is the recommended solution to remove all locks, or is this a regression and should I just go back to .14, and wait for .15.1?

The root cause are some left-over invalid lock files. restic 0.15 is now much more strict and fails if it cannot check the validity of a lock file instead of ignoring it. For now the recommended solution is to once run unlock --remove-all.

what do you think of having restic unlock (without --remove-all) remove empty lock files (0 bytes), or do you think that's a bad idea?

#4152 ignores empty lock files, such that empty lock files will be ignored again. To clean up the empty lock files, the prune command is probably more suitable.

@fd0 fd0 closed this as completed in #4152 Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state: need feedback waiting for feedback, e.g. from the submitter type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants