Two backups running simultaneously when the sidecar container loses its lease? #1505
Comments
A similar thing just happened during a long-running So first, a
The lock is lost, reacquired, and then the backup starts over...
But it fails, because the
Then, some time later, that
|
Hello @sgielen. Thank you for reporting the issue. We will try to reproduce the issue on our end and get back to you. |
@hmsayem did you succeed in reproducing the issue on your end? :) |
@sgielen. Unfortunately, we have not had the opportunity to reproduce the issue yet. We will start working on the issue as soon as possible and keep you updated on any progress. Thank you for your patience. |
Did it happen for the same BackupSession? |
Yes absolutely. It is very reproducible here, now that I've noticed that this is what happens. I think it has been occurring for a long time and I suspect it also might be what caused #1488? You can see it in the logs here, too, from this morning:
note how it starts
|
@sgielen can you please share the yaml of the BackupSession ( |
It was
I killed |
I believe it should be possible to reproduce this issue by deleting the |
That didn't work unfortunately, the process just created a new |
Got it! If you change the holder of the lease using
Now, two backups are ongoing at the same time. Can you reproduce this, too? |
So the backup keeps restarting indefinitely unless you kill it explicitly? |
Every time the process loses its lease, it restarts the backup, while keeping the old one running. Eventually, all of them will succeed or fail, but that's beside the point -- it's the multiple ongoing backups at the same time which is the root of the issue here. |
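To illustrate the invariant being asked for here (at most one backup process alive, even when the lease is lost and regained), below is a minimal sketch. It is not Stash's actual code: Stash is written in Go, and the class name `SingleBackupRunner`, its methods `on_lease_acquired` and `on_lease_lost`, and the sleeping stand-in command are all hypothetical, chosen only for this example. The idea is simply that losing the lease stops the running child, and starting a new run first stops any leftover one.

```python
import subprocess
import sys
import threading


class SingleBackupRunner:
    """Hypothetical helper: keeps at most one backup child process alive."""

    def __init__(self, command):
        self._command = command      # argv list for the backup process
        self._proc = None            # currently running child, if any
        self._lock = threading.Lock()

    def on_lease_acquired(self):
        """Start a backup, stopping any leftover run from a previous lease."""
        with self._lock:
            self._stop_locked()
            self._proc = subprocess.Popen(self._command)
            return self._proc

    def on_lease_lost(self):
        """Stop the running backup so a later re-acquire cannot overlap it."""
        with self._lock:
            self._stop_locked()

    def _stop_locked(self):
        proc = self._proc
        if proc is None or proc.poll() is not None:
            return                   # nothing is running
        proc.terminate()             # ask the child to exit
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()              # force-kill if it ignores the request
            proc.wait()


# Stand-in for the real backup command (an assumption): a child that sleeps.
fake_backup = [sys.executable, "-c", "import time; time.sleep(60)"]
runner = SingleBackupRunner(fake_backup)

first = runner.on_lease_acquired()
runner.on_lease_lost()               # lease lost: the first run is stopped
second = runner.on_lease_acquired()  # lease regained: only one run is alive

print(first.poll() is not None, second.poll() is None)
runner.on_lease_lost()               # clean up the demo child
```

Whether the old run should be killed outright or allowed to finish while the restart is suppressed is a design choice for the maintainers; the sketch only shows the "never two at once" property.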
I will try to reproduce it and keep you updated. |
Thank you! |
I was just looking at the live logs of the stash sidecar container, as it was backing up a volume:
As it was doing this, it lost the backup lock because of some timeout talking to the API server:
So it looks like this restarted the backup, but did not actually kill the old one, so now there are two running backups according to ps aux:
Can you reproduce this on your end?