Possible race condition when removing a backup in New phase and backups overlapping #4738
Comments
@adriananeci If the BSL (BackupStorageLocation) has been deleted, the backup will not be deleted automatically on server versions below 1.8.0 (1.8.0 not included); velero 1.8.0+ has the deletion mechanism.
There are two problems involved:
1. The schedule keeps creating new backups while the previous backup is still running, so backups overlap and pile up in the New phase.
2. Backups left in the New phase cannot be cleaned up afterwards.
@adriananeci
Velero logs might contain sensitive information, as I noticed, so I'm not sure I'll be able to provide a full bundle, since we're noticing this problem only on big clusters, which are usually the production ones. On the other hand, I've manually removed the k8s
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/remove-lifecycle-stale
After investigation, I think the first problem should be fixed: the Velero schedule controller should check whether there is already a backup labeled with the schedule's name whose status.Phase is New or InProgress before creating a new one. For the second problem, the backup deletion controller can handle the backup's status.Phase; see velero/pkg/controller/backup_deletion_controller.go, lines 177 to 191 at 218bab9.
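Below is a minimal sketch, under stated assumptions, of how those two proposals might look. It uses the public Velero API types (`github.com/vmware-tanzu/velero/pkg/apis/velero/v1`) and a controller-runtime client; the function names, the `velero.io/schedule-name` label key, and the exact deletion behavior are illustrative assumptions rather than the actual Velero implementation:

```go
package example

import (
	"context"

	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// hasUnfinishedBackup reports whether a backup created from the given schedule
// is still waiting to run or currently running. The schedule controller could
// call this before creating the next backup and skip creation while it returns
// true. The "velero.io/schedule-name" key is assumed to be the label Velero
// applies to backups created from a schedule.
func hasUnfinishedBackup(ctx context.Context, c client.Client, ns, scheduleName string) (bool, error) {
	var backups velerov1.BackupList
	if err := c.List(ctx, &backups,
		client.InNamespace(ns),
		client.MatchingLabels{"velero.io/schedule-name": scheduleName},
	); err != nil {
		return false, err
	}
	for _, b := range backups.Items {
		switch b.Status.Phase {
		case velerov1.BackupPhaseNew, velerov1.BackupPhaseInProgress:
			return true, nil
		}
	}
	return false, nil
}

// deleteNewPhaseBackup illustrates the second proposal: a backup that never left
// the New phase has uploaded nothing to object storage, so a deletion request
// could remove the Backup resource directly instead of requiring a reachable
// backup storage location. It returns true when it handled the deletion itself.
func deleteNewPhaseBackup(ctx context.Context, c client.Client, backup *velerov1.Backup) (bool, error) {
	if backup.Status.Phase != velerov1.BackupPhaseNew {
		return false, nil // fall through to the normal deletion path
	}
	if err := c.Delete(ctx, backup); err != nil {
		return false, err
	}
	return true, nil
}
```

The first check would stop new backups from piling up behind a slow one; the second would let a stuck New-phase backup be removed even after its storage location is gone.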
What steps did you take and what happened:
We noticed we have a lot of backups in the New phase on some highly loaded clusters. We believe this is because we have a schedule configured to take a backup every 15m (`*/15 * * * *`), but the backup itself takes more than 15m to complete (based on the diff between the Started and Completed timestamps from `velero backup describe`), and this creates an overlap between backups.
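For concreteness, here is a minimal sketch, not taken from the affected cluster, of roughly what such a schedule looks like when expressed with the Velero Go API types (in practice it would be created as a Schedule manifest or via `velero schedule create`). The object name and namespace are assumptions; the cron expression and TTL are the values mentioned in this report:

```go
package example

import (
	"time"

	velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// overlappingSchedule requests a backup every 15 minutes and keeps each backup
// for 24 hours. If a single backup takes longer than 15 minutes to complete,
// the next scheduled backup is created before the previous one finishes, which
// produces the overlap described above.
var overlappingSchedule = velerov1.Schedule{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "hourly-k8s-backup", // assumed from the backup names in this report
		Namespace: "velero",
	},
	Spec: velerov1.ScheduleSpec{
		Schedule: "*/15 * * * *",
		Template: velerov1.BackupSpec{
			TTL: metav1.Duration{Duration: 24 * time.Hour},
		},
	},
}
```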
We tried to clean up the backups that were in the New phase, but no cleanup was actually done, even though no errors were encountered while running the cleanup command.
Most probably this is because the backup-deletion plugin already removed the storage location, since the scheduled backup TTL is configured to 24h, as `velero backup describe hourly-k8s-backup-20220307064511` shows.

What did you expect to happen:
- To be able to clean up backups left in the New phase.
- To be able to delete a backup in the New phase even if the storage location was already cleaned up by the backup-deletion plugin. Maybe a `--force` flag might help.

The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, refer to `velero debug --help`.
If you are using earlier versions:
Please provide the output of the following commands (pasting long output into a GitHub gist or other pastebin is fine):
- `kubectl logs deployment/velero -n velero`
- `velero backup describe <backupname>` or `kubectl get backup/<backupname> -n velero -o yaml`
- `velero backup logs <backupname>`
- `velero restore describe <restorename>` or `kubectl get restore/<restorename> -n velero -o yaml`
- `velero restore logs <restorename>`
Anything else you would like to add:
Using the Azure plugin.
Environment:
- Velero version (use `velero version`):
- Velero features (use `velero client config get features`):
- Kubernetes version (use `kubectl version`): v1.20
- OS (e.g. from `/etc/os-release`): Flatcar

Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.