Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Retry When Detecting Concurrent Updates #113

Merged
merged 10 commits into from
Nov 27, 2024

Conversation

samuel-esp
Copy link
Collaborator

Motivation

When the list of resources to be processed is large (e.g., 2000+ resources), the KubeDownscaler algorithm may take up to half an hour to complete its run. This can lead to the following scenario:

  1. All resources are initially retrieved.
  2. As KubeDownscaler processes each resource, a resource that was previously retrieved may be modified by another entity (such as an HPA, Keda, or through manual intervention).
  3. When KubeDownscaler attempts to process this modified resource, it encounters the following error: "the object has been modified; please apply your changes to the latest version and try again".

Even if KubeDownscaler will process the resource again during the next iteration, this could be a blocking behavior when --once argument is enabled on a large cluster. It could lead to some resources not being correctly processed

Changes

Added a new argument --max-retries-on-conflict that allows the user to specify how many times the resource update should be retried before moving on to the next resource

Tests done

  • Unit Tests
  • Built custom image and tested on a test cluster

TODO

  • I've assigned myself to this PR

@samuel-esp samuel-esp added the enhancement New feature or request label Nov 16, 2024
@samuel-esp samuel-esp self-assigned this Nov 16, 2024
@samuel-esp samuel-esp linked an issue Nov 16, 2024 that may be closed by this pull request
@samuel-esp samuel-esp marked this pull request as ready for review November 17, 2024 15:56
kube_downscaler/cmd.py Outdated Show resolved Hide resolved
kube_downscaler/scaler.py Outdated Show resolved Hide resolved
kube_downscaler/scaler.py Outdated Show resolved Hide resolved
@samuel-esp samuel-esp requested a review from JTaeuber November 20, 2024 10:48
Copy link
Member

@JTaeuber JTaeuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change for consistency. Other than that LGTM

kube_downscaler/scaler.py Outdated Show resolved Hide resolved
@samuel-esp samuel-esp requested a review from JTaeuber November 22, 2024 17:42
Copy link
Member

@JTaeuber JTaeuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@samuel-esp samuel-esp merged commit 92416ee into caas-team:main Nov 27, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource Processing Failed Due to Concurrent Update
2 participants