Introduce backoff in re-enqueuing machines during creation/deletion failures. #525

Merged
merged 1 commit into from
Oct 6, 2020

Conversation

hardikdr
Member

@hardikdr hardikdr commented Oct 3, 2020

Co-authored-by: Prashanth <prashanth@sap.com>

What this PR does / why we need it: Currently, a machine object is re-enqueued immediately when its creation or deletion fails. This PR adds a constant delay before the operation is retried. Eventually, exponential backoff should be built on top of, or alongside, this solution.

  • The PR introduces a new phase, `CrashLoopBackOff`. It should be used when machine creation fails but the machine-set does not need to replace the machine object (replacing it would not help anyway).
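
The distinction between the new phase and a failure that warrants replacement can be sketched as follows. This is only an illustration under stated assumptions: the type and constant names, the `replacementHelps` flag, and the rule that a machine-set replaces only machines in the `Failed` phase are all hypothetical here and are not taken from the PR's diff.

```go
package main

import "fmt"

// MachinePhase is a hypothetical stand-in for the machine status phase type.
type MachinePhase string

const (
	// MachineFailed marks a machine that the owning machine-set should replace (assumption).
	MachineFailed MachinePhase = "Failed"
	// MachineCrashLoopBackOff marks a machine whose creation failed and will be
	// retried on the same object after a delay.
	MachineCrashLoopBackOff MachinePhase = "CrashLoopBackOff"
)

// phaseOnCreateFailure picks the phase to record after a failed creation attempt.
// replacementHelps is a hypothetical input: true only if a fresh machine
// object would have a better chance of succeeding.
func phaseOnCreateFailure(replacementHelps bool) MachinePhase {
	if replacementHelps {
		return MachineFailed
	}
	return MachineCrashLoopBackOff
}

// needsReplacement models the assumed machine-set rule: only Failed machines
// are replaced, while CrashLoopBackOff machines are left in place and retried.
func needsReplacement(p MachinePhase) bool {
	return p == MachineFailed
}

func main() {
	p := phaseOnCreateFailure(false)
	fmt.Println(p, "replace:", needsReplacement(p))
}
```

Keeping the object in `CrashLoopBackOff` means the same machine object is retried after the delay, rather than being deleted and recreated only to hit the same creation error.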

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

Introduced a backoff when re-enqueuing machines after creation/deletion failures. This avoids throttling of API server and provider calls.
Added a new phase `CrashLoopBackOff`, which is set when machine creation fails.

@hardikdr hardikdr requested review from ggaurav10 and a team as code owners October 3, 2020 12:48
Contributor

@prashanth26 prashanth26 left a comment

/lgtm on squashing the changes.

Tested the changes locally, looks good to me.

@gardener-robot gardener-robot added the reviewed/lgtm Has approval for merging label Oct 5, 2020
…n and deletion failures

Co-authored-by: Prashanth <prashanth@sap.com>
Contributor

@prashanth26 prashanth26 left a comment

/lgtm

@hardikdr hardikdr merged commit 39a3817 into gardener:master Oct 6, 2020
@prashanth26 prashanth26 changed the title Cherry-pick on master: Introduce delay in re-enqueuing machines during creation-deletion failures. Introduce backoff in re-enqueuing machines during creation/deletion failures. Oct 6, 2020
@zuzzas zuzzas mentioned this pull request Oct 8, 2020
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging
7 participants