Check for misconfigured PDBs on Node drain and set proper error description #591

Merged: 1 commit, Jan 29, 2021

Conversation

ialidzhikov
Member

@ialidzhikov ialidzhikov commented Jan 26, 2021

/area ops-productivity
/kind enhancement

Currently the Machine status does not clearly indicate pod eviction failures that are caused by misconfigured PodDisruptionBudgets.

With this PR, the Machine .status.lastOperation.description clearly indicates when pod eviction fails because of a misconfigured PodDisruptionBudget:

  lastOperation:
    description: 'Drain failed due to - [error while evicting pod "nginx-deployment-6b474476c4-2mck6":
      pod disruption budget default/pdb is misconfigured and requires zero voluntary
      evictions, error while evicting pod "nginx-deployment-6b474476c4-8b8hl": pod
      disruption budget default/pdb is misconfigured and requires zero voluntary evictions].
      Will retry in next sync. Initiate node drain'
    lastUpdateTime: "2021-01-26T21:12:56Z"
    state: Failed
    type: Delete

This will allow components like extension controllers to match .status.lastOperation.description and to properly flag the failure as a configuration problem - ref gardener/gardener#3020.
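
As an illustration of the matching described above, here is a minimal Go sketch of how a consumer might detect the new message in a Machine's `.status.lastOperation.description`. The names `misconfiguredPDBPattern` and `misconfiguredPDBs` are hypothetical and are not taken from Gardener or MCM; the pattern is derived only from the example description shown in this PR, so it would need adjusting if the wording changes.

```go
package main

import (
	"fmt"
	"regexp"
)

// misconfiguredPDBPattern matches the eviction error text from the example
// description in this PR and captures the "namespace/name" of the PDB.
// Whitespace is matched loosely so that wrapped text still matches.
// Assumption: the wording stays exactly as in the example above.
var misconfiguredPDBPattern = regexp.MustCompile(
	`pod\s+disruption\s+budget\s+(\S+)\s+is\s+misconfigured\s+and\s+requires\s+zero\s+voluntary\s+evictions`)

// misconfiguredPDBs returns the distinct PDB references found in the
// description, in order of first appearance. An empty result means the
// description does not report a misconfigured PDB.
func misconfiguredPDBs(description string) []string {
	seen := map[string]bool{}
	var refs []string
	for _, m := range misconfiguredPDBPattern.FindAllStringSubmatch(description, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			refs = append(refs, m[1])
		}
	}
	return refs
}

func main() {
	desc := `Drain failed due to - [error while evicting pod "nginx-deployment-6b474476c4-2mck6": ` +
		`pod disruption budget default/pdb is misconfigured and requires zero voluntary evictions]. ` +
		`Will retry in next sync. Initiate node drain`

	if refs := misconfiguredPDBs(desc); len(refs) > 0 {
		// A consumer could flag this as a configuration problem here.
		fmt.Println("misconfigured PDBs:", refs)
	}
}
```

Matching on free-form text is brittle, so a consumer following this approach would probably want to keep the pattern in a single place.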

Which issue(s) this PR fixes:
Needed for gardener/gardener#3020

Release note:

machine-controller-manager now checks for misconfigured PodDisruptionBudgets (ones that require zero voluntary evictions and make a graceful Node drain impossible) and sets a more descriptive Machine `.status.lastOperation.description` for such Machines. This change is breaking because out-of-tree providers need new RBAC permissions: list and watch access for PodDisruptionBudgets in the target cluster.
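
To make "requires zero voluntary evictions" more concrete, the following Go sketch shows one plausible heuristic for such a check. It is an assumption for illustration, not the logic confirmed to be in this PR. The `pdbStatus` type is a hypothetical, trimmed-down stand-in for a PodDisruptionBudget's metadata generation and status fields, and `requiresZeroVoluntaryEvictions` is an invented name.

```go
package main

import "fmt"

// pdbStatus is a hypothetical stand-in that mirrors a few fields of a
// PodDisruptionBudget: metadata.generation plus selected status fields.
type pdbStatus struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	ExpectedPods       int32 // status.expectedPods
	CurrentHealthy     int32 // status.currentHealthy
	DisruptionsAllowed int32 // status.disruptionsAllowed
}

// requiresZeroVoluntaryEvictions reports whether the PDB blocks every
// eviction even though all of its pods are healthy. Assumed heuristic:
// if every expected pod is healthy and still no disruption is allowed,
// waiting will not help, so the budget itself is the problem.
func requiresZeroVoluntaryEvictions(s pdbStatus) bool {
	if s.ObservedGeneration != s.Generation {
		return false // status is stale; do not judge yet
	}
	if s.ExpectedPods == 0 {
		return false // the PDB selects no pods
	}
	return s.CurrentHealthy >= s.ExpectedPods && s.DisruptionsAllowed == 0
}

func main() {
	// All pods healthy, yet no eviction is allowed: treated as misconfigured.
	blocked := pdbStatus{Generation: 1, ObservedGeneration: 1, ExpectedPods: 2, CurrentHealthy: 2, DisruptionsAllowed: 0}
	// One pod is unhealthy: evictions may become possible once it recovers.
	recovering := pdbStatus{Generation: 1, ObservedGeneration: 1, ExpectedPods: 2, CurrentHealthy: 1, DisruptionsAllowed: 0}

	fmt.Println(requiresZeroVoluntaryEvictions(blocked))
	fmt.Println(requiresZeroVoluntaryEvictions(recovering))
}
```

The sketch deliberately separates a permanently blocking budget from one that is only blocking until unhealthy pods recover, since only the former is a configuration problem.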

…iption

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
@ialidzhikov ialidzhikov requested review from ggaurav10 and a team as code owners January 26, 2021 21:54
@gardener-robot gardener-robot added needs/review Needs review area/ops-productivity Operator productivity related (how to improve operations) kind/enhancement Enhancement, improvement, extension size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Jan 26, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Jan 26, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Jan 26, 2021
@prashanth26
Contributor

/assign

Contributor

@prashanth26 prashanth26 left a comment

Thank you for this change @ialidzhikov . A very meaningful change.

Minor comment: This change is only on the OOT branch of MCM and wouldn't be backward compatible with any controllers (for example, extension-provider-azure or other external controllers) that currently use in-tree MCM. This is fine, as we will eventually deprecate the in-tree MCM code and move to the OOT code anyway; I just wanted to mention it as a reminder.

Otherwise, the changes /lgtm

Comment on lines +53 to +59
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - watch
  - list
Contributor

Just a reminder: after this PR is merged, we will have to update the cluster roles in all the extensions to add this rule.

Contributor

@prashanth26 prashanth26 left a comment

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Jan 27, 2021
@ialidzhikov
Member Author

Minor comment: This change is only on the OOT branch of MCM and wouldn't be backward compatible with any controllers (for example, extension-provider-azure or other external controllers) that currently use in-tree MCM. This is fine, as we will eventually deprecate the in-tree MCM code and move to the OOT code anyway; I just wanted to mention it as a reminder.

Sure, I plan to update the RBAC of the gardener provider extensions when we update the corresponding MCM provider version in the extension that vendors this PR. For the gardener provider extensions that still use in-tree MCM, no action is required.

@prashanth26
Contributor

Sure, I plan to update the RBAC of the gardener provider extensions when we update the corresponding MCM provider version in the extension that vendors this PR. For the gardener provider extensions that still use in-tree MCM, no action is required.

Sure, sounds good. We will also need to vendor the library into the MCM OOT providers and then make releases for this change to take effect, which is fine for now. So shall I go ahead and merge this?

@ialidzhikov
Member Author

Sure, sounds good. We will also need to vendor the library into the MCM OOT providers and then make releases for this change to take effect, which is fine for now.

Yep.

So shall I go ahead and merge this?

If it also looks good to you, then we can proceed with it.

Labels
area/ops-productivity Operator productivity related (how to improve operations) kind/enhancement Enhancement, improvement, extension needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/s Size of pull request is small (see gardener-robot robot/bots/size.py)
5 participants