This repository has been archived by the owner on Sep 24, 2021. It is now read-only.

Respect PodDisruptionBudget resources in the workload cluster during upgrade #71

Closed

zjs opened this issue Sep 23, 2019 · 4 comments

@zjs

zjs commented Sep 23, 2019

The Cluster API Upgrade Tool orchestrates a rolling upgrade of nodes. It would be helpful if that upgrade process occurred in a way that respects any PodDisruptionBudget resources defined in the workload cluster.

This would ensure that the upgrade respects the expectations of the users of a workload cluster, in addition to the expectations of the workload cluster's administrator as expressed by the maxSurge/maxUnavailable properties of the MachineDeployment.
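For context, a PodDisruptionBudget is how a workload owner declares how much voluntary disruption (such as the evictions triggered by a node drain during an upgrade) their application can tolerate. A minimal illustrative sketch — the names and counts here are hypothetical, not from this issue:

```yaml
# Hypothetical PDB: keeps at least 2 pods of the "frontend" app
# running during voluntary disruptions such as node drains.
apiVersion: policy/v1beta1   # policy/v1 on newer clusters
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: frontend
```

An eviction that would drop the selected pods below `minAvailable` is refused, which is what makes a drain-based upgrade pause rather than violate the budget.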

@ncdc
Contributor

ncdc commented Sep 24, 2019

We are expecting node cordoning and draining (which uses eviction) to take care of this. It will be available in a future v0.1.x CAPI release (v1alpha2). The PR is kubernetes-sigs/cluster-api#1096

@detiber
Contributor

detiber commented Sep 24, 2019

I'm not sure this is the responsibility of the upgrade tool so much as of Cluster API itself.

There are two potential approaches that would help there, though:

  • Automated node cordoning & draining: Centralize Node Cordon & Drain kubernetes-sigs/cluster-api#994

    • Would help respect disruption budgets by preventing the draining of pods with disruption budgets
    • Would require that maxSurge/maxUnavailable for the MachineDeployment > 1 to provide value
    • May require that the controller manager be configured for parallel reconciliations to avoid blocking the rollout too much
  • Incorporating some of the scheduling logic, à la the cluster autoscaler, to make better choices about which Machines to delete and in which order
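The interplay between the first bullet and the MachineDeployment settings can be sketched as follows. This is an illustrative fragment (names and values are hypothetical, and the selector/template fields are omitted), not a configuration from this issue:

```yaml
# Illustrative rollout settings: with maxSurge > 1, multiple
# replacement Machines can be created at once, so a drain that is
# blocked by a PodDisruptionBudget on one node does not stall the
# entire rollout. maxUnavailable: 0 keeps capacity from dipping
# below the desired replica count while nodes are replaced.
apiVersion: cluster.x-k8s.io/v1alpha2
kind: MachineDeployment
metadata:
  name: workload-md-0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  # selector and template omitted for brevity
```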

As @ncdc mentioned, the first approach is already being planned. The latter approach has only been discussed briefly, with no current intention of implementation due to complexity and maintenance considerations.

@ncdc
Contributor

ncdc commented Sep 24, 2019

We originally had this filed as #5 but closed it for the reasons mentioned above about having CAPI do it.

@ncdc
Contributor

ncdc commented Oct 17, 2019

@zjs is it ok to close this, now that CAPI v1alpha2 has node cordoning and draining built in as part of machine deletion?

@zjs zjs closed this as completed Oct 19, 2019
3 participants