This repository has been archived by the owner on Sep 24, 2021. It is now read-only.

Respect PodDisruptionBudget resources in the workload cluster during upgrade #71

Closed

zjs opened this issue Sep 23, 2019 · 4 comments

@zjs

zjs commented Sep 23, 2019

The Cluster API Upgrade Tool orchestrates a rolling upgrade of nodes. It would be helpful if that upgrade process occurred in a way that respects any PodDisruptionBudget resources defined in the workload cluster.

This would ensure that the upgrade respects the expectations of the users of a workload cluster, in addition to the expectations of the workload cluster's administrator as expressed by the maxSurge/maxUnavailable properties of the MachineDeployment.
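For context, a PodDisruptionBudget is how a workload owner declares how much voluntary disruption (such as the evictions triggered by a node drain during an upgrade) their application can tolerate. A minimal illustrative sketch — the names and counts here are hypothetical, not from this issue:

```yaml
# Hypothetical PDB: keeps at least 2 pods of the "frontend" app
# running during voluntary disruptions such as node drains.
apiVersion: policy/v1beta1   # policy/v1 on newer clusters
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: frontend
```

An eviction that would drop the selected pods below `minAvailable` is refused, which is what makes a drain-based upgrade pause rather than violate the budget.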

@ncdc
Contributor

ncdc commented Sep 24, 2019

We are expecting node cordoning and draining (which uses eviction) to take care of this. It will be available in a future v0.1.x CAPI release (v1alpha2). The PR is kubernetes-sigs/cluster-api#1096

@detiber
Contributor

detiber commented Sep 24, 2019

I'm not sure this is the responsibility of the upgrade tool so much as of Cluster API itself.

There are two potential approaches that would help there, though:

  • Automated node cordoning & draining: Centralize Node Cordon & Drain kubernetes-sigs/cluster-api#994

    • Would help respect disruption budgets by preventing the draining of pods with disruption budgets
    • Would require that maxSurge/maxUnavailable for the MachineDeployment > 1 to provide value
    • May require that the controller manager be configured for parallel reconciliations to avoid blocking the rollout too much
  • Incorporating some of the scheduling logic, à la the cluster autoscaler, to make better choices about which Machines to delete and in which order
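The interplay between the first bullet and the MachineDeployment settings can be sketched as follows. This is an illustrative fragment (names and values are hypothetical, and the selector/template fields are omitted), not a configuration from this issue:

```yaml
# Illustrative rollout settings: with maxSurge > 1, multiple
# replacement Machines can be created at once, so a drain that is
# blocked by a PodDisruptionBudget on one node does not stall the
# entire rollout. maxUnavailable: 0 keeps capacity from dipping
# below the desired replica count while nodes are replaced.
apiVersion: cluster.x-k8s.io/v1alpha2
kind: MachineDeployment
metadata:
  name: workload-md-0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  # selector and template omitted for brevity
```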

As @ncdc mentioned, the first approach is already being planned. The latter approach has only been discussed briefly, with no current intention of implementation due to complexity and maintenance considerations.

@ncdc
Contributor

ncdc commented Sep 24, 2019

We originally had this filed as #5 but closed it for the reasons mentioned above about having CAPI do it.

@ncdc
Contributor

ncdc commented Oct 17, 2019

@zjs is it ok to close this, now that CAPI v1alpha2 has node cordoning and draining built in as part of machine deletion?

@zjs zjs closed this as completed Oct 19, 2019
3 participants