Inhibit cluster-autoscaler during cluster rollouts #497

Closed
hardikdr opened this issue Jun 8, 2020 · 14 comments
Labels: kind/bug (Bug) · priority/2 (lower number equals higher priority) · status/in-progress (issue is in progress/work)

Comments

@hardikdr (Member) commented Jun 8, 2020

What would you like to be added:
Cluster-autoscaler seems to have undefined/unintended behavior during cluster roll-outs.

Currently, worker-extension disables the cluster-autoscaler during roll-out using this check.

  • It mainly checks numUpdated >= numDesired in the machine-deployment's status, which might not be the most reliable signal for deciding when to disable the autoscaler.

Opening this issue to discuss the different approaches and later enhance the autoscaler, MCM, the worker-extension, or a combination of them to handle the overall situation.

A couple of known/discussed approaches are the following:

  1. Disable the cluster-autoscaler via worker-extension:
  • This approach implies a further enhancement of the worker-extension here.
    1. This can be achieved by better parsing of the machine-deployment status, maybe via conditions, given the machine-deployment provides enough hints about on-going roll-outs.
    2. The worker-extension could also simply count the number of running machine-objects from the old and new machine sets and decide accordingly whether to disable the CA.
  2. Enhance MCM to specially handle scaling events during roll-outs.
  • MCM could disable the scale-up of the old machine set and the scale-down of the new machine set during machine-deployment roll-outs.
  • This approach allows the system to keep the CA running even during roll-outs and puts less burden on the worker-extension.
  • Known points to be considered beforehand:
    • CA taints the node-objects directly during scale-down. We would need to prevent such tainting for this approach to let MCM keep running seamlessly.
    • The complexity of the code changes in the machine-deployment controller and the side-effects on the overall roll-out process need to be reviewed.
  3. Adapt CA to not trigger scale-down on machine-deployments.
  • CA can be adapted to check the status of the machine-deployment/set and simply not reduce the replicas of the machine-deployment during scale-down.
  • Known points to be considered beforehand:
    • CA taints the node-objects directly during scale-down. We would need to prevent such tainting for this approach to let MCM keep running seamlessly.
  4. Disable scale-down in CA during cluster roll-outs (a rough sketch follows after this list).
  • Cluster-autoscaler supports the --scale-down-disabled flag, which disables only the scale-down aspect of CA.
  • The worker-extension could set this flag during cluster roll-outs [with the existing or slightly better checks]; it would essentially block scale-down and let scale-up continue.
  • Known points to be considered beforehand:
    • The question of policy vs. technology: whether we and the stakeholders are fine with scale-down being disabled during roll-outs is something we should discuss.
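
To make approach 4 a bit more concrete, here is a minimal sketch of how a controller (e.g. the worker-extension) could toggle scale-down on the CA deployment while a roll-out is in progress. The deployment name/namespace, the container position, the `--scale-down-enabled=false` flag, and the overall flow are assumptions for illustration, not the actual worker-extension code:

```go
// Sketch only: toggle the CA scale-down flag while a machine roll-out is in
// progress (approach 4). Deployment name, namespace, container index, and the
// exact flag are assumptions, not the actual Gardener implementation.
package rollout

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const scaleDownDisabledArg = "--scale-down-enabled=false"

// setCAScaleDownDisabled adds (or removes) the flag on the assumed
// "cluster-autoscaler" deployment; the caller decides, based on its
// roll-out check, whether scale-down should currently be disabled.
func setCAScaleDownDisabled(ctx context.Context, c kubernetes.Interface, disable bool) error {
	deploy, err := c.AppsV1().Deployments("kube-system").Get(ctx, "cluster-autoscaler", metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Assumption: the CA container is the first (and only) container.
	container := &deploy.Spec.Template.Spec.Containers[0]

	args := make([]string, 0, len(container.Args)+1)
	for _, a := range container.Args {
		if a != scaleDownDisabledArg { // drop any existing occurrence first
			args = append(args, a)
		}
	}
	if disable {
		args = append(args, scaleDownDisabledArg)
	}
	container.Args = args

	_, err = c.AppsV1().Deployments("kube-system").Update(ctx, deploy, metav1.UpdateOptions{})
	return err
}
```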

Please feel free to suggest new approaches or provide feedback on existing ones.

@hardikdr (Member Author) commented Jun 8, 2020

@timebertt (Member)

cc @danielfoehrKn

@vlerenc (Member) commented Jun 8, 2020

During a discussion back then with @rfranzke, option (4) with some tweaks seemed the easiest to implement (based on a new condition that @danielfoehrKn will introduce):

  • There will be a new RollingUpdate condition for the worker extension objects
  • When it's set, a controller in Gardener shall disable scale-down in CA by setting --scale-down-enabled=false
  • Scale-up on the other hand shall remain active in MCM, but always "steered" towards the new machineset, never the old one
  • When the rolling update is over, CA scale-down will be re-enabled (whether and when the rolling update is done is available at the machinedeployment/machineset, so this can be done, right? A possible status check is sketched right below.)
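
A rough sketch of what such a "rolling update is done" check could look like, using a local stand-in for the MachineDeployment status fields (not the actual MCM API types); which fields or conditions to rely on is exactly what this issue is about:

```go
// Sketch only: decide whether a MachineDeployment roll-out has finished.
// machineDeploymentStatus is a local stand-in, not the actual MCM API type.
package rollout

type machineDeploymentStatus struct {
	Replicas          int32 // total machines currently targeted
	UpdatedReplicas   int32 // machines already on the new machine template
	AvailableReplicas int32 // machines that are up and available again
}

// rollingUpdateDone is true once all desired machines have been updated to
// the new template, are available again, and no surplus old machines remain.
func rollingUpdateDone(desired int32, s machineDeploymentStatus) bool {
	return s.UpdatedReplicas >= desired &&
		s.AvailableReplicas >= desired &&
		s.Replicas == desired
}
```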

Is that sensible?

@hardikdr (Member Author) commented Jun 8, 2020

Overall that sounds like a good solution. Just that the check for removing the RollingUpdate condition could be enhanced a little [e.g. based on the machine set's replicas, or the number of machine-objects].

@amshuman-kr commented Jun 8, 2020

@hardikdr I think option 2 is still workable with only changes to MCM and without any changes to CA.

  1. As discussed offline, scale up is fully under the control of MCM and MCM can decide to scale up only the new MachineSet if the scale up happens during a rolling update.
  2. As for scale down, CA already provides an annotation mechanism to mark nodes as not available for scale down. This annotation is used by CA to filter out such nodes very early in the scale down logic. So, how about the following steps during a rolling update?
    1. The MachineDeployment controller creates the new MachineSet with the cluster-autoscaler.kubernetes.io/scale-down-disabled annotation in the node template. If such an annotation is already defined in the MachineDeployment, its defined value is remembered for later.
    2. Rolling update is executed as it happens now. No changes to the core logic of rolling update in MCM are required. CA will honour the annotation and not scale down any of the machines from the new MachineSet.
    3. Once the machines are rolled and the old MachineSet is scaled down to zero (or deleted), the MachineDeployment controller updates the new (and now, the only remaining) MachineSet to remove the cluster-autoscaler.kubernetes.io/scale-down-disabled annotation (or restore it to the value remembered from step 1 above).

Rollback could be the same but with the old and the new swapped.
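
A minimal sketch of steps 1 and 3 on the MachineSet side, treating the node template annotations as a plain map; the helper names are made up and the real change would live in the MachineDeployment controller:

```go
// Sketch only: set/restore the CA annotation on a MachineSet's node template
// around a rolling update (steps 1 and 3 above). Helper names are made up.
package rollout

const scaleDownDisabledAnnotation = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

// disableScaleDown sets the annotation on the new MachineSet's node template
// and returns any previously defined value so it can be restored later.
func disableScaleDown(nodeTemplateAnnotations map[string]string) (previous string, hadPrevious bool) {
	previous, hadPrevious = nodeTemplateAnnotations[scaleDownDisabledAnnotation]
	nodeTemplateAnnotations[scaleDownDisabledAnnotation] = "true"
	return previous, hadPrevious
}

// restoreScaleDown removes the annotation again once the roll-out is done,
// or restores the value that was defined on the MachineDeployment before.
func restoreScaleDown(nodeTemplateAnnotations map[string]string, previous string, hadPrevious bool) {
	if hadPrevious {
		nodeTemplateAnnotations[scaleDownDisabledAnnotation] = previous
		return
	}
	delete(nodeTemplateAnnotations, scaleDownDisabledAnnotation)
}
```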

@amshuman-kr

I am keen on option 2 for the following reasons.

  1. The changes are limited to MCM.
  2. CA (which is already a pain because of our forking) remains unchanged.
  3. Worker controllers also remain unchanged.
  4. Gardener/gardenlet can forget about any special logic for CA during machine rollout.
  5. MCM and CA can be combined in a non-gardener context without any additional glue controller/logic.
  6. Both scale up and scale down will be possible during machine rollout.
    1. Scale down would still be temporarily restricted (until the rollout is complete) to the old nodes. That means that scale down will happen only if any of the old nodes have low resource utilization. But this seems to be the best that can be done, and none of the other options can do better than this either.

@vlerenc (Member) commented Jun 8, 2020

@amshuman-kr I thought the main problem of (2) was the undesired taint and how to avoid it. If that's trivial, sure. If not, (4) looked simple enough and already brings most of the advantages of (2), e.g. not touching the CA code.

@amshuman-kr commented Jun 8, 2020

@vlerenc If any node is marked with the cluster-autoscaler.kubernetes.io/scale-down-disabled annotation, then it is not considered for scale down by CA. Hence, CA does not taint it either. All it takes is to create the new MachineSet with this annotation in the node template. CA already has the code to check this annotation and skip such nodes.
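
For illustration, the CA-side behaviour amounts to filtering such nodes out before scale-down is even considered; this is a simplified stand-in for the upstream check, not the actual CA code:

```go
// Sketch only: drop nodes carrying the scale-down-disabled annotation before
// scale-down is considered; simplified stand-in for the upstream CA check.
package rollout

import corev1 "k8s.io/api/core/v1"

func scaleDownCandidates(nodes []*corev1.Node) []*corev1.Node {
	candidates := make([]*corev1.Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Annotations["cluster-autoscaler.kubernetes.io/scale-down-disabled"] == "true" {
			continue // such a node is neither tainted nor deleted by CA
		}
		candidates = append(candidates, n)
	}
	return candidates
}
```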

@vlerenc (Member) commented Jun 8, 2020

@amshuman-kr Ah, sure, yes, that's an excellent idea. Thanks!

@hardikdr (Member Author)

Thanks @amshuman-kr @vlerenc for the comments.

Overall we seem to agree on approach 2, with the solution of adding the cluster-autoscaler annotation on the new machine sets. I have opened issue #472 on MCM to track the actual changes.

@prashanth26 (Contributor)

It took me a while to follow this discussion. But yes, looking at the discussions and suggestions, I am okay with both approaches (2) & (4). However, since the changes in (2) are restricted to MCM only, I prefer that, as it keeps the implementation generic enough for external adopters of MCM to also make use of this feature.

@rfranzke (Member) commented Aug 7, 2020

We had another short discussion with @hardikdr @prashanth26 @timebertt, and @hardikdr @prashanth26 will take over the implementation in MCM short-term. Instead of only annotating the new nodes of a rolled machine deployment, the MCM will also annotate the old nodes. This keeps the CA from scaling down any machines of a machine deployment that is currently being rolled, without having to disable CA scale-down completely. After the rolling update has finished, the annotations will be removed again.

Once this change is released, we can remove all special handling of the CA in Gardener and the generic Worker actuator, which will simplify the code there.
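
A hedged sketch of that node-object variant: annotate the nodes of the rolling machine deployment (both old and new) and remove the annotation again afterwards. How the affected node names are collected (here nodeNames) and the overall wiring are assumptions, not the MCM change itself:

```go
// Sketch only: (de-)annotate the Node objects belonging to a machine
// deployment that is being rolled, so CA does not scale them down meanwhile.
// Collecting nodeNames from the machine objects is assumed to happen elsewhere.
package rollout

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const caScaleDownDisabled = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

func setScaleDownDisabledOnNodes(ctx context.Context, c kubernetes.Interface, nodeNames []string, disabled bool) error {
	for _, name := range nodeNames {
		node, err := c.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if disabled {
			if node.Annotations == nil {
				node.Annotations = map[string]string{}
			}
			node.Annotations[caScaleDownDisabled] = "true"
		} else {
			delete(node.Annotations, caScaleDownDisabled) // no-op if absent
		}
		if _, err := c.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```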

@rfranzke (Member) commented Aug 7, 2020

/area auto-scaling
/in-progress
/assign @hardikdr @prashanth26
/priority critical

@prashanth26 prashanth26 transferred this issue from gardener/autoscaler Aug 17, 2020
@prashanth26 prashanth26 added kind/bug Bug status/in-progress Issue is in progress/work labels Aug 17, 2020
@hardikdr hardikdr added this to the v0.34.0 milestone Aug 20, 2020
@hardikdr hardikdr added the priority/critical Needs to be resolved soon, because it impacts users negatively label Aug 20, 2020
@hardikdr (Member Author) commented Sep 8, 2020

/close with #496

@gardener-robot gardener-robot added priority/2 Priority (lower number equals higher priority) and removed priority/critical Needs to be resolved soon, because it impacts users negatively labels Mar 8, 2021