Inhibit cluster-autoscaler during cluster rollouts #497

Closed
hardikdr opened this issue Jun 8, 2020 · 14 comments
Labels: kind/bug (Bug) · priority/2 (lower number equals higher priority) · status/in-progress (issue is in progress/work)

Comments

@hardikdr (Member) commented Jun 8, 2020

What would you like to be added:
Cluster-autoscaler seems to have undefined/unintended behavior during cluster roll-outs.

Currently, worker-extension disables the cluster-autoscaler during roll-out using this check.

  • It mainly checks numUpdated >= numDesired in the machine-deployment's status, which might not be the most reliable signal for deciding when to disable the autoscaler.

Opening this issue to discuss the different approaches and later enhance the autoscaler, MCM, the worker-extension, or a combination of them to handle the overall situation.

A couple of known/discussed approaches are the following:

  1. Disable the cluster-autoscaler via worker-extension:
  • This approach implies a further enhancement of the worker-extension here.
    1. This can be achieved by better parsing of the machine-deployment status, maybe via conditions, given the machine-deployment provides enough hints about on-going roll-outs.
    2. The worker-extension could also simply count the number of running machine-objects from the old and new machine sets and decide accordingly whether to disable the CA.
  2. Enhance MCM to specially handle scaling events during roll-outs.
  • MCM could disable the scale-up of the old machine set and the scale-down of the new machine set during machine-deployment roll-outs.
  • This approach allows the system to keep the CA running even during roll-outs and puts less burden on the worker-extension.
  • Known points to be considered beforehand:
    • CA taints the node-objects directly during scale-down. We would need to prevent such tainting for this approach to let MCM keep running seamlessly.
    • The complexity of the code changes in the machine-deployment controller and the side-effects on the overall roll-out process need to be reviewed.
  3. Adapt CA to not trigger scale-down on machine-deployments.
  • CA can be adapted to check the status of the machine-deployment/set and simply not reduce the replicas of the machine-deployment during scale-down.
  • Known points to be considered beforehand:
    • CA taints the node-objects directly during scale-down. We would need to prevent such tainting for this approach to let MCM keep running seamlessly.
  4. Disable scale-down in CA during cluster roll-outs (a rough sketch follows after this list).
  • Cluster-autoscaler supports the --scale-down-disabled flag, which disables only the scale-down aspect of CA.
  • The worker-extension could set this flag during cluster roll-outs [with the existing or slightly better checks]; it would essentially block scale-down and let scale-up continue.
  • Known points to be considered beforehand:
    • The question of policy vs. technology: whether we and the stakeholders are fine with scale-down being disabled during roll-outs is something we should discuss.
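
To make approach 4 a bit more concrete, here is a minimal sketch of how a controller (e.g. the worker-extension) could toggle scale-down on the CA deployment while a roll-out is in progress. The deployment name/namespace, the container position, the `--scale-down-enabled=false` flag, and the overall flow are assumptions for illustration, not the actual worker-extension code:

```go
// Sketch only: toggle the CA scale-down flag while a machine roll-out is in
// progress (approach 4). Deployment name, namespace, container index, and the
// exact flag are assumptions, not the actual Gardener implementation.
package rollout

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const scaleDownDisabledArg = "--scale-down-enabled=false"

// setCAScaleDownDisabled adds (or removes) the flag on the assumed
// "cluster-autoscaler" deployment; the caller decides, based on its
// roll-out check, whether scale-down should currently be disabled.
func setCAScaleDownDisabled(ctx context.Context, c kubernetes.Interface, disable bool) error {
	deploy, err := c.AppsV1().Deployments("kube-system").Get(ctx, "cluster-autoscaler", metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Assumption: the CA container is the first (and only) container.
	container := &deploy.Spec.Template.Spec.Containers[0]

	args := make([]string, 0, len(container.Args)+1)
	for _, a := range container.Args {
		if a != scaleDownDisabledArg { // drop any existing occurrence first
			args = append(args, a)
		}
	}
	if disable {
		args = append(args, scaleDownDisabledArg)
	}
	container.Args = args

	_, err = c.AppsV1().Deployments("kube-system").Update(ctx, deploy, metav1.UpdateOptions{})
	return err
}
```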

Please feel free to suggest new approaches or provide feedback on existing ones.

@hardikdr (Member Author) commented Jun 8, 2020

@timebertt (Member)

cc @danielfoehrKn

@vlerenc (Member) commented Jun 8, 2020

During a discussion back then with @rfranzke, option (4) with some tweaks seemed the easiest to implement (based on a new condition that @danielfoehrKn will introduce):

  • There will be a new RollingUpdate condition for the worker extension objects
  • When it's set, a controller in Gardener shall disable scale-down in CA by setting --scale-down-enabled=false
  • Scale-up on the other hand shall remain active in MCM, but always "steered" towards the new machineset, never the old one
  • When the rolling update is over, CA scale-down will be re-enabled (whether and when the rolling update is done is available at the machinedeployment/machineset, so this can be done, right? A possible status check is sketched right below.)
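
A rough sketch of what such a "rolling update is done" check could look like, using a local stand-in for the MachineDeployment status fields (not the actual MCM API types); which fields or conditions to rely on is exactly what this issue is about:

```go
// Sketch only: decide whether a MachineDeployment roll-out has finished.
// machineDeploymentStatus is a local stand-in, not the actual MCM API type.
package rollout

type machineDeploymentStatus struct {
	Replicas          int32 // total machines currently targeted
	UpdatedReplicas   int32 // machines already on the new machine template
	AvailableReplicas int32 // machines that are up and available again
}

// rollingUpdateDone is true once all desired machines have been updated to
// the new template, are available again, and no surplus old machines remain.
func rollingUpdateDone(desired int32, s machineDeploymentStatus) bool {
	return s.UpdatedReplicas >= desired &&
		s.AvailableReplicas >= desired &&
		s.Replicas == desired
}
```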

Is that sensible?

@hardikdr (Member Author) commented Jun 8, 2020

Overall that sounds like a good solution. Just that the check for removing the RollingUpdate condition could be enhanced a little [e.g. based on the machine set's replicas, or the number of machine-objects].

@amshuman-kr commented Jun 8, 2020

@hardikdr I think option 2 is still workable with only changes to MCM and without any changes to CA.

  1. As discussed offline, scale up is fully under the control of MCM and MCM can decide to scale up only the new MachineSet if the scale up happens during a rolling update.
  2. As for scale down, CA already provides an annotation mechanism to mark nodes as not available for scale down. This annotation is used by CA to filter out such nodes very early in the scale down logic. So, how about the following steps during a rolling update?
    1. The MachineDeployment controller creates the new MachineSet with the cluster-autoscaler.kubernetes.io/scale-down-disabled annotation in the node template. If such an annotation is already defined in the MachineDeployment, its defined value is remembered for later.
    2. Rolling update is executed as it happens now. No changes to the core logic of rolling update in MCM are required. CA will honour the annotation and not scale down any of the machines from the new MachineSet.
    3. Once the machines are rolled and the old MachineSet is scaled down to zero (or deleted), the MachineDeployment controller updates the new (and now, the only remaining) MachineSet to remove the cluster-autoscaler.kubernetes.io/scale-down-disabled annotation (or restore it to the value remembered from step 1 above).

Rollback could be the same but with the old and the new swapped.
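
A minimal sketch of steps 1 and 3 on the MachineSet side, treating the node template annotations as a plain map; the helper names are made up and the real change would live in the MachineDeployment controller:

```go
// Sketch only: set/restore the CA annotation on a MachineSet's node template
// around a rolling update (steps 1 and 3 above). Helper names are made up.
package rollout

const scaleDownDisabledAnnotation = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

// disableScaleDown sets the annotation on the new MachineSet's node template
// and returns any previously defined value so it can be restored later.
func disableScaleDown(nodeTemplateAnnotations map[string]string) (previous string, hadPrevious bool) {
	previous, hadPrevious = nodeTemplateAnnotations[scaleDownDisabledAnnotation]
	nodeTemplateAnnotations[scaleDownDisabledAnnotation] = "true"
	return previous, hadPrevious
}

// restoreScaleDown removes the annotation again once the roll-out is done,
// or restores the value that was defined on the MachineDeployment before.
func restoreScaleDown(nodeTemplateAnnotations map[string]string, previous string, hadPrevious bool) {
	if hadPrevious {
		nodeTemplateAnnotations[scaleDownDisabledAnnotation] = previous
		return
	}
	delete(nodeTemplateAnnotations, scaleDownDisabledAnnotation)
}
```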

@amshuman-kr

I am keen on option 2 for the following reasons.

  1. The changes are limited to MCM.
  2. CA (which is already a pain because of our forking) remains unchanged.
  3. Worker controllers also remain unchanged.
  4. Gardener/gardenlet can forget about any special logic for CA during machine rollout.
  5. MCM and CA can be combined in a non-gardener context without any additional glue controller/logic.
  6. Both scale up and scale down will be possible during machine rollout.
    1. Scale down would still be temporarily restricted (until the rollout is complete) to the old nodes. That means that scale down will happen only if any of the old nodes have low resource utilization. But this seems to be the best that can be done, and none of the other options can do better than this either.

@vlerenc (Member) commented Jun 8, 2020

@amshuman-kr I thought the main problem of (2) was the undesired taint and how to avoid it. If that's trivial, sure. If not, (4) looked simple enough and already brings most of the advantages of (2), e.g. not touching the CA code.

@amshuman-kr commented Jun 8, 2020

@vlerenc If any node is marked with the cluster-autoscaler.kubernetes.io/scale-down-disabled annotation, then it is not considered for scale down by CA. Hence, CA does not taint it either. All it takes is to create the new MachineSet with this annotation in the node template. CA already has the code to check this annotation and skip such nodes.
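
For illustration, the CA-side behaviour amounts to filtering such nodes out before scale-down is even considered; this is a simplified stand-in for the upstream check, not the actual CA code:

```go
// Sketch only: drop nodes carrying the scale-down-disabled annotation before
// scale-down is considered; simplified stand-in for the upstream CA check.
package rollout

import corev1 "k8s.io/api/core/v1"

func scaleDownCandidates(nodes []*corev1.Node) []*corev1.Node {
	candidates := make([]*corev1.Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Annotations["cluster-autoscaler.kubernetes.io/scale-down-disabled"] == "true" {
			continue // such a node is neither tainted nor deleted by CA
		}
		candidates = append(candidates, n)
	}
	return candidates
}
```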

@vlerenc (Member) commented Jun 8, 2020

@amshuman-kr Ah, sure, yes, that's an excellent idea. Thanks!

@hardikdr (Member Author)

Thanks @amshuman-kr @vlerenc for the comments.

Overall we seem to agree on approach 2, with the solution of adding the cluster-autoscaler annotation on the new machine sets. I have opened issue #472 on MCM to track the actual changes.

@prashanth26 (Contributor)

It took me a while to follow this discussion. But yes, looking at the discussions and suggestions, I am okay with both approaches (2) & (4). However, since the changes in (2) are restricted to MCM only, I prefer that, as it keeps the implementation generic enough for external adopters of MCM to also make use of this feature.

@rfranzke (Member) commented Aug 7, 2020

We had another short discussion with @hardikdr @prashanth26 @timebertt, and @hardikdr @prashanth26 will take over the implementation in MCM short-term. Instead of only annotating the new nodes of a rolled machine deployment, the MCM will also annotate the old nodes. This keeps the CA from scaling down any machines of a machine deployment that is currently being rolled, without having to disable CA scale-down completely. After the rolling update has finished, the annotations will be removed again.

Once this change is released, we can remove all special handling of the CA in Gardener and the generic Worker actuator, which will simplify the code there.
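
A hedged sketch of that node-object variant: annotate the nodes of the rolling machine deployment (both old and new) and remove the annotation again afterwards. How the affected node names are collected (here nodeNames) and the overall wiring are assumptions, not the MCM change itself:

```go
// Sketch only: (de-)annotate the Node objects belonging to a machine
// deployment that is being rolled, so CA does not scale them down meanwhile.
// Collecting nodeNames from the machine objects is assumed to happen elsewhere.
package rollout

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const caScaleDownDisabled = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

func setScaleDownDisabledOnNodes(ctx context.Context, c kubernetes.Interface, nodeNames []string, disabled bool) error {
	for _, name := range nodeNames {
		node, err := c.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if disabled {
			if node.Annotations == nil {
				node.Annotations = map[string]string{}
			}
			node.Annotations[caScaleDownDisabled] = "true"
		} else {
			delete(node.Annotations, caScaleDownDisabled) // no-op if absent
		}
		if _, err := c.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}
```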

@rfranzke (Member) commented Aug 7, 2020

/area auto-scaling
/in-progress
/assign @hardikdr @prashanth26
/priority critical

@prashanth26 prashanth26 transferred this issue from gardener/autoscaler Aug 17, 2020
@prashanth26 prashanth26 added kind/bug Bug status/in-progress Issue is in progress/work labels Aug 17, 2020
@hardikdr hardikdr added this to the v0.34.0 milestone Aug 20, 2020
@hardikdr hardikdr added the priority/critical Needs to be resolved soon, because it impacts users negatively label Aug 20, 2020
@hardikdr (Member Author) commented Sep 8, 2020

/close with #496

@gardener-robot gardener-robot added priority/2 Priority (lower number equals higher priority) and removed priority/critical Needs to be resolved soon, because it impacts users negatively labels Mar 8, 2021