Inhibit cluster-autoscaler during cluster rollouts #497
Comments
During a discussion back then with @rfranzke, option (4) with some tweaks seemed the easiest to implement (based on a new condition that @danielfoehrKn will introduce):
Is that sensible?
Overall that sounds like a good solution. Just that the check of removing the
@hardikdr I think option 2 is still workable with only changes to MCM and without any changes to CA.
Rollback could be the same but with the old and the new swapped.
I am keen on option 2 for the following reasons.
@amshuman-kr I thought the main problem of (2) is the undesired taint and how to avoid it. If that's trivial, sure. If not, (4) looked simple enough and brings already most of the advantages of (2), e.g. not touching the CA code.
@vlerenc If any node is marked with the
@amshuman-kr Ah, sure, yes, that's an excellent idea. Thanks!
Thanks @amshuman-kr @vlerenc for the comments. Overall, we seem to agree on approach 2, with the solution of adding the cluster-autoscaler annotation on the new machine-sets. I have opened issue #472 on MCM to track the actual changes.
It took me a while to follow this discussion. But yes, looking at the discussions and suggestions, I am okay with both approaches (2) and (4). However, since the changes in (2) are restricted to MCM, I prefer that one, as it keeps the implementation generic enough for external adopters of MCM to also make use of this feature.
We had another short discussion with @hardikdr @prashanth26 @timebertt, and @hardikdr @prashanth26 will take over the implementation in MCM short-term. Instead of only annotating the new nodes of a rolled machine deployment, the MCM will also annotate the old nodes. This is to prevent the CA from completely disabling scaling down machines from a machine deployment that is currently being rolled. After the rolling update has finished, the annotations will be removed again. Once this change is released, we can remove all special handling of the CA in Gardener and the generic Worker actuator, which will simplify the code there.
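To make the annotate-then-remove flow above concrete, here is a minimal Go sketch. It is not the MCM implementation: the `node` type and the `markNodes`/`unmarkNodes` helpers are hypothetical, and the annotation key is an assumption, since the thread only speaks of "the cluster-autoscaler annotation". The key used is the node annotation that cluster-autoscaler documents for excluding a node from scale-down.

```go
package main

import "fmt"

// Assumption: the thread does not name the annotation. This is the node
// annotation cluster-autoscaler documents for excluding a node from scale-down.
const scaleDownDisabledAnnotation = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

// node is a hypothetical, trimmed-down stand-in for a Kubernetes Node object.
type node struct {
	Name        string
	Annotations map[string]string
}

// markNodes sets the annotation on every given node (old and new alike).
func markNodes(nodes []*node) {
	for _, n := range nodes {
		if n.Annotations == nil {
			n.Annotations = map[string]string{}
		}
		n.Annotations[scaleDownDisabledAnnotation] = "true"
	}
}

// unmarkNodes removes the annotation again once the rolling update is done.
func unmarkNodes(nodes []*node) {
	for _, n := range nodes {
		delete(n.Annotations, scaleDownDisabledAnnotation)
	}
}

func main() {
	oldNodes := []*node{{Name: "old-1"}, {Name: "old-2"}}
	newNodes := []*node{{Name: "new-1"}}
	all := append(append([]*node{}, oldNodes...), newNodes...)

	markNodes(all) // rolling update starts
	for _, n := range all {
		fmt.Println(n.Name, n.Annotations[scaleDownDisabledAnnotation])
	}

	unmarkNodes(all) // rolling update finished
	for _, n := range all {
		_, present := n.Annotations[scaleDownDisabledAnnotation]
		fmt.Println(n.Name, "annotated:", present)
	}
}
```

A real controller would patch the Node objects through the Kubernetes API and would need to avoid removing annotations that a user had set on purpose. The sketch leaves both out.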
/area auto-scaling
/close with #496
What would you like to be added:
Cluster-autoscaler seems to have undefined/unintended behavior during cluster roll-outs.
Currently, worker-extension disables the cluster-autoscaler during roll-out using this check: numUpdated >= numDesired of the machine-deployment's status. This might not be the most reliable way of disabling the autoscaler.
Opening this issue to discuss the different approaches and later enhance either the autoscaler, MCM, worker-extension, or a combination of them to handle the overall situation.
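For readers unfamiliar with the check, a minimal Go sketch of the comparison follows. The `machineDeploymentStatus` type and its field names are hypothetical stand-ins for the two counters named above. The sketch only evaluates the comparison; how the worker extension uses the result to toggle the autoscaler is left out.

```go
package main

import "fmt"

// machineDeploymentStatus is a hypothetical, trimmed-down stand-in for a
// machine deployment's status, holding only the two counters in the check.
type machineDeploymentStatus struct {
	UpdatedReplicas int32 // numUpdated
	DesiredReplicas int32 // numDesired
}

// rolloutLooksComplete evaluates numUpdated >= numDesired.
func rolloutLooksComplete(s machineDeploymentStatus) bool {
	return s.UpdatedReplicas >= s.DesiredReplicas
}

func main() {
	midRollout := machineDeploymentStatus{UpdatedReplicas: 1, DesiredReplicas: 3}
	done := machineDeploymentStatus{UpdatedReplicas: 3, DesiredReplicas: 3}

	fmt.Println(rolloutLooksComplete(midRollout)) // prints "false"
	fmt.Println(rolloutLooksComplete(done))       // prints "true"
}
```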
A couple of known/discussed approaches are the following:
[…] Replicas of machine-deployment during scale-down.
[…] scale-down in CA during cluster roll-outs.
[…] --scale-down-disabled flag, this helps in disabling only the scale-down aspect of CA.
[…] policy vs technology, would we, and stakeholders are fine with disabled scale-down during rollout, is something we should discuss.
Please feel free to suggest new approaches or provide feedback on existing ones.