
Select workloads to keep alive during turndown #36

Open

michaelmdresser opened this issue Feb 10, 2022 · 6 comments

@michaelmdresser
Contributor

I'm relaying a user request.

They would like to be able to select specific workloads to keep alive (e.g. Kubecost, Prometheus, Grafana) during a turndown.

This behavior is a little complicated to implement, especially in a non-autoscaling environment. We could initially support this feature only in autoscaling environments, but I'd need to do some research and testing.

Roadmap positioning of this feature isn't known yet, but I wanted to record it somewhere!

michaelmdresser added the enhancement label on Feb 10, 2022
@dwbrown2
Contributor

Have definitely heard this theme before... I think it can be broadened to: keep this set of labels, annotations, namespaces, etc. alive.

@michaelmdresser
Contributor Author

Yes, absolutely agree with that broader definition. I'd bias towards supporting label-specified workloads/namespaces to start!
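To make the label-based idea concrete, here is a hypothetical sketch of how a keep-alive marker on a namespace might look. The label key below is invented purely for illustration and is not an existing cluster-turndown feature:

```yaml
# Hypothetical: the turndown.kubecost.com/keep-alive key is made up here to
# illustrate label-specified keep-alive namespaces; cluster-turndown does not
# currently recognize it.
apiVersion: v1
kind: Namespace
metadata:
  name: kubecost
  labels:
    turndown.kubecost.com/keep-alive: "true"
```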

@AjayTripathy
Contributor

Isn't there a standard "do not evict" label for the cluster autoscaler that we can leverage?

@dwbrown2
Contributor

Interesting ideas!

@mbolt35
Collaborator

mbolt35 commented Feb 10, 2022

> Isn't there a standard "do not evict" label for the cluster autoscaler that we can leverage?

I don't think there is a "do not evict" label for the autoscaler, but there is a "safe-evict" annotation that can be used to tell the autoscaler it can evict a pod if necessary.

This use case doesn't seem like a valid use of turndown, which is designed to shut the cluster down entirely. It sounds like the behavior they're looking for is: "I want my cluster to scale down to a set of workloads." That can be accomplished by marking the workloads they don't mind being downscaled as safe to evict. This doesn't require cluster-turndown at all.
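For reference, a minimal sketch of that approach using the standard cluster-autoscaler safe-to-evict pod annotation; the Deployment name and image below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker   # placeholder workload that is fine to downscale
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
      annotations:
        # Tells the cluster autoscaler these pods may be evicted when it
        # scales nodes down; setting "false" instead blocks eviction so the
        # workload (and its node) stays alive.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sleep", "3600"]
```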

@mbolt35
Collaborator

mbolt35 commented Feb 10, 2022

If we're only talking about clusters without autoscaling nodepools, it does seem like we've already written a lot of the foundational code for this in the cluster-controller component (similar to the way one-click cluster sizing worked):

  • Cluster Right-Size using Predefined Workloads (annotations, labels, etc.) [this might require a small amount of work]
  • Send the cluster spec to cluster-controller, which will automatically delete/resize nodepools.

This isn't on a schedule, and cluster-turndown doesn't have an implementation for pulling a cluster spec from cluster right-sizing. Either way, a cluster-autoscaler solution with safe-evict annotations still feels like the better way to go.
