
Select workloads to keep alive during turndown #36

Open

michaelmdresser opened this issue Feb 10, 2022 · 6 comments

@michaelmdresser
Contributor

I'm relaying a user request.

They would like to be able to select specific workloads to keep alive (e.g. Kubecost, Prometheus, Grafana) during a turndown.

This behavior is a little complicated to implement, especially in a non-autoscaling environment. We could initially support this feature only in autoscaling environments, but I'd need to do some research and testing.

Roadmap positioning of this feature isn't known yet, but I wanted to record it somewhere!

michaelmdresser added the enhancement label on Feb 10, 2022
@dwbrown2
Contributor

Have definitely heard this theme before... I think it can be broadened to: keep this set of labels, annotations, namespaces, etc. alive.

@michaelmdresser
Contributor Author

Yes, absolutely agree with that broader definition. I'd bias towards supporting label-specified workloads/namespaces to start!
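To make the label-based idea concrete, here is a hypothetical sketch of how a keep-alive marker on a namespace might look. The label key below is invented purely for illustration and is not an existing cluster-turndown feature:

```yaml
# Hypothetical: the turndown.kubecost.com/keep-alive key is made up here to
# illustrate label-specified keep-alive namespaces; cluster-turndown does not
# currently recognize it.
apiVersion: v1
kind: Namespace
metadata:
  name: kubecost
  labels:
    turndown.kubecost.com/keep-alive: "true"
```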

@AjayTripathy
Contributor

Isn't there a standard "do not evict" label for the cluster autoscaler that we can leverage?

@dwbrown2
Contributor

Interesting ideas!

@mbolt35
Collaborator

mbolt35 commented Feb 10, 2022

> Isn't there a standard "do not evict" label for the cluster autoscaler that we can leverage?

I don't think there is a "do not evict" label for the autoscaler, but there is a "safe-evict" annotation that can be used to tell the autoscaler it can evict a pod if necessary.

This use case doesn't seem like a valid use of turndown, which is designed to shut the cluster down entirely. It sounds like the behavior they're looking for is: "I want my cluster to scale down to a set of workloads." That can be accomplished by marking the workloads they don't mind being downscaled as safe to evict. This doesn't require cluster-turndown at all.
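For reference, a minimal sketch of that approach using the standard cluster-autoscaler safe-to-evict pod annotation; the Deployment name and image below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker   # placeholder workload that is fine to downscale
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
      annotations:
        # Tells the cluster autoscaler these pods may be evicted when it
        # scales nodes down; setting "false" instead blocks eviction so the
        # workload (and its node) stays alive.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sleep", "3600"]
```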

@mbolt35
Collaborator

mbolt35 commented Feb 10, 2022

If we're only talking about clusters without autoscaling nodepools, it does seem like we've already written a lot of the foundational code for this in the cluster-controller component (similar to the way one-click cluster sizing worked):

  • Cluster Right-Size using Predefined Workloads (annotations, labels, etc.) [this might require a small amount of work]
  • Send the cluster spec to cluster-controller, which will automatically delete/resize nodepools.

This isn't on a schedule, and cluster-turndown doesn't have an implementation for pulling a cluster spec from cluster right-sizing. Either way, a cluster-autoscaler solution with safe-evict annotations still feels like the better way to go.
