Select workloads to keep alive during turndown #36
Comments
I've definitely heard this theme before. I think it can be broadened to say: keep this set of labels, annotations, namespaces, etc. alive.
Yes, absolutely agree with that broader definition. I'd bias towards supporting label-specified workloads/namespaces to start!
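To make the label/namespace idea concrete, here is a minimal sketch of how a turndown could split workloads into "keep alive" and "scale down" sets. The label key `turndown.example.com/keep-alive`, the function name, and the dictionary shape are all hypothetical and chosen only for illustration; none of them exist in cluster-turndown today.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical label key, used only for this illustration.
KEEP_ALIVE_LABEL = "turndown.example.com/keep-alive"

# A workload is modeled as a plain dict with "name", "namespace" and "labels".
Workload = Dict[str, object]


def partition_workloads(
    workloads: Iterable[Workload],
    keep_namespaces: Optional[Set[str]] = None,
) -> Tuple[List[Workload], List[Workload]]:
    """Split workloads into (keep_alive, scale_down).

    A workload is kept alive if it carries the keep-alive label with the
    value "true", or if it lives in one of the protected namespaces.
    """
    protected = keep_namespaces or set()
    keep: List[Workload] = []
    scale_down: List[Workload] = []
    for workload in workloads:
        labels = workload.get("labels") or {}  # tolerate a missing or None labels field
        if labels.get(KEEP_ALIVE_LABEL) == "true" or workload.get("namespace") in protected:
            keep.append(workload)
        else:
            scale_down.append(workload)
    return keep, scale_down
```

With this shape, Kubecost could be protected by label and Prometheus/Grafana by namespace, while everything else is handed to the normal turndown path.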
Isn't there a standard "do not evict" label for the cluster autoscaler that we can leverage?
Interesting ideas!
I don't think there is a "do not evict" label for the autoscaler, but there is a "safe-evict" annotation that tells the autoscaler it can evict a workload if necessary. This use case doesn't seem like a valid use of turndown, which is designed to shut the cluster down so that it isn't in use. It sounds like the behavior they're looking for is: "I want my cluster to scale down to a set of workloads." This can be accomplished by marking safe-evict on the workloads they don't mind being downscaled, which doesn't require cluster-turndown at all.
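As a rough sketch of the annotation approach: the cluster autoscaler's full annotation key is, to my knowledge, `cluster-autoscaler.kubernetes.io/safe-to-evict`, set on the pod (for a Deployment, on the pod template). A value of `"true"` marks the pod as evictable, and `"false"` asks the autoscaler not to evict it. The helper name below is hypothetical; it only builds a merge-patch body and does not talk to a cluster.

```python
import json

# Annotation key read by the Kubernetes cluster autoscaler (set on pods).
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def safe_to_evict_patch(safe: bool) -> dict:
    """Build a merge-patch body that sets the annotation on a pod template.

    The value must be the string "true" or "false", not a boolean,
    because Kubernetes annotation values are always strings.
    """
    value = "true" if safe else "false"
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {SAFE_TO_EVICT: value},
                }
            }
        }
    }


if __name__ == "__main__":
    # Print a patch marking a workload as fine to downscale.
    print(json.dumps(safe_to_evict_patch(True), indent=2))
```

The resulting JSON could then be applied with something like `kubectl patch deployment <name> --type merge -p '<json>'`; workloads that should stay up (Kubecost, Prometheus, Grafana) would get `"false"` instead.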
If we're only talking about clusters without autoscaling nodepools, it does seem like we've already written a lot of the foundational code which does this in the cluster-controller component (similar to the way one click cluster sizing worked):
This isn't on a schedule, and cluster-turndown doesn't have an implementation for pulling a cluster spec from cluster right-sizing. Either way, this still feels like a cluster-autoscaler solution with
I'm relaying a user request.
They would like to be able to select specific workloads to keep alive (e.g. Kubecost, Prometheus, Grafana) during a turndown.
This behavior is a little complicated to implement, especially in a non-autoscaling environment. We could initially only support this feature in autoscaling environments but I'd need to do some research and testing.
Roadmap positioning of this feature isn't known yet, but I wanted to record it somewhere!