Add 'namespace' parameter support to LowNodeUtilization strategy #391
Comments
@korjek thanks for opening this issue. I'm assuming you are referring to the namespace filtering feature that was just added in the v0.19.0 release. When the namespace filtering enhancement was initially discussed, it was determined that it would not make sense to enable it for all descheduler strategies. For example, for the … Can you provide any additional details on your use case?
@seanmalloy thank you for your reply. You are right, I meant namespace filtering. What I got from the documentation is that namespace filtering is enabled for all strategies except LowNodeUtilization. Another solution I can think of for our use case: specify the lowest priority for all pods in that namespace and use thresholdPriorityClassName filtering. But a drawback of doing this: in case we lose nodes and there is not enough capacity to schedule all pods across the cluster, we would like to have the pods in this namespace scheduled and some other pods evicted by kube-scheduler. So using the lowest priority in this case is not a good idea. That's why we think it would be great to have namespace filtering support with the LowNodeUtilization strategy.
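For reference, a minimal sketch of that thresholdPriorityClassName workaround as a descheduler policy; the PriorityClass name and the utilization thresholds below are placeholders, not values taken from this thread:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      # Only pods whose priority is below this PriorityClass are considered
      # for eviction; "batch-low" is a hypothetical class name.
      thresholdPriorityClassName: "batch-low"
      nodeResourceUtilizationThresholds:
        thresholds:
          "cpu": 20
          "memory": 20
          "pods": 20
        targetThresholds:
          "cpu": 50
          "memory": 50
          "pods": 50
```

As the comment above notes, this only works if giving those pods the lowest priority is acceptable, which is exactly the drawback being described.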
@korjek are the pods running in this namespace running on a different set of nodes? Could you maybe filter using a node selector?
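For completeness, if the namespace did map to its own node pool, the policy's top-level nodeSelector field could scope the descheduler to those nodes. A minimal sketch assuming a made-up node label:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
# Only nodes matching this (hypothetical) label are considered by the strategies.
nodeSelector: "workload-pool=batch"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          "cpu": 20
          "memory": 20
        targetThresholds:
          "cpu": 50
          "memory": 50
```

As the next reply explains, this does not help here because the pods share the same node pool.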
@seanmalloy no, they are using the same node pool. But what's the reason not to have namespace filtering for the LowNodeUtilization strategy?
LowNodeUtilization needs to consider as many pods as possible to accurately calculate each node's resource consumption. Only considering a subset of these for eviction would hurt its effectiveness at balancing cluster resource usage. Can you give some more details on what your use case is? Perhaps there is another approach with the scheduler/descheduler that would fit better.
@damemi but why does the descheduler need to calculate node resource consumption if that info is available from the node's status? How I see this: when the LowNodeUtilization strategy generates a list of pods for eviction, it should take namespace filtering into account (i.e. when evaluating pods for eviction, skip pods that are filtered out by namespace). The rest of the logic shouldn't change. Here is my use case: #391 (comment)
If you limit the pods that are available for eviction with LowNodeUtilization, the strategy won't work very well. The point of it is to balance overall node usage, so unless the namespace you're restricting it to makes up a large portion of your cluster's consumption, there isn't much point in running this strategy in that namespace. By your use case, I meant: why do you specifically want to run LowNodeUtilization in one namespace? I understand that you want to restrict evictions to certain idempotent pods, but are you just trying to keep these pods spread out between nodes? Are they actually a large portion of your nodes' usage? If we know your end goal with this strategy, we can better determine whether this is a real use case or whether something else is more appropriate (like Pod Topology Spreading or RemoveDuplicates).
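To illustrate the Pod Topology Spreading alternative mentioned above (not something proposed in this thread), a sketch of a workload asking the scheduler itself to keep replicas spread across nodes, with placeholder names and image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-workers                          # hypothetical workload
spec:
  replicas: 6
  selector:
    matchLabels:
      app: batch-workers
  template:
    metadata:
      labels:
        app: batch-workers
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                             # allow at most 1 pod of imbalance
        topologyKey: kubernetes.io/hostname    # spread across individual nodes
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: batch-workers
      containers:
      - name: worker
        image: registry.example.com/batch-worker:1.0   # placeholder image
```

This only spreads pods at scheduling time; it does not rebalance already-running pods the way LowNodeUtilization does.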
IIUC, the specific namespace is like a bank of pods to be sacrificed when certain nodes start to be overutilized. Assuming what you are asking for is implemented, in case no pod from the specific namespace runs on any of the overutilized nodes, the strategy will have no effect. More or less the same holds if only a small fraction of pods from the namespace runs on the overutilized nodes (debatable, depending on the particular use case). I can see how it aligns with the LowNodeUtilization strategy, though unless the bank of pods is concentrated around overutilized nodes, I don't see any other benefit. I can see the case where each of a set of nodes has a main application. Each main application concentrates a bank of workers on the same node (in the specific namespace). In case the worker pods start to eat too much cpu/memory (upper bounded by resource limits), nodes can get overutilized, in which case it makes sense to reschedule/evict only some of the worker pods from the bank. @korjek Is this use case close enough?
@damemi
But that's true for PriorityClass filtering too: if all pods except a small number use a higher PriorityClass than the one specified via the PriorityClass filter, the strategy will have no effect.
Almost. The main load (~90%) is caused by pods running in one namespace. So once AWS reclaims a node, the other nodes become really overloaded.
This is the key here, and I think it's a legitimate use case. Ultimately there isn't much "risk" in enabling namespace filtering for this strategy; the only risk is that, if used improperly, it will hurt the strategy's effectiveness (or maybe cause it to hot-loop/run every iteration with no results, but we could probably get around this by only listing the viable pods when calculating usage). If we do enable this, those outcomes should be clearly noted in the docs, stating that it is not normally recommended to run this strategy with namespace filtering. @seanmalloy @ingvagabund wdyt?
@damemi I'm fine with this as long as it is clearly documented.
Semantics of filtering namespaces through …
I'm not sure this requires a new field; it would probably be sufficient to document the difference, because it's just for one strategy.
Two, once the HighNodeUtilization strategy gets implemented.
This strategy is different, as it's a strategy for …
Nobody reads the documentation unless it's really needed. I will not argue against reusing …
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
I was fooled by this too. I assumed the filter would let me keep some namespaces untouched by the descheduler, but that's clearly not the case.
I think that adding a top-level setting for excluding namespaces is a valid request. In OpenShift, we currently have our operator manually append system namespaces to each strategy's … I also think that excluding namespaces from …
Having LowNodeUtilization filter by namespace would keep it from affecting applications in other namespaces. If the entire cluster's capacity is used up, production applications become unavailable.
+1 for allowing only a list of excluded namespaces to be set, while clearly expressing the risks of misusing it, and stressing that the setting applies in the eviction phase only. The strategy evicts pods in order to move them from overutilized nodes to underutilized ones, ordering pods from the lowest priority to the highest. If the list of excluded namespaces is configured improperly, some of the lowest-priority pods might stay and pods with higher priority might get evicted instead. That still improves resource usage among the nodes, though not always respecting the priority. Description of … Worth adding another …
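As an illustration only (the strategy did not support this at the time of the discussion), an exclude-only filter reusing the shape of the namespaces parameter that other strategies already accept might look like the following, with placeholder namespace names:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "LowNodeUtilization":
    enabled: true
    params:
      # Hypothetical: skip pods in these namespaces during the eviction phase only;
      # node utilization would still be computed from all pods.
      namespaces:
        exclude:
        - "kube-system"
        - "payments-prod"
      nodeResourceUtilizationThresholds:
        thresholds:
          "cpu": 20
          "memory": 20
        targetThresholds:
          "cpu": 50
          "memory": 50
```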
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@k8s-triage-robot: Closing this issue.
/reopen
@korjek: Reopened this issue.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: after 90d of inactivity, lifecycle/stale is applied; after 30d of further inactivity, lifecycle/rotten is applied; after 30d of further inactivity, the issue is closed.
You can reopen this issue with /reopen, mark it as fresh with /remove-lifecycle rotten, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue.
/reopen
@seanmalloy: Reopened this issue.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules: after 90d of inactivity, lifecycle/stale is applied; after 30d of further inactivity, lifecycle/rotten is applied; after 30d of further inactivity, the issue is closed.
You can reopen this issue with /reopen, mark it as fresh with /remove-lifecycle rotten, or offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /close
@k8s-triage-robot: Closing this issue.
Hello.
I think it would be great if the 'namespace' parameter were supported with the LowNodeUtilization strategy.