-
Notifications
You must be signed in to change notification settings - Fork 669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add namespace filter to nodeutilization #967
add namespace filter to nodeutilization #967
Conversation
WithFilter(podEvictor.PreEvictionFilter). | ||
WithNamespaces(includedNamespaces). | ||
WithoutNamespaces(excludedNamespaces). | ||
BuildFilterFunc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to use these, or do we want to throw this functionality into the PreEvictionFilter directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The namespace is a parameter of the strategies so it needs to stay outside of PreEvictionFilter extension point
We chose not to allow namespace filtering for NodeUtilization because it ultimately only hinders the strategy (see #391 for a longer discussion on it) tl;dr is that NodeUtilization is designed to balance resource consumption for an entire node. The idea of balanced resources across specific namespaces doesn't have a clear use case, and we can't guarantee it with this strategy anyway. Most of the use cases that seem like they fit are actually better served by a different strategy (like topology spreading). Just curious is there more detail on where this is coming from for these strategies? |
In the discussion #391 (comment) I saw this message:
So, for NodeUtilization all namespaces need to be considered for balancing, and here in this PR we are only deciding not to evict a pod from an excluded namespace right before the eviction call (still considering all namespaces while processing the balancing). Internally we have a client requesting this because they have some pods in kube-system namespace that not necessarily have system critical priority class, and because of that, they want to enforce exclusion explicitly in the descheduler config. (and would also want to exclude other critical namespaces similar to kube-system) |
I think this will still create some confusing situations where the descheduler is reporting over/underutilized nodes and not doing anything about it because the namespaces are excluded. @ingvagabund's point of clearly explaining the risks helps with that, but it won't stop people from misusing it. My concern is that the downsides to this outweigh the potential benefits |
The same situation can already occur when the over/underutilized nodes are full of pods with system critical priority class set. Or pods with local storage or pvc.
This is one of those design decisions which are not easy to make. We need a mechanism to protect namespaces but to also guarantee sufficient degree of balancing nodes. Two goals going against each other under certain condition. We might hardcode a protection mechanism calculating whether pods from excluded namespaces occupy "too many" nodes. In which case the strategy would refuse any eviction until the pods are re-distributed. |
Hmm 🤔 I personally think this would be even more confusing. I think that if you explicitly choose a namespace to be ignored, and you are using a strategy that re-balances pods, you would expect the behavior that we described (where the descheduler is reporting over/underutilized nodes and not doing anything about it because the namespaces are excluded). It is something that if well documented would also be easy to respond to if users get confused (maybe not only docs, but also reporting things in lower log levels of these re-balancing strategies, maybe something like "we found pods that should be evicted to proceed with rebalancing, but they are system-critical or part of a filtered namespace"). |
@damemi I would like to find an agreement on this. Either get this moving or decide to look for a different approach for namespace inclusion/exclusion. Alternative is to allow to specify only a list of excluded protected namespaces. Users will need to explicitly and willingly enumerate a list of excluded namespaces. |
Overall, I'm personally against this because it doesn't have a clear use case and is designed to detriment an existing plugin. I think doing this will encourage users to use lownodeutilization when their use case is actually better fit for something like pod topology spread. Adding a big warning not to use this just adds confusion (why add a feature that we're warning most people not to use?) Having voiced my concerns, I do see that I'm in the minority on this so it's unfair of me to block it. So, thank you for hearing me out
I would prefer this. I'm thinking most clusters will have a large number of overall namespaces with a smaller fraction of high-priority namespaces which they want to exclude. The alternative, where there is only a small fraction of namespaces to include, is much worse for the strategy's effectiveness. So discouraging that is probably best |
@damemi @ingvagabund do we mean disallowing the Include field of namespaces? Or using another struct for this? |
yes, making use of |
to clarify, we are only talking about NodeUtilization strategies right? Others can still use include If making include available to NodeUtilization is a technical requirement for it to be available to all strategies, then we don't have much choice there besides adding it to this strategy's validation |
Any plugin using preEvictionFilter will be affected. Currently only adopted by the node utilization strategies. Unless we accept a new plugin, it will stay this way forever. Also, we have validation to make sure preEvictionFilter will not get used in plugins where it is not expected to be used. When it comes to custom/third party plugins the validation will be relaxed by the descheduling framework unless the custom/third party plugins provide additional validation. |
That's not good, I didn't realize the scope of this. Any plugin should have the option of using either/or Include/Exclude, but they shouldn't be restricted by the framework. The NodeUtilization plugins themselves should support or disallow namespace inclusion filters. If they're not able to do that, we may need to refactor how preEvictionFilter works |
Can't we use nodeutilization validation (https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/nodeutilization/validation.go) to disallow this only for this plugin like @damemi suggested? We don't have to change anything on other plugins. |
I mean the expectation is to combine the preEvictionFilter with the namespace inclusion/exclusion. Namespace inclusion/exclusion is set by each strategy explicitly. Though, there's no way of detecting whether a custom plugin implementing the preEvictionFilter extension point is performing its own namespace inclusion/exclusion check. As providers of the upstream plugins with default setting and validation for all "approved" plugins we can make sure the namespace inclusion/exclusion is limited for specific strategies. However, we can not guarantee the same quality of protection will be enforced globally for custom plugins. |
This is totally fine for me, we're the code owners for this plugin, so it's up to us what options to allow. Having Include/Exclude be available for every plugin makes sense (the framework is providing the mechanism to use this feature), but plugins like this one should choose to disallow it. Do you think that's a reasonable flow for developers? |
I think that's fine, we can have guidelines for people creating custom plugins, and also we should be able to review/accept new plugins at a "registry" repo like it is done for the kube-scheduler. We just need to make sure this is part of the review process.
So, about |
A plugin only sees the preEvictionFilter extension point. It does not see what's invoked inside. So the only explicit mechanism for disallowing the node utilization plugins to get the node inclusion is to enforce the |
There's no way to validate the behavior which we do not control/own unless we enforce use of specific extension points for specific plugins. Which might be too constraining for users. In which point we can introduce a new descheduler option saying |
Maybe it would be good to see a poc change with that kind of enforcement applied? I'm getting a little confused so seeing the alternative implementation could help inform a better decision. @ingvagabund would you mind throwing together a rough approach to show what you mean? |
I think I got it now. I can provide the implementation that Jan mentions here
And indeed it can end up being too restricting 🤔. But I will push a commit so we can have a look |
I just noticed that having v1alpha2 and the changes to policyconfig.go of #974 would be kind of necessary here to work on that 😓. Since this is validation at the profile level and not at the plugin level. |
@damemi locking a profile for the node utilization strategies. E.g. whenever a profile configures LowNodeUtilization plugin, the validation would make sure apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
- name: ProfileName
pluginConfig:
- name: DefaultEvictor
args:
...
- name: LowNodeUtilization
args:
...
plugins:
balance:
enabled:
- LowNodeUtilization
preEvictionFilter:
enabled:
- DefaultEvictor |
@ingvagabund ah, I see. That seems like a reasonable approach for plugin developers to take imo Instead of enforcing it against a specific evictor, could it enforce that the evictor's arguments don't use namespace filtering? So you don't get stuck with DefaultEvictor |
955fc23
to
cced01d
Compare
So, as per latest discussions of the last community meeting, should we merge this one? I added docs and that simple validation. I wanted to land #974 before merging this one 😢 , but since I am kind of stuck with decoding/conversions there, I wanted to push this one forward already. |
75783c6
to
4c4663a
Compare
/retest |
@@ -280,14 +283,30 @@ func evictPods( | |||
continueEviction continueEvictionCond, | |||
) { | |||
|
|||
var includedNamespaces, excludedNamespaces sets.String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
includedNamespaces
can be dropped
README.md
Outdated
@@ -243,6 +243,7 @@ actual usage metrics. Implementing metrics-based descheduling is currently TODO | |||
|`thresholdPriority`|int (see [priority filtering](#priority-filtering))| | |||
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))| | |||
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))| | |||
|`evictionNamespaces`|(see [namespace filtering](#namespace-filtering))| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
evictableNamespaces
// Naming this one differently since namespaces are still | ||
// considered while considering resoures used by pods | ||
// but then filtered out before eviction | ||
EvictionNamespaces *api.Namespaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EvictableNamespaces
// Naming this one differently since namespaces are still | ||
// considered while considering resoures used by pods | ||
// but then filtered out before eviction | ||
EvictionNamespaces *api.Namespaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EvictableNamespaces
WithFilter(podEvictor.PreEvictionFilter). | ||
WithNamespaces(includedNamespaces). | ||
WithoutNamespaces(excludedNamespaces). | ||
BuildFilterFunc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The namespace is a parameter of the strategies so it needs to stay outside of PreEvictionFilter extension point
Few nits. Looks good in overall. |
4c4663a
to
a646d62
Compare
I couldn't understand this one. |
Related to
The former. |
@knelasevero there are still few mentions of the |
a646d62
to
63f7a93
Compare
63f7a93
to
398ffa7
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ingvagabund The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Closes #966