Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add namespace filter to nodeutilization #967

Merged

Conversation

knelasevero
Copy link
Contributor

Closes #966

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 28, 2022
Comment on lines 300 to 301
WithFilter(podEvictor.PreEvictionFilter).
WithNamespaces(includedNamespaces).
WithoutNamespaces(excludedNamespaces).
BuildFilterFunc()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use these, or do we want to throw this functionality into the PreEvictionFilter directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The namespace is a parameter of the strategies so it needs to stay outside of PreEvictionFilter extension point

@damemi
Copy link
Contributor

damemi commented Sep 28, 2022

We chose not to allow namespace filtering for NodeUtilization because it ultimately only hinders the strategy (see #391 for a longer discussion on it)

tl;dr is that NodeUtilization is designed to balance resource consumption for an entire node. The idea of balanced resources across specific namespaces doesn't have a clear use case, and we can't guarantee it with this strategy anyway. Most of the use cases that seem like they fit are actually better served by a different strategy (like topology spreading).

Just curious is there more detail on where this is coming from for these strategies?

@knelasevero
Copy link
Contributor Author

knelasevero commented Sep 28, 2022

We chose not to allow namespace filtering for NodeUtilization because it ultimately only hinders the strategy (see #391 for a longer discussion on it)

tl;dr is that NodeUtilization is designed to balance resource consumption for an entire node. The idea of balanced resources across specific namespaces doesn't have a clear use case, and we can't guarantee it with this strategy anyway. Most of the use cases that seem like they fit are actually better served by a different strategy (like topology spreading).

Just curious is there more detail on where this is coming from for these strategies?

In the discussion #391 (comment)

I saw this message:

+1 for allowing to set a list of excluded namespaces only yet clearly expressing risks of misusing it. Also stressing the setting applies in eviction phase only.

So, for NodeUtilization all namespaces need to be considered for balancing, and here in this PR we are only deciding not to evict a pod from an excluded namespace right before the eviction call (still considering all namespaces while processing the balancing).

Internally we have a client requesting this because they have some pods in kube-system namespace that not necessarily have system critical priority class, and because of that, they want to enforce exclusion explicitly in the descheduler config. (and would also want to exclude other critical namespaces similar to kube-system)

@damemi
Copy link
Contributor

damemi commented Sep 28, 2022

So, for NodeUtilization all namespaces need to be considered for balancing, and here in this PR we are only deciding not to evict a pod from an excluded namespace right before the eviction call (still considering all namespaces while processing the balancing).

I think this will still create some confusing situations where the descheduler is reporting over/underutilized nodes and not doing anything about it because the namespaces are excluded.

@ingvagabund's point of clearly explaining the risks helps with that, but it won't stop people from misusing it. My concern is that the downsides to this outweigh the potential benefits

@ingvagabund
Copy link
Contributor

ingvagabund commented Sep 28, 2022

I think this will still create some confusing situations where the descheduler is reporting over/underutilized nodes and not doing anything about it because the namespaces are excluded.

The same situation can already occur when the over/underutilized nodes are full of pods with system critical priority class set. Or pods with local storage or pvc.

but it won't stop people from misusing it. My concern is that the downsides to this outweigh the potential benefits

This is one of those design decisions which are not easy to make. We need a mechanism to protect namespaces but to also guarantee sufficient degree of balancing nodes. Two goals going against each other under certain condition.

We might hardcode a protection mechanism calculating whether pods from excluded namespaces occupy "too many" nodes. In which case the strategy would refuse any eviction until the pods are re-distributed.

@knelasevero
Copy link
Contributor Author

We might hardcode a protection mechanism calculating whether pods from excluded namespaces occupy "too many" nodes. In which case the strategy would refuse any eviction until the pods are re-distributed.

Hmm 🤔 I personally think this would be even more confusing.

I think that if you explicitly choose a namespace to be ignored, and you are using a strategy that re-balances pods, you would expect the behavior that we described (where the descheduler is reporting over/underutilized nodes and not doing anything about it because the namespaces are excluded). It is something that if well documented would also be easy to respond to if users get confused (maybe not only docs, but also reporting things in lower log levels of these re-balancing strategies, maybe something like "we found pods that should be evicted to proceed with rebalancing, but they are system-critical or part of a filtered namespace").

@ingvagabund
Copy link
Contributor

ingvagabund commented Oct 10, 2022

@damemi I would like to find an agreement on this. Either get this moving or decide to look for a different approach for namespace inclusion/exclusion. Alternative is to allow to specify only a list of excluded protected namespaces. Users will need to explicitly and willingly enumerate a list of excluded namespaces.

@damemi
Copy link
Contributor

damemi commented Oct 10, 2022

Overall, I'm personally against this because it doesn't have a clear use case and is designed to detriment an existing plugin. I think doing this will encourage users to use lownodeutilization when their use case is actually better fit for something like pod topology spread. Adding a big warning not to use this just adds confusion (why add a feature that we're warning most people not to use?)

Having voiced my concerns, I do see that I'm in the minority on this so it's unfair of me to block it. So, thank you for hearing me out

Alternative is to allow to specify only a list of excluded protected namespaces. Users will need to explicitly and willingly enumerate a list of excluded namespaces.

I would prefer this. I'm thinking most clusters will have a large number of overall namespaces with a smaller fraction of high-priority namespaces which they want to exclude. The alternative, where there is only a small fraction of namespaces to include, is much worse for the strategy's effectiveness. So discouraging that is probably best

@knelasevero
Copy link
Contributor Author

@damemi @ingvagabund do we mean disallowing the Include field of namespaces? Or using another struct for this?

@ingvagabund
Copy link
Contributor

yes, making use of included invalid/forbidden in the preEvictionFilter EP

@damemi
Copy link
Contributor

damemi commented Oct 10, 2022

yes, making use of included invalid/forbidden in the preEvictionFilter EP

to clarify, we are only talking about NodeUtilization strategies right? Others can still use include

If making include available to NodeUtilization is a technical requirement for it to be available to all strategies, then we don't have much choice there besides adding it to this strategy's validation

@ingvagabund
Copy link
Contributor

ingvagabund commented Oct 10, 2022

Any plugin using preEvictionFilter will be affected. Currently only adopted by the node utilization strategies. Unless we accept a new plugin, it will stay this way forever. Also, we have validation to make sure preEvictionFilter will not get used in plugins where it is not expected to be used. When it comes to custom/third party plugins the validation will be relaxed by the descheduling framework unless the custom/third party plugins provide additional validation.

@damemi
Copy link
Contributor

damemi commented Oct 10, 2022

Any plugin using preEvictionFilter will be affected

That's not good, I didn't realize the scope of this. Any plugin should have the option of using either/or Include/Exclude, but they shouldn't be restricted by the framework.

The NodeUtilization plugins themselves should support or disallow namespace inclusion filters. If they're not able to do that, we may need to refactor how preEvictionFilter works

@knelasevero
Copy link
Contributor Author

Can't we use nodeutilization validation (https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/framework/plugins/nodeutilization/validation.go) to disallow this only for this plugin like @damemi suggested? We don't have to change anything on other plugins.

@ingvagabund
Copy link
Contributor

ingvagabund commented Oct 11, 2022

I mean the expectation is to combine the preEvictionFilter with the namespace inclusion/exclusion. Namespace inclusion/exclusion is set by each strategy explicitly. Though, there's no way of detecting whether a custom plugin implementing the preEvictionFilter extension point is performing its own namespace inclusion/exclusion check. As providers of the upstream plugins with default setting and validation for all "approved" plugins we can make sure the namespace inclusion/exclusion is limited for specific strategies. However, we can not guarantee the same quality of protection will be enforced globally for custom plugins.

@damemi
Copy link
Contributor

damemi commented Oct 11, 2022

As providers of the upstream plugins with default setting and validation for all "approved" plugins we can make sure the namespace inclusion/exclusion is limited for specific strategies. However, we can not guarantee the same quality of protection will be enforced globally for custom plugins.

This is totally fine for me, we're the code owners for this plugin, so it's up to us what options to allow. Having Include/Exclude be available for every plugin makes sense (the framework is providing the mechanism to use this feature), but plugins like this one should choose to disallow it.

Do you think that's a reasonable flow for developers?

@knelasevero
Copy link
Contributor Author

I mean the expectation is to combine the preEvictionFilter with the namespace inclusion/exclusion. Namespace inclusion/exclusion is set by each strategy explicitly. Though, there's no way of detecting whether a custom plugin implementing the preEvictionFilter extension point is performing its own namespace inclusion/exclusion check. As providers of the upstream plugins with default setting and validation for all "approved" plugins we can make sure the namespace inclusion/exclusion is limited for specific strategies. However, we can not guarantee the same quality of protection will be enforced globally for custom plugins.

I think that's fine, we can have guidelines for people creating custom plugins, and also we should be able to review/accept new plugins at a "registry" repo like it is done for the kube-scheduler. We just need to make sure this is part of the review process.

and validation for all "approved" plugins we can make sure the namespace inclusion/exclusion is limited for specific strategies

same quality of protection

So, about making sure, and quality of protection, are we talking about the validation.go of those plugins still?

@ingvagabund
Copy link
Contributor

but plugins like this one should choose to disallow it.

A plugin only sees the preEvictionFilter extension point. It does not see what's invoked inside. So the only explicit mechanism for disallowing the node utilization plugins to get the node inclusion is to enforce the DefaultEvictor plugin for the preEvictionFilter extension point during the profile validation. Which is also acceptable.

@ingvagabund
Copy link
Contributor

So, about making sure, and quality of protection, are we talking about the validation.go of those plugins still?

There's no way to validate the behavior which we do not control/own unless we enforce use of specific extension points for specific plugins. Which might be too constraining for users. In which point we can introduce a new descheduler option saying --disable-strict-validation or similar.

@damemi
Copy link
Contributor

damemi commented Oct 11, 2022

Maybe it would be good to see a poc change with that kind of enforcement applied? I'm getting a little confused so seeing the alternative implementation could help inform a better decision. @ingvagabund would you mind throwing together a rough approach to show what you mean?

@knelasevero
Copy link
Contributor Author

I think I got it now. I can provide the implementation that Jan mentions here

is to enforce the DefaultEvictor plugin for the preEvictionFilter extension point during the profile validation

And indeed it can end up being too restricting 🤔. But I will push a commit so we can have a look

@knelasevero
Copy link
Contributor Author

I just noticed that having v1alpha2 and the changes to policyconfig.go of #974 would be kind of necessary here to work on that 😓. Since this is validation at the profile level and not at the plugin level.

@ingvagabund
Copy link
Contributor

@damemi locking a profile for the node utilization strategies. E.g. whenever a profile configures LowNodeUtilization plugin, the validation would make sure preEvictionFilter is always set to enabled = [DefaultEvictor]. To avoid enabling another preEvictionFilter extension point of a different plugin which would include/exclude namespaces.

apiVersion: descheduler/v1alpha2
kind: DeschedulerPolicy
profiles:
- name: ProfileName
  pluginConfig:
  - name: DefaultEvictor
    args:
      ...
  - name: LowNodeUtilization
    args:
      ...
  plugins:
    balance:
      enabled:
      - LowNodeUtilization
    preEvictionFilter:
      enabled:
      - DefaultEvictor

@damemi
Copy link
Contributor

damemi commented Oct 12, 2022

@ingvagabund ah, I see. That seems like a reasonable approach for plugin developers to take imo

Instead of enforcing it against a specific evictor, could it enforce that the evictor's arguments don't use namespace filtering? So you don't get stuck with DefaultEvictor

@knelasevero
Copy link
Contributor Author

knelasevero commented Oct 21, 2022

So, as per latest discussions of the last community meeting, should we merge this one? I added docs and that simple validation.

I wanted to land #974 before merging this one 😢 , but since I am kind of stuck with decoding/conversions there, I wanted to push this one forward already.

@knelasevero knelasevero force-pushed the pre-ev-namespace-filter branch 3 times, most recently from 75783c6 to 4c4663a Compare October 21, 2022 11:52
@knelasevero
Copy link
Contributor Author

/retest

@@ -280,14 +283,30 @@ func evictPods(
continueEviction continueEvictionCond,
) {

var includedNamespaces, excludedNamespaces sets.String
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

includedNamespaces can be dropped

README.md Outdated
@@ -243,6 +243,7 @@ actual usage metrics. Implementing metrics-based descheduling is currently TODO
|`thresholdPriority`|int (see [priority filtering](#priority-filtering))|
|`thresholdPriorityClassName`|string (see [priority filtering](#priority-filtering))|
|`nodeFit`|bool (see [node fit filtering](#node-fit-filtering))|
|`evictionNamespaces`|(see [namespace filtering](#namespace-filtering))|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evictableNamespaces

// Naming this one differently since namespaces are still
// considered while considering resoures used by pods
// but then filtered out before eviction
EvictionNamespaces *api.Namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EvictableNamespaces

// Naming this one differently since namespaces are still
// considered while considering resoures used by pods
// but then filtered out before eviction
EvictionNamespaces *api.Namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EvictableNamespaces

Comment on lines 300 to 301
WithFilter(podEvictor.PreEvictionFilter).
WithNamespaces(includedNamespaces).
WithoutNamespaces(excludedNamespaces).
BuildFilterFunc()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The namespace is a parameter of the strategies so it needs to stay outside of PreEvictionFilter extension point

@ingvagabund
Copy link
Contributor

Few nits. Looks good in overall.

@knelasevero
Copy link
Contributor Author

The namespace is a parameter of the strategies so it needs to stay outside of PreEvictionFilter extension point

I couldn't understand this one.

@ingvagabund
Copy link
Contributor

Related to

Do we want to use these, or do we want to throw this functionality into the PreEvictionFilter directly?

The former.

@ingvagabund
Copy link
Contributor

@knelasevero there are still few mentions of the evictionNamespaces and #967 (comment). Otherwise lgtm.

@ingvagabund
Copy link
Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

preEvictionFilter also filtering namespaces
4 participants