Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom priority threshold #364

Conversation

lixiang233
Copy link
Contributor

@lixiang233 lixiang233 commented Jul 29, 2020

Fixes #329
Introduce a new configuration for each strategies which allows users to define a PriorityClass which is used by PodEvictor to check if a pod is evictable.

TODO:

  • Add PriorityClass to StrategyParameters and read and set it to PodEvictor
  • e2e tests for priority class
  • README
  • e2e tests for priority

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 29, 2020
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 29, 2020
@lixiang233 lixiang233 force-pushed the ft_allow_custom_priority_threshold branch 2 times, most recently from 4becf80 to 4b9dd0b Compare July 29, 2020 09:39
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 30, 2020
@lixiang233 lixiang233 force-pushed the ft_allow_custom_priority_threshold branch from 8d98bc2 to a18e331 Compare July 31, 2020 03:43
@lixiang233
Copy link
Contributor Author

/cc @ingvagabund @seanmalloy
/assign @damemi
Please take a look, this pr is ready for review and will be ready for merge once I have updated README.

Copy link
Contributor

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with this approach is that podEvictor is shared and passed as a pointer to each strategy, so if you're trying to set thresholds per-strategy you're going to have one strategy updating the field value in the evictor that affects all the others.

So if you want this to be configurable differently for different strategies you need a way that each strategy can pass its own priority check to the evictor. Maybe re-writing EvictPod to take a series of opts, similar to ListPodsOnANode.

We could also consider just adding this as a filter function to ListPodsOnANode for each strategy that would filter out any pods above the priority, but I think if we lean too much on filtering pods it may affect the performance of strategies.

In an unrelated note, I do like being able to set a priority class name and have that parsed to a threshold, but I think if we could also allow setting the threshold value directly (and validate that only one is set) that would cover everything

klog.V(1).Infof("Error setting priority threshold, %#v", err)
return
}
klog.V(1).Infof("Running RemoveDuplicatePods with priority threshold %d", priority)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only log the threshold if it's being set to something that's not default.

Comment on lines 56 to 71
var priorityClass string
if strategy.Params != nil {
priorityClass = strategy.Params.ThresholdPriorityClassName
}
priority, err := podEvictor.SetPriorityThresholdFromPriorityClass(ctx, client, priorityClass)
if err != nil {
klog.V(1).Infof("Error setting priority threshold, %#v", err)
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this a bit like:

if strategy.Params != nil {
  priority, err := podEvictor.SetPriorityThresholdFromPriorityClass(ctx, client, strategy.Params.ThresholdPriorityClassName)
  if err != nil {
    klog.V(1).Errorf("Error setting priority threshold: %+v", err)
    return
  }
  klog.V(1).Infof("Running LowNodeUtilization with priority threshold %d", priority)
}

Or just pass *strategyParams to podEvictor.SetPriorityThresholdFromPriorityClass and do the nil check there (and the log message below this if it's successfully set). That would clean up some of the copy paste in each strategy

Comment on lines 52 to 53
priorityClass := strategy.Params.ThresholdPriorityClassName
priority, err := podEvictor.SetPriorityThresholdFromPriorityClass(ctx, client, priorityClass)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not pass strategy.Params.ThresholdPriorityClassName directly to the function?

@damemi
Copy link
Contributor

damemi commented Aug 3, 2020

btw when your PR is ready for review in the future just remove the [WIP] and we'll know it's ready :)
We can /hold if you're waiting for pieces like README updates that will depend on feedback
/retitle Allow custom priority threshold
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 3, 2020
@k8s-ci-robot k8s-ci-robot changed the title [WIP] Alow custom priority threshold Allow custom priority threshold Aug 3, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 3, 2020
@lixiang233
Copy link
Contributor Author

We could also consider just adding this as a filter function

@damemi I prefer this approach, as IsPodEvictableBasedOnPriority will be called for every pod no matter it's in IsEvictable or the filter function, just moving IsPodEvictableBasedOnPriority out of IsEvictable won't affect the performance. and if we add it to EvictPod, we may do some useless calculations because some pods may not be evictable and we could filter out them before running the strategy.

At first I've thought of many ways to do this, given we'll set priority threshold every time a strategy starts and our strategies are running one by one, I think it doesn't matter if one strategy may affect others, besides I think we should do all IsEvictable checks together in one function, so I choose to make the threshold a param of PodEvictor. That's not a very good implementation, I'll reimplement it using the methods mentioned above and add another configuration to set the threshold value directly.

@ingvagabund
Copy link
Contributor

ingvagabund commented Aug 4, 2020

I second what @damemi says.
For strategies which are not cluster wide (RemovePodsViolatingInterPodAntiAffinity, RemovePodsViolatingNodeAffinity, RemovePodsViolatingNodeTaints, ...), you can extend the filters of ListPodsOnANode function so you don't bother with pods which have higher priority class than ThresholdPriorityClassName. In which case the pod evictor does not have to check the priority class at all.
For the remaining strategies (RemoveDuplicates, LowNodeUtilization), you can always check a pod's class right before you pass the pod to the evictor. So the pod evictor can stay intact.

@lixiang233
Copy link
Contributor Author

As All types of pods with the annotation descheduler.alpha.kubernetes.io/evict are evicted. and this annotation is checked in IsEvictable, if I add the priority check as an individual filter function, and use them like:

podutil.WithFilter(func(pod *v1.Pod) bool {
      return podEvictor.IsEvictable(pod) && isPodEvictableByPriority(pod)
})

Although a pod has the annotation, if its priority is above priority threshold, it won't be evictable.

So if we want this annotation still work for priority filtering, I have some solutions:
1.add priority check to IsEvictable and refactor IsEvictable to have a parameter.
2.check this annotation again in isPodEvictableByPriority
3.make annotation check another filter func and call them like:

podutil.WithFilter(func(pod *v1.Pod) bool {
      return (podEvictor.IsEvictable(pod) && isPodEvictableByPriority(pod)) || isPodEvictableByAnnotation(pod)
})

I prefer 1, what do you think @damemi @ingvagabund ?

@lixiang233 lixiang233 force-pushed the ft_allow_custom_priority_threshold branch from 53a0ac8 to 5eb9832 Compare August 7, 2020 09:47
@lixiang233
Copy link
Contributor Author

I have implemented this

1.add priority check to IsEvictable and refactor IsEvictable to have a parameter.

and added ThresholdPriority to strategy's params.

Copy link
Contributor

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be careful if we start adding params to IsEvictable because that could grow quickly as new features get added, but for now I'm okay with it. If we hit this again, we might consider refactoring IsEvictable to take a list of options instead

One nit/question, but besides that it's looking good

pkg/descheduler/evictions/evictions.go Outdated Show resolved Hide resolved
@@ -112,15 +112,12 @@ func IsCriticalPod(pod *v1.Pod) bool {
if IsMirrorPod(pod) {
return true
}
if pod.Spec.Priority != nil && IsCriticalPodBasedOnPriority(*pod.Spec.Priority) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay here. You should never be allowed to evict critical pods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this check basically replaced by IsPodEvictableBasedOnPriority?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the way it's implemented:

func IsPodEvictableBasedOnPriority(pod *v1.Pod, priority int32) bool {
	return pod.Spec.Priority == nil || *pod.Spec.Priority < priority
}

a user can choose any int32 value. Even the one greater than the system critical priority. Thus, if improperly configured the descheduler can evict even the critical pods. return pod.Spec.Priority == nil || (*pod.Spec.Priority < priority && *pod.Spec.Priority < SystemCriticalPriority) would do a bit better.

My point is IsEvictable function is supposed to be static. The hardline for deciding when a pod is evictable or not without any degree of flexibility. If IsEvictable decides a pod is not evictable, it is not. It's a safe guard condition that can not be changed from the outside (except for setting some true/false flags).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, I did notice this change at first too but thought it was just being covered by the new code.

In that case, we should probably keep it the same as it was right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lixiang233 what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, if someone wants to evict system-critical pods, he can add descheduler.alpha.kubernetes.io/evict annotation. I'll add *pod.Spec.Priority < SystemCriticalPriority to IsPodEvictableBasedOnPriority.

@lixiang233 lixiang233 force-pushed the ft_allow_custom_priority_threshold branch 2 times, most recently from 370e9a6 to cc0210a Compare August 10, 2020 03:13
@lixiang233 lixiang233 force-pushed the ft_allow_custom_priority_threshold branch from cc0210a to 687711f Compare August 10, 2020 03:24
@lixiang233
Copy link
Contributor Author

I've moved IsPodEvictableBasedOnPriority to the evictions package and added e2e testcase for ThresholdPriority.

please take a look @damemi @ingvagabund

@lixiang233 lixiang233 force-pushed the ft_allow_custom_priority_threshold branch from df8401b to ae3e637 Compare August 11, 2020 02:59
@lixiang233
Copy link
Contributor Author

I've completed the change mentioned before and added a validation to the threshold. @damemi @ingvagabund

pkg/utils/priority.go Outdated Show resolved Hide resolved
@lixiang233 lixiang233 force-pushed the ft_allow_custom_priority_threshold branch from ae3e637 to b5e17f9 Compare August 12, 2020 05:58
@@ -52,15 +52,17 @@ type Namespaces struct {
}

// Besides Namespaces only one of its members may be specified
// TODO(jchaloup): move Namespaces to individual strategies once the policy
// version is bumped to v1alpha2
// TODO(jchaloup): move Namespaces ThresholdPriority and ThresholdPriorityClassName to individual strategies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Missing comma between Namespaces and ThresholdPriority

Copy link
Contributor

@damemi damemi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your work @lixiang233! At this point I think this looks good. This PR brings up some good points for refactoring (like our IsEvictable checks) but those can be discussed elsewhere and aren't blocking for the important changes here.
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi, lixiang233

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2020
@ingvagabund
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 12, 2020
@ingvagabund
Copy link
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2020
@k8s-ci-robot k8s-ci-robot merged commit 9746fd3 into kubernetes-sigs:master Aug 12, 2020
@lixiang233 lixiang233 deleted the ft_allow_custom_priority_threshold branch August 15, 2020 05:35
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022
…m_priority_threshold

Allow custom priority threshold
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow Ignoring Pods For Eviction
5 participants