Allow custom priority threshold #364
Conversation
/cc @ingvagabund @seanmalloy
One problem with this approach is that `podEvictor` is shared and passed as a pointer to each strategy, so if you're trying to set thresholds per-strategy, you're going to have one strategy updating the field value in the evictor in a way that affects all the others.

So if you want this to be configurable differently for different strategies, you need a way for each strategy to pass its own priority check to the evictor. Maybe re-write `EvictPod` to take a series of opts, similar to `ListPodsOnANode`.
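The opts-based `EvictPod` signature suggested above could look roughly like the following sketch using Go's functional-options pattern. The names `EvictOption` and `WithPriorityThreshold` are hypothetical, and `Pod` is a stripped-down stand-in for `v1.Pod`; this is not the descheduler's actual API:

```go
package main

import "fmt"

// Pod is a stand-in for v1.Pod, carrying only the field this sketch needs.
type Pod struct {
	Name     string
	Priority int32
}

// evictOptions holds per-call eviction settings.
type evictOptions struct {
	priorityThreshold *int32
}

// EvictOption mutates per-call eviction settings (hypothetical name).
type EvictOption func(*evictOptions)

// WithPriorityThreshold lets each strategy pass its own threshold
// without mutating shared PodEvictor state.
func WithPriorityThreshold(p int32) EvictOption {
	return func(o *evictOptions) { o.priorityThreshold = &p }
}

type PodEvictor struct{}

// EvictPod applies the per-call options; pods at or above the
// threshold are refused.
func (e *PodEvictor) EvictPod(pod Pod, opts ...EvictOption) bool {
	options := &evictOptions{}
	for _, opt := range opts {
		opt(options)
	}
	if options.priorityThreshold != nil && pod.Priority >= *options.priorityThreshold {
		return false
	}
	return true
}

func main() {
	e := &PodEvictor{}
	fmt.Println(e.EvictPod(Pod{Name: "low", Priority: 100}, WithPriorityThreshold(1000)))
	fmt.Println(e.EvictPod(Pod{Name: "high", Priority: 2000}, WithPriorityThreshold(1000)))
}
```

Because the threshold travels with each call rather than living in a shared field, one strategy's configuration can no longer leak into another's.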
We could also consider just adding this as a filter function to `ListPodsOnANode` for each strategy, which would filter out any pods above the priority, but I think if we lean too much on filtering pods it may affect the performance of strategies.
On an unrelated note, I do like being able to set a priority class name and have that parsed to a threshold, but I think if we could also allow setting the threshold value directly (and validate that only one is set), that would cover everything.
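The "validate that only one is set" check could be sketched like this. The `StrategyParameters` field names follow the PR discussion, but the validation helper itself is illustrative, not the PR's actual code:

```go
package main

import "fmt"

// StrategyParameters mirrors the two threshold fields under discussion;
// the validation helper below is a sketch, not the PR's implementation.
type StrategyParameters struct {
	ThresholdPriority          *int32
	ThresholdPriorityClassName string
}

// validatePriorityThreshold rejects configs that set both a numeric
// threshold and a priority class name.
func validatePriorityThreshold(params *StrategyParameters) error {
	if params == nil {
		return nil
	}
	if params.ThresholdPriority != nil && params.ThresholdPriorityClassName != "" {
		return fmt.Errorf("only one of ThresholdPriority and ThresholdPriorityClassName may be set")
	}
	return nil
}

func main() {
	p := int32(1000)
	fmt.Println(validatePriorityThreshold(&StrategyParameters{ThresholdPriority: &p}))
	fmt.Println(validatePriorityThreshold(&StrategyParameters{
		ThresholdPriority:          &p,
		ThresholdPriorityClassName: "high-priority",
	}))
}
```

Using a pointer for `ThresholdPriority` distinguishes "unset" from an explicit `0`, which matters since `0` is a valid priority value.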
```go
	klog.V(1).Infof("Error setting priority threshold, %#v", err)
	return
}
klog.V(1).Infof("Running RemoveDuplicatePods with priority threshold %d", priority)
```
I think we should only log the threshold if it's being set to something other than the default.
```go
var priorityClass string
if strategy.Params != nil {
	priorityClass = strategy.Params.ThresholdPriorityClassName
}
priority, err := podEvictor.SetPriorityThresholdFromPriorityClass(ctx, client, priorityClass)
if err != nil {
	klog.V(1).Infof("Error setting priority threshold, %#v", err)
	return
}
```
You could simplify this a bit, like:

```go
if strategy.Params != nil {
	priority, err := podEvictor.SetPriorityThresholdFromPriorityClass(ctx, client, strategy.Params.ThresholdPriorityClassName)
	if err != nil {
		klog.V(1).Errorf("Error setting priority threshold: %+v", err)
		return
	}
	klog.V(1).Infof("Running LowNodeUtilization with priority threshold %d", priority)
}
```
Or just pass `*strategyParams` to `podEvictor.SetPriorityThresholdFromPriorityClass` and do the nil check there (along with the log message below this, if it's successfully set). That would clean up some of the copy-paste in each strategy.
```go
priorityClass := strategy.Params.ThresholdPriorityClassName
priority, err := podEvictor.SetPriorityThresholdFromPriorityClass(ctx, client, priorityClass)
```
Any reason not to pass `strategy.Params.ThresholdPriorityClassName` directly to the function?
btw, when your PR is ready for review in the future, just remove the [WIP] and we'll know it's ready :)
@damemi I prefer this approach. At first I thought of many ways to do this, but given that we set the priority threshold every time a strategy starts and our strategies run one by one, I think it doesn't matter if one strategy may affect others. Besides, I think we should do all
I second what @damemi says.
Although a pod has the annotation, if its priority is above the priority threshold, it won't be evictable. So if we want this annotation to still work for priority filtering, I have some solutions:

I prefer 1, what do you think @damemi @ingvagabund?
I have implemented this and added
We should be careful if we start adding params to `IsEvictable`, because that could grow quickly as new features get added, but for now I'm okay with it. If we hit this again, we might consider refactoring `IsEvictable` to take a list of options instead.

One nit/question, but besides that it's looking good.
```diff
@@ -112,15 +112,12 @@ func IsCriticalPod(pod *v1.Pod) bool {
 	if IsMirrorPod(pod) {
 		return true
 	}
-	if pod.Spec.Priority != nil && IsCriticalPodBasedOnPriority(*pod.Spec.Priority) {
```
This needs to stay here. You should never be allowed to evict critical pods.
Isn't this check basically replaced by `IsPodEvictableBasedOnPriority`?
Given the way it's implemented:

```go
func IsPodEvictableBasedOnPriority(pod *v1.Pod, priority int32) bool {
	return pod.Spec.Priority == nil || *pod.Spec.Priority < priority
}
```

a user can choose any int32 value, even one greater than the system-critical priority. Thus, if improperly configured, the descheduler can evict even the critical pods.

```go
return pod.Spec.Priority == nil || (*pod.Spec.Priority < priority && *pod.Spec.Priority < SystemCriticalPriority)
```

would do a bit better.
My point is that the `IsEvictable` function is supposed to be static: the hard line for deciding whether a pod is evictable, without any degree of flexibility. If `IsEvictable` decides a pod is not evictable, it is not. It's a safeguard condition that cannot be changed from the outside (except for setting some true/false flags).
Good points. I did notice this change at first too, but thought it was just being covered by the new code. In that case, we should probably keep it the same as it was, right?
@lixiang233 what do you think?
Agree. If someone wants to evict system-critical pods, they can add the `descheduler.alpha.kubernetes.io/evict` annotation. I'll add `*pod.Spec.Priority < SystemCriticalPriority` to `IsPodEvictableBasedOnPriority`.
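The guard agreed on above can be sketched as follows. The `SystemCriticalPriority` value matches the priority Kubernetes assigns to system-critical pods (2000000000), but the minimal `Pod` type here is a stand-in for `v1.Pod` for the sake of a runnable example:

```go
package main

import "fmt"

// SystemCriticalPriority is the priority of system-critical pods
// (2000000000 in Kubernetes).
const SystemCriticalPriority int32 = 2000000000

// Pod is a stand-in for v1.Pod; Priority mirrors pod.Spec.Priority.
type Pod struct {
	Priority *int32
}

// IsPodEvictableBasedOnPriority combines the user-configured threshold
// with the hard system-critical ceiling, as suggested in the review.
func IsPodEvictableBasedOnPriority(pod *Pod, priority int32) bool {
	return pod.Priority == nil || (*pod.Priority < priority && *pod.Priority < SystemCriticalPriority)
}

func main() {
	critical := SystemCriticalPriority
	low := int32(100)
	// Even with a threshold above the critical range, critical pods stay protected.
	fmt.Println(IsPodEvictableBasedOnPriority(&Pod{Priority: &critical}, SystemCriticalPriority+1))
	fmt.Println(IsPodEvictableBasedOnPriority(&Pod{Priority: &low}, 1000))
}
```

With the extra conjunct, a misconfigured threshold can no longer make system-critical pods evictable: the `SystemCriticalPriority` bound always applies regardless of what the user sets.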
I've moved it; please take a look @damemi @ingvagabund
I've completed the change mentioned before and added a validation to the threshold. @damemi @ingvagabund
```diff
@@ -52,15 +52,17 @@ type Namespaces struct {
 }

 // Besides Namespaces only one of its members may be specified
 // TODO(jchaloup): move Namespaces to individual strategies once the policy
 // version is bumped to v1alpha2
 // TODO(jchaloup): move Namespaces ThresholdPriority and ThresholdPriorityClassName to individual strategies
```
Nit: missing comma between `Namespaces` and `ThresholdPriority`.
Thanks for all your work @lixiang233! At this point I think this looks good. This PR brings up some good points for refactoring (like our IsEvictable checks) but those can be discussed elsewhere and aren't blocking for the important changes here.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: damemi, lixiang233. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
/lgtm

/hold cancel
…m_priority_threshold Allow custom priority threshold
Fixes #329

Introduce a new configuration for each strategy which allows users to define a PriorityClass that is used by `PodEvictor` to check if a pod is evictable.

TODO: add it to `StrategyParameters` and read and set it to `PodEvictor`