Add RemoveTooManyRestarts policy #254
Conversation
/cc @ingvagabund
/kind feature
Also, the test descriptions need to be improved.
        continue
    }

    glog.V(1).Infof("RemovePodsHavingTooManyRestarts will evicted pod: %#v, container restarts: %d, initContainer restarts: %d", pod.Name, restarts, initRestarts)
s/evicted/evict
glog.V(1).Infof("RemovePodsHavingTooManyRestarts will evicted pod: %#v, container restarts: %d, initContainer restarts: %d", pod.Name, restarts, initRestarts)
success, err := evictions.EvictPod(ds.Client, pod, policyGroupVersion, ds.DryRun)
if !success {
    glog.Infof("RemovePodsHavingTooManyRestarts Error when evicting pod: %#v (%#v)", pod.Name, err)
s/(%#v)/(%v)
as the error is a simple string at the end
I don't disagree, but all of our other policies format the errors with (%#v), so for consistency in error reporting we should probably keep it. (Personally I prefer %+v for errors.)
Good point. I'm no longer against keeping %#v. It's better to change all the occurrences at once.
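For reference, here is a minimal standalone sketch (not code from this PR) of how the three verbs being discussed render a plain error value:

package main

import (
	"errors"
	"fmt"
)

func main() {
	// A plain error value, similar in shape to what an eviction call might return.
	err := errors.New("pods \"pod-1\" not found")

	fmt.Printf("%v\n", err)  // pods "pod-1" not found
	fmt.Printf("%+v\n", err) // pods "pod-1" not found (same as %v for a basic error)
	fmt.Printf("%#v\n", err) // &errors.errorString{s:"pods \"pod-1\" not found"}
}

So %v (or %+v) keeps the log line readable, while %#v exposes the internal struct; the thread settles on keeping %#v for now and changing all occurrences in one pass later.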
glog.Infof("RemovePodsHavingTooManyRestarts Error when evicting pod: %#v (%#v)", pod.Name, err)
} else {
    nodePodCount[node]++
    glog.V(1).Infof("RemovePodsHavingTooManyRestarts Evicted pod: %#v (%#v)", pod.Name, err)
s/(%#v)/(%v)
as the error is a simple string at the end
same as #254 (comment)
}

for _, node := range nodes {
    glog.V(1).Infof("RemovePodsHavingTooManyRestarts Processing node: %#v", node.Name)
No need to prefix the messages with RemovePodsHavingTooManyRestarts, as glog/klog prints the file name, which uniquely identifies the strategy. This holds for the other lines as well.
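To illustrate, a standalone sketch (not code from this PR); the sample output line is only indicative of glog's header format:

package main

import (
	"flag"

	"github.com/golang/glog"
)

func main() {
	// Enable V(1) logging to stderr for the demo.
	flag.Set("v", "1")
	flag.Set("logtostderr", "true")
	flag.Parse()

	nodeName := "node1"
	// No strategy-name prefix: glog's header already names the emitting file.
	glog.V(1).Infof("Processing node: %q", nodeName)
	// Indicative output (header is Lmmdd hh:mm:ss.uuuuuu threadid file:line]):
	//   I0927 14:03:02.123456   12345 main.go:19] Processing node: "node1"
	glog.Flush()
}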
Also node.Name is a string, so s/%#v/%v or %q
Is %s fine? This is also something that's copied in all the other strategies (using %#v for the node name here).
I'm no longer against keeping %#v. It's better to change all the occurrences at once.
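For completeness, a standalone sketch (not from this PR) of how the candidate verbs render a plain string such as node.Name:

package main

import "fmt"

func main() {
	nodeName := "node1"

	fmt.Printf("%s\n", nodeName)  // node1
	fmt.Printf("%v\n", nodeName)  // node1   (identical to %s for strings)
	fmt.Printf("%q\n", nodeName)  // "node1" (quoted; empty or odd names stand out)
	fmt.Printf("%#v\n", nodeName) // "node1" (Go-syntax representation of a string)
}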
    i++
}

// The following 4 pods won't get evicted.
The following 3, not 4.
pod := test.BuildTestPod(fmt.Sprintf("pod-%d", i), 100, 0, node.Name)
pod.ObjectMeta.OwnerReferences = test.GetNormalPodOwnerRefList()

// pod i will has 25 * i restarts.
pod at index i has ...
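A standalone sketch of how such a test pod could be constructed with the increasing restart count; buildRestartingPod is a hypothetical helper, not the PR's actual test code:

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildRestartingPod gives pod number i a total of 25*i container restarts,
// mirroring the intent of the test comment above.
func buildRestartingPod(i int, nodeName string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("pod-%d", i)},
		Spec:       v1.PodSpec{NodeName: nodeName},
		Status: v1.PodStatus{
			ContainerStatuses: []v1.ContainerStatus{
				{Name: "c1", RestartCount: int32(25 * i)},
			},
		},
	}
}

func main() {
	p := buildRestartingPod(3, "node1")
	fmt.Println(p.Name, p.Status.ContainerStatuses[0].RestartCount) // pod-3 75
}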
    maxPodsToEvict: 0,
},
{
    description: "One pod have total restarts bigger than threshold, 6 pod evictions",
"Some pods have total restarts bigger than threshold".
// calcContainerRestarts get container restarts and init container restarts.
func calcContainerRestarts(pod *v1.Pod) (int32, int32) {
    var (
        restarts int32 = 0
No need for the 0, as it's the default value:

    var restarts, initRestarts int32
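Putting the suggestion in context, the full helper presumably looks roughly like this (a sketch; the exact body and package name are assumptions):

package strategies

import (
	v1 "k8s.io/api/core/v1"
)

// calcContainerRestarts sums restart counts for regular containers and
// init containers separately.
func calcContainerRestarts(pod *v1.Pod) (int32, int32) {
	var restarts, initRestarts int32

	for _, cs := range pod.Status.ContainerStatuses {
		restarts += cs.RestartCount
	}
	for _, cs := range pod.Status.InitContainerStatuses {
		initRestarts += cs.RestartCount
	}
	return restarts, initRestarts
}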
    maxPodsToEvict: 0,
},
{
    description: "Nine pods have total restarts equals threshold(includingInitContainers=true), 5 pods evictions",
The strategy has the following condition for deciding if the threshold is met:

    if strategy.Params.PodsHavingTooManyRestarts.IncludingInitContainers {
        if restarts+initRestarts <= strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
            continue
        }
    } else if restarts <= strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
        continue
    }

This means a pod is evicted only when the number of restarts is bigger than the threshold. Is this the intention? I would expect a pod to be evicted as soon as the threshold is met.
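For comparison, a sketch of the same check adjusted so that a pod qualifies as soon as the threshold is reached; only the comparison operator changes (field names kept from the quoted snippet):

// Skip only pods strictly below the threshold, so eviction starts as soon as
// the threshold is met.
if strategy.Params.PodsHavingTooManyRestarts.IncludingInitContainers {
	if restarts+initRestarts < strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
		continue
	}
} else if restarts < strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
	continue
}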
@seanmalloy @ingvagabund thanks, I've addressed your feedback
@seanmalloy thanks, updated. I'll squash down to 1 commit later today if there's no more feedback
/lgtm
squashed to 1 commit
/lgtm
/assign @aveshagarwal @ravisantoshgudimetla
Just curious why it includes so many vendor changes. Did it update some existing dependencies?
Force-pushed from 2bd5bf5 to 7511ac9
@seanmalloy thanks, rebased, and got Travis passing. Good for another look now.
/lgtm
@aveshagarwal and @ravisantoshgudimetla please take a look when you have some time. This PR is ready for you to review again.
Sorry if this has been discussed before. Is the goal of this strategy that the evicted pods that had too many restarts would be placed on some other nodes? I am concerned that if that does not happen, the descheduler might go into a loop.
With some likelihood, yes.
As long as the strategy is run once in a while, it will not.
@aveshagarwal the strategy itself doesn't explicitly place failed pods onto other nodes; that is still up to the owning controller to recreate the pod and the scheduler to place it. But the goal is that when the scheduler picks up the "stuck" pod again, it will make a better decision this time. If the pod does land on the same node again, it could cause a loop, but like @ingvagabund said, this depends on how frequently the descheduler is run. Ultimately, though, the pod is already in a CrashLoop anyway, so this at least makes an attempt to un-stick it.
Actually, that likelihood is one of the main goals behind each descheduler strategy and, among other things, what makes the descheduler useful. The idea is to keep the likelihood as high as possible so that the descheduler really helps with its balancing act. If the likelihood is low, then the strategy is not going to be very helpful, and that is my main concern with this strategy. Could we do something to increase the likelihood of this strategy succeeding? Right now it seems to me that the likelihood is low. Please let me know if you think otherwise.
I am aware of that. I am just wondering, as I mentioned in my previous comment to Jan, whether there is any way to increase the likelihood of success for this strategy, because if the likelihood is low then this strategy might not be very helpful.
Some strategies test whether there is at least one additional node that can schedule the evicted pod (e.g. taints are tolerated, the node set pointed to by a node selector is not empty, etc.). However, strategies do not keep any state. We might build a cache of pods for a given strategy, check whether a pod was evicted from the same node again, and "try" to make a better decision. Though, as long as the scheduler is not aware of a pod's re-incarnation history, placement will still be random. @aveshagarwal you could use the same argument for any of the already existing strategies. As long as re-incarnated pods do not carry additional information for the scheduler, I don't think we can do anything about the likelihood.
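As a rough illustration of the per-strategy cache idea mentioned above, here is a hypothetical sketch; none of these names exist in the descheduler or in this PR:

package main

import "fmt"

// evictionCache remembers the node a pod was last evicted from, keyed by
// namespace/name. It only illustrates the idea of keeping per-strategy state.
type evictionCache struct {
	lastNode map[string]string
}

func newEvictionCache() *evictionCache {
	return &evictionCache{lastNode: map[string]string{}}
}

// RecordEviction notes that the pod was evicted from nodeName.
func (c *evictionCache) RecordEviction(podKey, nodeName string) {
	c.lastNode[podKey] = nodeName
}

// EvictedFromSameNodeBefore reports whether the pod already bounced off this
// node, which a strategy could use to back off or log a warning.
func (c *evictionCache) EvictedFromSameNodeBefore(podKey, nodeName string) bool {
	return c.lastNode[podKey] == nodeName
}

func main() {
	cache := newEvictionCache()
	cache.RecordEviction("default/app-1", "node1")
	fmt.Println(cache.EvictedFromSameNodeBefore("default/app-1", "node1")) // true
	fmt.Println(cache.EvictedFromSameNodeBefore("default/app-1", "node2")) // false
}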
I agree. But I'd say that with the other strategies, the likelihood is still very high given the way they function. If you look at the reasons for too-many-restarts errors in issue #62, which this PR is trying to address, you will notice that there are cases, like an application bug, where this blanket strategy might not be much help. I can approve it if you would like to go ahead with this PR as it is, but I really believe it would be more helpful to target the cases where the success of this strategy could be higher. Please let me know.
I agree. The container restart count does not say anything about the cause. Also, having consumers decide when the strategy is helpful is not the right justification. We still need to provide a stable solution that has more positive than negative effects in a cluster, minimizing disturbance, etc. What can happen in the worst case? Assuming an application has a bug causing the pod to panic (e.g. right away), the pod will enter a crash loop. IINM, the kubelet has a mechanism to back off crashing containers. So, in the worst case the scheduler will schedule the pod again and again. However, the scheduler will not enter an active loop, since it backs off as well. The only drawback I can see now is that the pod never gets to run properly. Still, neither the kubelet nor the scheduler will enter a downward spiral, even when the descheduler period is set to 1s (in which case every strategy will be causing havoc in the cluster anyway).
I don't think any pod instance has information that can help increase the likelihood. Only the kubelet knows the reason why a given container failed. Also, we don't consume any third-party diagnostics (e.g. the node problem detector). All the strategies are quite naive and simple in their decision making.
thanks @ingvagabund for your explanation.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aveshagarwal, damemi
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Based on https://github.com/kubernetes/community/blob/master/community-membership.md#requirements-1: The following apply to the part of codebase for which one would be a reviewer in an OWNERS file (for repos using the bot).
> member for at least 3 months
For a couple of years now
> Primary reviewer for at least 5 PRs to the codebase
kubernetes-sigs#285 kubernetes-sigs#275 kubernetes-sigs#267 kubernetes-sigs#254 kubernetes-sigs#181
> Reviewed or merged at least 20 substantial PRs to the codebase
https://github.com/kubernetes-sigs/descheduler/pulls?q=is%3Apr+is%3Aclosed+assignee%3Aingvagabund
> Knowledgeable about the codebase
yes
> Sponsored by a subproject approver
> With no objections from other approvers
> Done through PR to update the OWNERS file
this PR
> May either self-nominate, be nominated by an approver in this subproject, or be nominated by a robot
self-nominating
Add RemoveTooManyRestarts policy
This rebases the changes from #89 to the current master branch.
See issue #62.