
Add RemoveTooManyRestarts policy #254

Merged
2 commits merged into kubernetes-sigs:master from toomanyrestarts on Apr 27, 2020

Conversation

@damemi (Contributor) commented Mar 30, 2020

This rebases the changes from #89 onto the current master branch.

See issue #62

@k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) and size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) labels on Mar 30, 2020
@damemi (Contributor Author) commented Mar 30, 2020

/cc @ingvagabund
ptal

@seanmalloy (Member):

/kind feature

@k8s-ci-robot added the kind/feature (Categorizes issue or PR as related to a new feature.) label on Mar 31, 2020
pkg/descheduler/strategies/toomanyrestarts.go (outdated, resolved)
README.md (outdated, resolved)
examples/policy.yaml (outdated, resolved)
pkg/api/types.go (outdated, resolved)
pkg/api/v1alpha1/types.go (outdated, resolved)
@ingvagabund (Contributor) left a comment:

Also, the test descriptions need to be improved.

continue
}

glog.V(1).Infof("RemovePodsHavingTooManyRestarts will evicted pod: %#v, container restarts: %d, initContainer restarts: %d", pod.Name, restarts, initRestarts)
Contributor:

s/evicted/evict

pkg/descheduler/strategies/toomanyrestarts.go (outdated, resolved)
glog.V(1).Infof("RemovePodsHavingTooManyRestarts will evicted pod: %#v, container restarts: %d, initContainer restarts: %d", pod.Name, restarts, initRestarts)
success, err := evictions.EvictPod(ds.Client, pod, policyGroupVersion, ds.DryRun)
if !success {
glog.Infof("RemovePodsHavingTooManyRestarts Error when evicting pod: %#v (%#v)", pod.Name, err)
Contributor:

s/(%#v)/(%v), as the error is just a simple string in the end

Contributor Author:

I don't disagree, but all of our other policies format the errors with (%#v) so for consistency in error reporting we should probably keep it. (Personally I prefer %+v for errors)

Contributor:

Good point. I'm no longer against keeping %#v; it's better to change all the occurrences at once.
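
For readers following this thread, here is a minimal standalone sketch (not from the PR) showing how the three verbs discussed above render a plain error value:

package main

import (
	"errors"
	"fmt"
)

func main() {
	err := errors.New("pod eviction failed")

	// %v prints the error message itself.
	fmt.Printf("%v\n", err) // pod eviction failed
	// %+v behaves like %v for a plain error created with errors.New.
	fmt.Printf("%+v\n", err) // pod eviction failed
	// %#v prints the Go-syntax representation of the underlying value.
	fmt.Printf("%#v\n", err) // &errors.errorString{s:"pod eviction failed"}
}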

glog.Infof("RemovePodsHavingTooManyRestarts Error when evicting pod: %#v (%#v)", pod.Name, err)
} else {
nodePodCount[node]++
glog.V(1).Infof("RemovePodsHavingTooManyRestarts Evicted pod: %#v (%#v)", pod.Name, err)
Contributor:

s/(%#v)/(%v), as the error is just a simple string in the end

Contributor Author:

same as #254 (comment)

}

for _, node := range nodes {
glog.V(1).Infof("RemovePodsHavingTooManyRestarts Processing node: %#v", node.Name)
Contributor:

No need to prefix the messages with RemovePodsHavingTooManyRestarts, as glog/klog prints the file name, which uniquely identifies the strategy. This holds for the other lines as well.

Contributor:

Also node.Name is a string, so s/%#v/%v or %q

Contributor Author:

Is %s fine?

This is also something that's copied in all the other strategies (using %#v for nodename here)

Contributor:

I'm no longer against keeping %#v; it's better to change all the occurrences at once.

i++
}

// The following 4 pods won't get evicted.
Contributor:

The following 3, not 4

pod := test.BuildTestPod(fmt.Sprintf("pod-%d", i), 100, 0, node.Name)
pod.ObjectMeta.OwnerReferences = test.GetNormalPodOwnerRefList()

// pod i will has 25 * i restarts.
Contributor:

pod at index i has ...
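
A hedged sketch of how such a test could assign restart counts to each generated pod. The BuildTestPod and owner-reference calls are copied from the quoted snippet; the ContainerStatuses assignment is an assumption for illustration, not necessarily the PR's exact code:

pod := test.BuildTestPod(fmt.Sprintf("pod-%d", i), 100, 0, node.Name)
pod.ObjectMeta.OwnerReferences = test.GetNormalPodOwnerRefList()

// The pod at index i gets 25*i container restarts (assumed mechanism).
pod.Status.ContainerStatuses = []v1.ContainerStatus{
	{RestartCount: int32(25 * i)},
}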

maxPodsToEvict: 0,
},
{
description: "One pod have total restarts bigger than threshold, 6 pod evictions",
Contributor:

"Some pods have total restarts bigger than threshold".

// calcContainerRestarts get container restarts and init container restarts.
func calcContainerRestarts(pod *v1.Pod) (int32, int32) {
var (
restarts int32 = 0
Contributor:

No need for 0 as it's the default value:

var restarts, initRestarts int32
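
A minimal sketch of the whole helper with the suggested declaration. The status fields come from the upstream core/v1 API; the loop bodies are an assumption based on the doc comment, not necessarily the merged code:

import (
	v1 "k8s.io/api/core/v1"
)

// calcContainerRestarts sums restart counts over the pod's regular
// containers and its init containers separately.
func calcContainerRestarts(pod *v1.Pod) (int32, int32) {
	var restarts, initRestarts int32

	for _, cs := range pod.Status.ContainerStatuses {
		restarts += cs.RestartCount
	}
	for _, cs := range pod.Status.InitContainerStatuses {
		initRestarts += cs.RestartCount
	}

	return restarts, initRestarts
}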

maxPodsToEvict: 0,
},
{
description: "Nine pods have total restarts equals threshold(includingInitContainers=true), 5 pods evictions",
Contributor:

The strategy has the following condition for deciding if the threshold is met:

if strategy.Params.PodsHavingTooManyRestarts.IncludingInitContainers {
	if restarts+initRestarts <= strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
		continue
	}
} else if restarts <= strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds {
	continue
}

This means a pod is evicted only when the number of restarts is strictly greater than the threshold. Is this the intention? I would expect a pod to be evicted as soon as the threshold is met.
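
For illustration, a hedged fragment of what evicting as soon as the threshold is met could look like, mirroring the quoted loop body above (field names are copied from that snippet; this is a sketch, not the merged code):

// Evict once the configured threshold is reached (>=) instead of only
// when it is exceeded (>).
threshold := strategy.Params.PodsHavingTooManyRestarts.PodeRestartThresholds

total := restarts
if strategy.Params.PodsHavingTooManyRestarts.IncludingInitContainers {
	total += initRestarts
}
if total < threshold {
	continue
}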

@damemi (Contributor Author) commented Mar 31, 2020

@seanmalloy @ingvagabund thanks, I've addressed your feedback

README.md (outdated, resolved)
pkg/descheduler/strategies/toomanyrestarts.go (outdated, resolved)
@damemi (Contributor Author) commented Apr 1, 2020

@seanmalloy thanks, updated. I'll squash down to 1 commit later today if there's no more feedback

@seanmalloy (Member):

/lgtm

@k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 1, 2020
@k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 1, 2020
@damemi (Contributor Author) commented Apr 1, 2020

squashed to 1 commit

@seanmalloy (Member):

/lgtm

@k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 1, 2020
@seanmalloy (Member):

/assign @aveshagarwal @ravisantoshgudimetla

@aveshagarwal (Contributor) commented Apr 1, 2020

Just curious why this includes so many vendor changes. Did it update some existing dependencies?
Also, please split the commit into 2 commits, one with the main changes and the other with the vendor changes, as that helps with the review.

@damemi force-pushed the toomanyrestarts branch 3 times, most recently from 2bd5bf5 to 7511ac9 on April 24, 2020 14:22
@damemi (Contributor Author) commented Apr 24, 2020

@seanmalloy thanks, rebased and got Travis passing. Good for another look now.

@seanmalloy (Member):

/lgtm

@k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 27, 2020
@seanmalloy (Member):

@aveshagarwal and @ravisantoshgudimetla please take a look when you have some time. This PR is ready for you to review again.

@aveshagarwal (Contributor):

Sorry if it has been discussed before. Is the goal with this strategy that the evicted pods that had too many restarts would be placed on some other nodes? I am concerned that, if that does not happen, the descheduler might go into a loop.

@ingvagabund (Contributor) commented Apr 27, 2020

Is the goal with this strategy that the evicted pods that had too many restarts would be placed on some other nodes?

With some likelihood, yes.

I am concerned that, if that does not happen, the descheduler might go into a loop.

As long as the strategy is run once in a while, it will not.

@damemi (Contributor Author) commented Apr 27, 2020

@aveshagarwal the strategy itself doesn't explicitly place failed pods onto other nodes; that is still up to the owning controller to recreate the pod and the scheduler to place it. But the goal is that when the scheduler picks up the "stuck" pod again, it will make a better decision this time.

If the pod does land on the same node again, it could cause a loop. But like @ingvagabund said, this depends on how frequently the descheduler is run. Ultimately, though, the pod is already in a CrashLoop anyway, so this will at least make an attempt to un-stick it.

@aveshagarwal (Contributor):

Is the goal with this strategy that the evicted pods that had too many restarts would be placed on some other nodes?

With some likelihood, yes.

Actually that likelihood is one of the main goals behind each descheduler strategy and, among other things, what makes the descheduler useful. The idea is to have the likelihood as high as possible so that the descheduler really helps with its balancing act. If the likelihood is low, then the strategy is not going to be very helpful. And that is my main concern with this strategy.

Could we do something to increase the likelihood of success for this strategy? Right now it seems to me that the likelihood is low. Please let me know if you think otherwise.

I am concerned that, if that does not happen, the descheduler might go into a loop.

As long as the strategy is run once in a while, it will not.

@aveshagarwal (Contributor) commented Apr 27, 2020

@aveshagarwal the strategy itself doesn't explicitly place failed pods onto other nodes; that is still up to the owning controller to recreate the pod and the scheduler to place it.

I am aware of that. I am just wondering, as I mentioned in my previous comment to Jan, whether there is any way to increase the likelihood of success for this strategy. Because if the likelihood is low, then this strategy might not be very helpful.

But the goal is that when the scheduler picks up the "stuck" pod again, it will make a better decision this time.

If the pod does land on the same node again, it could cause a loop. But like @ingvagabund said, this depends on how frequently the descheduler is run. Ultimately, though, the pod is already in a CrashLoop anyway, so this will at least make an attempt to un-stick it.

@ingvagabund (Contributor) commented Apr 27, 2020

Actually that likelihood is one of the main goals behind each descheduler strategy and, among other things, what makes the descheduler useful. The idea is to have the likelihood as high as possible so that the descheduler really helps with its balancing act. If the likelihood is low, then the strategy is not going to be very helpful. And that is my main concern with this strategy.

Some strategies test whether there's at least one additional node that can schedule an evicted pod (e.g. taints are tolerated, the node set pointed to by a node selector is not empty, etc.). However, strategies do not keep state. We might maintain a cache of pods for a given strategy, check whether a pod was evicted from the same node again, and "try" to make a better decision. Though, as long as the scheduler is not aware of a pod's re-incarnation history, it will still be random.

@aveshagarwal you could use the same argument for any of the already existing strategies. As long as re-incarnated pods do not have additional information for the scheduler, I don't think we can do anything about the likelihood.

@aveshagarwal (Contributor):

@aveshagarwal you could use the same argument for any of the already existing strategies. As long as re-incarnated pods do not have additional information for the scheduler, I don't think we can do anything about the likelihood.

I agree. But I'd say that with the other strategies, the likelihood is still very high, given the way they function.

If you look at the reasons for too-many-restarts errors in issue #62, which this PR is trying to address, you will notice that there are cases, like an application bug, where this blanket strategy might not be very helpful.

I can approve it if you guys would like to go ahead with this PR as it is. I really believe that it would be more helpful to target those cases where success of this strategy could be higher. Please let me know.

@ingvagabund (Contributor):

If you look at the reasons for too-many-restarts errors in issue #62, which this PR is trying to address, you will notice that there are cases, like an application bug, where this blanket strategy might not be very helpful.

I agree. The container restart count does not say anything about the cause. Also, having consumers decide when the strategy is helpful is not the right justification. We still need to provide a stable solution that will have more positive than negative effects in a cluster, minimizing disturbance, etc. What can happen in the worst case? Assuming an application has a bug causing the pod to panic (e.g. right away), the pod will enter a crash loop. IINM, the kubelet has a mechanism to back off crashing containers. So, in the worst case the scheduler will schedule the pod again and again. However, the scheduler will not enter an active loop since it backs off as well. The only drawback I can see now is that the pod never gets to run properly. Though, neither the kubelet nor the scheduler will enter a downward spiral even when the descheduler period is set to 1s (in which case every strategy will be causing havoc in the cluster anyway).

@ingvagabund (Contributor):

I really believe that it would be more helpful to target those cases where success of this strategy could be higher. Please let me know.

I don't think any pod instance has information that can help increase the likelihood. Only the kubelet knows why a given container failed. Also, we don't consume any third-party diagnostics (e.g. the node problem detector). All the strategies are quite naive and simple in their decision making.

@aveshagarwal (Contributor):

I really believe that it would be more helpful to target those cases where success of this strategy could be higher. Please let me know.

I don't think any pod instance has information that can help increase the likelihood. Only the kubelet knows why a given container failed. Also, we don't consume any third-party diagnostics (e.g. the node problem detector). All the strategies are quite naive and simple in their decision making.

Thanks @ingvagabund for your explanation.
/approve

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aveshagarwal, damemi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label on Apr 27, 2020
@k8s-ci-robot merged commit 91de471 into kubernetes-sigs:master on Apr 27, 2020
@seanmalloy mentioned this pull request on May 20, 2020 (4 tasks)
ingvagabund added a commit to ingvagabund/descheduler that referenced this pull request Jun 4, 2020
Based on https://github.com/kubernetes/community/blob/master/community-membership.md#requirements-1:

The following apply to the part of codebase for which one would be a reviewer in an OWNERS file (for repos using the bot).

> member for at least 3 months

For a couple of years now

> Primary reviewer for at least 5 PRs to the codebase

kubernetes-sigs#285
kubernetes-sigs#275
kubernetes-sigs#267
kubernetes-sigs#254
kubernetes-sigs#181

> Reviewed or merged at least 20 substantial PRs to the codebase

https://github.com/kubernetes-sigs/descheduler/pulls?q=is%3Apr+is%3Aclosed+assignee%3Aingvagabund

> Knowledgeable about the codebase

yes

> Sponsored by a subproject approver
> With no objections from other approvers
> Done through PR to update the OWNERS file

this PR

> May either self-nominate, be nominated by an approver in this subproject, or be nominated by a robot

self-nominating
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022
briend pushed a commit to briend/descheduler that referenced this pull request Feb 11, 2022