
RemovePodsViolatingNodeTaints policy not working with --descheduling-interval option #245

Closed
nshekhar221 opened this issue Feb 28, 2020 · 9 comments · Fixed by #249
Labels: kind/bug

Comments


nshekhar221 commented Feb 28, 2020

While running the descheduler with the RemovePodsViolatingNodeTaints policy and --descheduling-interval set to 5m, we are observing that the descheduler caches the nodes' status/taints on its first run and that this cache is not refreshed in subsequent runs.

Because of this, any changes made to node taints after the descheduler's first run are not picked up, and pods are not evicted from the affected nodes.

@nshekhar221
Author

Some logs for reference:

I0228 06:10:30.925414 1 reflector.go:432] pkg/mod/k8s.io/client-go@v0.17.0/tools/cache/reflector.go:108: Watch close - *v1.Node total 51 items received
I0228 06:10:39.869716 1 reflector.go:278] pkg/mod/k8s.io/client-go@v0.17.0/tools/cache/reflector.go:108: forcing resync
I0228 06:10:53.968840 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000g"
I0228 06:10:53.986100 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000i"
I0228 06:10:54.070078 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000e"
I0228 06:10:54.081994 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000h"
I0228 06:10:54.095838 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000f"
I0228 06:10:54.166821 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000j"
I0228 06:11:54.185717 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000g"
I0228 06:11:54.204834 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000i"
I0228 06:11:54.222188 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000e"
I0228 06:11:54.266629 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000h"
I0228 06:11:54.279198 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000f"
I0228 06:11:54.290507 1 node_taint.go:48] Processing node: "vmss-agent-worker-nshekhartest-tjmrw00000j"

$ kubectl describe node vmss-agent-worker-nshekhartest-tjmrw00000i | grep Taint
Taints: node.kubernetes.io/network-unavailable:NoSchedule

The taint node.kubernetes.io/network-unavailable:NoSchedule was added to node vmss-agent-worker-nshekhartest-tjmrw00000i after the descheduler started, and as the logs show, that change is not picked up by the descheduler in its subsequent runs.

nshekhar221 changed the title from "RemovePodsViolatingNodeAffinity policy not working with --descheduling-interval option" to "RemovePodsViolatingNodeTaints policy not working with --descheduling-interval option" on Feb 28, 2020
@seanmalloy
Member

/kind bug

k8s-ci-robot added the kind/bug label on Feb 29, 2020

damemi commented Mar 2, 2020

This looks like a valid bug: when the descheduler starts up we load the list of nodes once, and that same list is then passed into the strategies on every loop. I imagine similar bugs can affect the other strategies because of this.

Perhaps it would be better to move the code that loads the nodes into the wait loop, to make sure we have a fresh list every time. Or maybe a better option would be to use an informer with pointers to this node list, so that it is only updated when it needs to be.
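
For illustration, here is a minimal sketch of the informer-based idea, not the descheduler's actual code (the package, function, and variable names are made up): a shared node informer keeps a cached lister current via watch events, and every descheduling iteration reads the node list fresh from it.

```go
// Illustrative only: a node lister backed by a shared informer, re-read on
// every descheduling iteration so taints added after startup are visible.
package sketch

import (
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

func runWithNodeInformer(client kubernetes.Interface, interval time.Duration, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(client, 0)
	nodeLister := factory.Core().V1().Nodes().Lister()
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	wait.Until(func() {
		// The lister reads from the informer cache, which is updated by watch
		// events, so each iteration sees the nodes' current taints.
		nodes, err := nodeLister.List(labels.Everything())
		if err != nil {
			return
		}
		_ = nodes // hand the fresh node list to each enabled strategy here
	}, interval, stopCh)
}
```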


aveshagarwal commented Mar 2, 2020

The reason the list was fetched just once is the original design, where the descheduler was supposed to run just once, as a Job or CronJob.

With the introduction of the interval option, the list needs to be fetched inside the loop for each new iteration, so that every iteration works from a fresh view of the nodes.

We should also make sure that any changes in this regard do not break the scenarios where the interval option is not being used.
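
Purely as a sketch of that direction (again not the actual descheduler code; names are illustrative, and the List call uses the context-aware signature of newer client-go releases): the node list is fetched inside each iteration, while a zero interval keeps the original run-once behaviour.

```go
// Illustrative only: fetch nodes inside every iteration; when no
// --descheduling-interval is set (interval == 0), run a single pass as before.
package sketch

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func runStrategies(client kubernetes.Interface, interval time.Duration, stopCh <-chan struct{}) {
	oneIteration := func() {
		// Re-list nodes on every pass so taint changes are picked up.
		nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			return
		}
		_ = nodes // run the enabled strategies against this fresh list
	}

	if interval == 0 {
		// Preserve the original Job/CronJob style: a single descheduling pass.
		oneIteration()
		return
	}
	wait.Until(oneIteration, interval, stopCh)
}
```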


aveshagarwal commented Mar 2, 2020

Also, running the descheduler every 5m might be good for experimental purposes, but does not seem very practical. In other words, it should not be required to balance a cluster every 5m in general, or at least in most cases.

@ingvagabund
Contributor

/assign

@nshekhar221
Author

> Also, running the descheduler every 5m might be good for experimental purposes, but does not seem very practical. In other words, it should not be required to balance a cluster every 5m in general, or at least in most cases.

@aveshagarwal I am experimenting with a combined setup of Node Problem Detector (NPD) and the descheduler, where NPD taints any faulty node in the cluster and the descheduler drains pods from that faulty node (via the RemovePodsViolatingNodeTaints policy).

Increasing the interval between consecutive descheduler runs would increase the time it takes to detect and remediate a faulty node.
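
For context, the descheduler side of such a setup only needs a small policy file; a minimal sketch in the v1alpha1 policy format (illustrative only) would look roughly like this, with the descheduler itself started with --descheduling-interval:

```yaml
# Illustrative minimal policy: enable only the node-taint strategy and pass the
# file via --policy-config-file; --descheduling-interval repeats the check.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeTaints":
    enabled: true
```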


dharmab commented Jul 18, 2020

Just wanted to post an update that we've gotten the NPD+Descheduler Wombo Combo working in production and it seems to work pretty well. Would it be useful to add this use case to any documentation?

@seanmalloy
Member

> Just wanted to post an update that we've gotten the NPD+Descheduler Wombo Combo working in production and it seems to work pretty well. Would it be useful to add this use case to any documentation?

@dharmab yes, it would be useful to document this real-world use case. It would be great if you could submit a PR to update docs/user-guide.md with the details. Thanks!
