Improvement: fluentd-gcp to get same toleration as kube-proxy #44445
Perhaps the even more liberal toleration below would be best for all three, although in that case there's no way to prevent those applications from being scheduled on your node.
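A fully liberal toleration would presumably be one with `operator: Exists` and neither `key` nor `effect` set, which matches every taint; a sketch:

```yaml
tolerations:
- operator: Exists   # no key, value, or effect: tolerates all taints
```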
I assume you mean "When adding a NoExecute taint to a node". FYI, the node.alpha.kubernetes.io/{notReady,unreachable} tolerations don't do anything in 1.6 unless you have the alpha feature enabled (I forgot the exact flag). But can you explain what behavior you want? Do you want fluentd and heapster to never be evicted due to taints under any circumstances? I think that makes sense for fluentd, since it runs on every node, but not for heapster.
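For context, a sketch of what those tolerations would look like in a pod spec; the tolerationSeconds values are illustrative, not taken from this thread:

```yaml
tolerations:
- key: node.alpha.kubernetes.io/notReady
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # stay for 300s after the taint appears, then evict
- key: node.alpha.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
```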
@davidopp I fixed the typo (taint instead of toleration). When filing this ticket I was still under the impression that heapster was also a DaemonSet, but realized afterwards that it isn't. So yes, it only applies to fluentd.
All DaemonSets should have an infinite toleration, and we actually tried to do that, but failed. The fix is already merged at head and I'm working on a cherry-pick.
Cherry-pick: #45401
I think I'm missing something, but how does #45349 add the NoExecute toleration?
It's not for all NoExecutes, only for ones that are created by the NodeController (i.e. when a Node is unresponsive). I don't think it's OK to add a general toleration for all NoExecute taints, even user-specified ones. We do want users to be able to set a Taint that will remove all Pods from a given Node, even ones created by Daemons. If you want your particular Daemon to tolerate all NoExecute Taints, you can add this toleration to its spec. I don't think that the system should do it automatically, though. @kubernetes/sig-scheduling-feature-requests
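A minimal sketch of the toleration described here, as it would appear in a DaemonSet's pod template:

```yaml
tolerations:
- operator: Exists   # no key: matches taints with any key and value
  effect: NoExecute  # but only those with the NoExecute effect
```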
Oh, my understanding of this issue was that currently, in GKE, it's not possible to use any user-defined NoExecute taints because fluentd-gcp gets evicted. fluentd-gcp is from a DaemonSet that users don't own, so we can't add arbitrary tolerations to it, but it is intended to be on every GKE node, just like kube-proxy.
@nimeshksingh I guess your point is that user-added taints are not usable if users can't adjust the set of default tolerations on system-generated pods, which is the case on GKE. That seems like a valid point. We will eventually have a user-configurable admission controller (kubernetes/community#132 will enable this), but that won't be soon. I need to think about the best solution for this. For runs-on-every-node system pods we probably need to tolerate all taints by default. For runs-on-one-or-a-few-nodes system pods (like Heapster or DNS), I'm not sure what to do.
cc/ @mikedanese
@davidopp I wonder why runs-on-one-or-a-few-nodes system pods would be treated differently from runs-on-every-node system pods; those few nodes could be any nodes, which means they would have to tolerate the same taints as well.
We certainly need to discuss this, but I think the only problem that's left is dealing with system pods that the user can't modify (e.g. fluentd on GKE @crassirostris). For this particular case I think we just need to add a "tolerate all" Toleration to it and call it a day (we probably should do the same thing with kube-proxy and NPD if we didn't already - @bowei @dchen1107 @Random-Liu). For "run one" things (like Heapster or DNS), we can't add such tolerations. The system depends on them running reasonably well, and Taints will be used as a method for Pod evictions in case of Node problems. I.e. when a Node dies, you really want your DNS to die and be scheduled somewhere else. Tolerations currently don't support negations IIRC, so we can't specify tolerating all Taints except ones created by the NodeController. @kubernetes/sig-scheduling-misc
Automatic merge from submit-queue (batch tested with PRs 45691, 45667, 45698, 45715)

Add general NoExecute Toleration to fluentd in gcp configuration

Ref #44445

Once merged I'll create a cherry-pick that will be picked up in GKE together with the next patch release.

cc @JorritSalverda @davidopp @aveshagarwal @nimeshksingh @piosz

```release-note
fluentd will tolerate all NoExecute Taints when run in gcp configuration.
```
re: #44445 (comment)
I hate to keep this going, but somehow, even after k8s 1.6.6, which has PR #45715 and therefore the NoExecute toleration on fluentd-gcp, I ran into the following behavior:
If I understand the effects correctly, the NoSchedule taint should not affect the fluentd-gcp pod, since it was already running on the node, but it did for some reason. But given that fluentd should probably come back if it somehow dies anyway, should it have a NoSchedule toleration in addition to the NoExecute toleration?
Just to clarify, using a NoExecute effect is a reasonable workaround for me, but the current behavior for NoSchedule still strikes me as strange.
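For reference, taints live on the Node object; a sketch with a made-up key and value. With NoSchedule the taint should only block new pods from being scheduled, while NoExecute would also evict running pods that lack a matching toleration:

```yaml
# fragment of a Node object
spec:
  taints:
  - key: dedicated        # hypothetical user-defined key
    value: logging-test   # hypothetical value
    effect: NoSchedule    # NoExecute would additionally evict running pods
```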
Your understanding is correct. Is the behavior you described repeatable? What PodStatus does the fluentd-gcp pod show when it is evicted? It would be useful to see the kubelet and NodeController logs, I guess. Most likely the fluentd-gcp pod is getting evicted for some other reason. (Maybe you could try the same scenario you described, but without tainting the node, and see if the fluentd-gcp pod gets evicted anyway.)
Yeah, it seems like fluentd-gcp should tolerate all taint effects, not just NoExecute. cc/ @kubernetes/sig-cluster-lifecycle-bugs
This is actually strange - the TaintController isn't even looking at NoSchedule Taints. If adding NoSchedule evicts a Daemon, it sounds like a DaemonSetController bug. @kubernetes/sig-apps-bugs
I looked into this problem, and indeed it looks like the DSC is deleting the daemon:
So it's certainly a DaemonSet bug - it shouldn't remove a Daemon from the Node even if it doesn't tolerate a NoSchedule taint that was put on it. I'll write a quick-fix that adds a general NoSchedule toleration for fluentd.
This is a regression in the DaemonSet controller. See #46577 (comment)
Assigned @erictune as a @kubernetes/sig-apps-bugs lead, but it's very likely that @mikedanese will fix it before Eric wakes up :)
Automatic merge from submit-queue (batch tested with PRs 48192, 48182)

Add generic NoSchedule toleration to fluentd in gcp config as a quick-fix for #44445
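Assuming the quick-fix mirrors the earlier NoExecute change, fluentd-gcp's tolerations after both PRs would look roughly like this:

```yaml
tolerations:
- operator: Exists
  effect: NoExecute    # from #45715: survive NoExecute taints
- operator: Exists
  effect: NoSchedule   # the quick-fix: ignore NoSchedule taints too
```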
Thanks for the debugging, @gmarek!
Looks like this is already all figured out, so just let me know if anyone needs me to provide more info. |
Thanks a lot for reporting the problem, @nimeshksingh |
The original issue was resolved along with another that got rolled up into this. What is unresolved:
But these were side conversations to the original issue. If you think these are important to continue discussing, can you break them out into a dedicated issue?
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Add wildcard tolerations to kube-proxy

- Add wildcard tolerations to kube-proxy.
- Add `nvidia.com/gpu` toleration to nvidia-gpu-device-plugin.

Related to #55080 and #44445.

/kind bug
/priority critical-urgent
/sig scheduling

**Release note**:

```release-note
kube-proxy addon tolerates all NoExecute and NoSchedule taints by default.
```

/assign @davidopp @bsalamat @vishh @jiayingz
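The wildcard tolerations match every taint; by contrast, the `nvidia.com/gpu` toleration mentioned above is keyed, so it only matches taints with that specific key. A sketch:

```yaml
tolerations:
- key: nvidia.com/gpu
  operator: Exists   # any value and any effect, but only this key
```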
When adding a NoExecute taint to a node, most kube-system pods get evicted, except for kube-proxy, because it has the toleration `=:Exists:NoExecute`. It would be nice if fluentd-gcp and heapster got the same `=:Exists:NoExecute` toleration, to make sure they keep running on that node. Currently in GKE 1.6.0 they have the following tolerations: