Kube node drainer service fails to start. #40
Easy mistake to slip in. I think I've spotted another one:
Unless I'm missing something here, the double `|`
@pieterlange I see that it's working with the double `|`
Related to this, but it looks like it's affecting other services too. In this case I think it's better to use something like:
This way we make sure that all the services are running before we start this one. Here is a proposal:
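The proposed snippet itself was not preserved in this capture. As a hedged illustration of the kind of ordering being described here, a systemd unit can declare its dependencies explicitly so it only starts once the services it needs are up (the unit names below are assumptions, not taken from the original proposal):

```ini
# Illustrative drop-in, e.g.
# /etc/systemd/system/kube-node-drainer.service.d/10-deps.conf
[Unit]
# Start only after these units have been started:
After=docker.service kubelet.service
# Pull them in, and fail this unit if they fail to start:
Requires=docker.service kubelet.service
```

`Requires=` alone does not imply ordering; it must be paired with `After=` to guarantee the dependencies are actually running first.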
@pieterlange replied to you in #41 (comment) about `|`
@camilb Thanks for your feedback and proposal! I don't intend to just stick with the
For 1, I believe we can use
For 2, part of the issue is resolved thanks to your PR #41. To tackle the other part of the issue, I've come around to your proposal, which uses
For 3, though I'm rather looking forward to it, I'm not familiar with its use-case!
Added to the known-issues list https://github.com/coreos/kube-aws/releases/tag/v0.9.1-rc.1
Uncordoning will be necessary when the node was restarted due to manual operator intervention instead of a rolling upgrade. Not sure how often we'd see that in practice (who reboots nodes in an ASG?) but it might cause some 'funny' side-effects.
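For reference, recovering a node that stayed cordoned after a manual reboot is a manual `kubectl` operation; a sketch (the node name below is purely illustrative):

```shell
# Spot any node left in SchedulingDisabled after the reboot:
kubectl get nodes

# Re-enable scheduling on the rebooted node (name is a placeholder):
kubectl uncordon ip-10-0-0-12.ec2.internal
```

Until the node is uncordoned, the scheduler will keep placing new pods elsewhere, which is the 'funny' side-effect described above.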
@mumoshu Thanks, I will try it. Hit the limit several times and was using `|`
@mumoshu Finished testing.
But it works fine with
Revisiting & thinking about @pieterlange's comment at #40 (comment) and @camilb's comment at #40 (comment). In addition to what you've mentioned, would the uncordon feature + the node drainer allow us to automatically upgrade CoreOS version, which implies automatic rebooting, hopefully without downtime?
@mumoshu I will close this for now. Still not perfect, but it works better now. The request to drain the node is properly sent and the pods are started on other nodes. The problem is that the containers are quickly killed on the drained node, and for pods that use bigger images or need a longer time to start/stop, there is not enough time for them to start on other nodes. I'm looking for a good method to delay stopping some services on shutdown or reboot. I saw some examples from Red Hat and want to test them. I will open another issue with a proposal for improvements soon.
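One standard Kubernetes lever for this slow-start/slow-stop problem is a longer termination grace period, optionally combined with a `preStop` hook to delay SIGTERM handling. A hedged sketch of a pod spec fragment, with all names and values purely illustrative:

```yaml
# Illustrative pod spec: give slow containers more time to stop,
# so replacements can come up elsewhere before the old pod is killed.
apiVersion: v1
kind: Pod
metadata:
  name: slow-app                       # hypothetical name
spec:
  terminationGracePeriodSeconds: 120   # default is 30
  containers:
  - name: app
    image: example/slow-app:latest     # hypothetical image
    lifecycle:
      preStop:
        exec:
          # Delay shutdown so in-flight work can drain first.
          command: ["sh", "-c", "sleep 20"]
```

This only buys time inside Kubernetes; if the host itself shuts down faster than the grace period, the systemd-level delay being investigated above is still needed.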
I think the line

```
Restart=on-failure
```

from `kube-node-drainer.service` should be removed. Getting these errors:

```
Failed to restart kubelet.service: Unit kube-node-drainer.service is not loaded properly: Invalid argument.
kube-node-drainer.service: Service has Restart= setting other than no, which isn't allowed for Type=oneshot services. Refusing.
```
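The second error is systemd enforcing a hard rule: a `Type=oneshot` unit may not set `Restart=` to anything other than `no`. A minimal sketch of the fix, assuming the unit stays `Type=oneshot` (the `ExecStart` path here is hypothetical, not the real unit's):

```ini
[Service]
Type=oneshot
# "Restart=on-failure" must be dropped: systemd rejects any Restart=
# value other than "no" on Type=oneshot units and refuses to load the
# unit, which is what breaks the kubelet restart above.
ExecStart=/opt/bin/drain-node   # hypothetical path
```

If restart-on-failure behavior is genuinely needed, the alternative is changing the unit to `Type=simple`, where `Restart=on-failure` is allowed.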