Skip drain on NotReady Nodes #450
Conversation
Thanks @zuzzas for the PR.
Fine with the current approach. Not concerned, just curious: should we force-delete the machine in case the kubelet is only temporarily unavailable and may come back soon? Force-deletion also implies violating PDBs during the roll-out.
Can we override the drain-timeout if NotReady is set during the drain? WDYT?
Not sure, what would you set it to? If the kubelet is down/not responding, then the drain timeout will always elapse, I would assume. Does it make sense then?
I would have set it to a value equal to the health-check timeout [~10 mins] or even less [~5 mins]. The idea would be to consider the possibility:
Hm okay, the second point is valid I guess, thanks for bringing it up. But then even a small timeout might not be a good idea, or? What about starting with a very small timeout, ~5 min, and if it elapses, reevaluating whether the node has meanwhile become ready again? Only if it hasn't, terminate it forcefully; otherwise drain again with the normal timeout?
That would be a pretty reliable solution.
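To make the flow discussed above concrete, here is a rough sketch. All names (drainWithTimeout, forceDeleteMachine, isNodeReady) and the timeout values are illustrative assumptions, not the machine-controller-manager API:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errDrainTimeout = errors.New("drain timed out")

// drainNotReadyNode sketches the flow being discussed: try a short drain
// first; if it times out and the node is still NotReady, force-delete the
// machine (accepting a possible PDB violation); if the kubelet recovered in
// the meantime, fall back to a normal drain with the full timeout.
func drainNotReadyNode(
	drainWithTimeout func(timeout time.Duration) error,
	forceDeleteMachine func() error,
	isNodeReady func() bool,
) error {
	const shortTimeout = 5 * time.Minute

	err := drainWithTimeout(shortTimeout)
	if err == nil {
		return nil
	}
	if !errors.Is(err, errDrainTimeout) {
		return err
	}
	if isNodeReady() {
		// The kubelet came back while we were waiting: retry with the
		// regular drain timeout and keep respecting PDBs.
		const normalTimeout = 2 * time.Hour
		return drainWithTimeout(normalTimeout)
	}
	// The node is still NotReady: give up on a graceful drain.
	return forceDeleteMachine()
}

func main() {
	// Toy wiring, only to show the control flow.
	err := drainNotReadyNode(
		func(timeout time.Duration) error {
			fmt.Println("draining with timeout", timeout)
			return errDrainTimeout
		},
		func() error {
			fmt.Println("force-deleting machine")
			return nil
		},
		func() bool { return false },
	)
	fmt.Println("result:", err)
}
```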
Ok, let's try this from this angle. We don't really have to track (time) anything ourselves, since there is a handy lastTransitionTime on the Node's Ready condition.
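A minimal sketch of that idea, assuming the upstream Kubernetes API types; the function name shouldSkipDrain and the threshold parameter are illustrative, not necessarily the exact code in this PR:

```go
package drain

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// shouldSkipDrain returns true if the node's Ready condition has not been
// True for at least notReadyThreshold, judged by the lastTransitionTime
// that the node controller already maintains, so we never have to track
// time ourselves.
func shouldSkipDrain(node *corev1.Node, notReadyThreshold time.Duration) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type != corev1.NodeReady {
			continue
		}
		if cond.Status == corev1.ConditionTrue {
			// Kubelet is healthy; drain normally.
			return false
		}
		// NotReady or Unknown: skip the drain only if the node has been
		// in that state long enough.
		return time.Since(cond.LastTransitionTime.Time) >= notReadyThreshold
	}
	// No Ready condition reported at all; be conservative and drain.
	return false
}
```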
The approach looks good, I'll take a further look tomorrow. Mainly want to check what happens if the node has been NotReady for less than 5 minutes when the drain starts.
The drain-timeout still seems to be effective if the drain starts within 5 minutes of the node becoming NotReady. I think more changes and complexity would be introduced if we target the perfect solution discussed above. Still, the current solution is valuable for most cases, where NotReady nodes will be skipped on drain. WDYS?
@hardikdr Let's call this PR "a band-aid" which fixes one of the ugliest problems right now. And merge it, of course. :)
What this PR does / why we need it:
Sometimes (mainly due to incorrect requests/limits in our case) a Node's kubelet might become unavailable. Let's skip drain completely on these Nodes.
Which issue(s) this PR fixes:
Fixes #448
Special notes for your reviewer:
This article is useful for understanding the Node NotReady status. I've come to the conclusion that checking the current Node Conditions should be enough.
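For reference, a condition check along these lines (a hedged sketch with an assumed function name, not a quote from the PR) could look like:

```go
package drain

import corev1 "k8s.io/api/core/v1"

// nodeIsNotReady treats a node as NotReady when its Ready condition is not
// True, which also covers the Unknown status the node controller sets once
// the kubelet stops posting status updates.
func nodeIsNotReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status != corev1.ConditionTrue
		}
	}
	// No Ready condition reported yet; be conservative and still drain.
	return false
}
```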
Release note: