
Skip drain on NotReady Nodes #450

Merged
1 commit merged into gardener:master on Apr 28, 2020

Conversation

@zuzzas (Contributor) commented Apr 21, 2020

What this PR does / why we need it:

Sometimes (mainly due to incorrect requests/limits in our case) a Node's kubelet might become unavailable. Let's skip the drain completely on such Nodes.

Which issue(s) this PR fixes:
Fixes #448

Special notes for your reviewer:

This article is useful for understanding the Node NotReady status. I've come to the conclusion that checking the current Node Conditions should be enough.

Release note:

The drain is skipped entirely if the node has been NotReady for the timeout period. The default timeout is 5 minutes.
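
For illustration, the behavior described in the release note roughly corresponds to a check like the one sketched below. This is a hypothetical helper using the upstream core/v1 Node types, not the PR's actual code; the name shouldSkipDrain and the hard-coded timeout are assumptions.

```go
// Hypothetical helper (not the PR's actual code) illustrating the check the
// release note describes: skip the drain when the node's Ready condition has
// not been True for at least the timeout. Assumes k8s.io/api/core/v1 types.
package drain

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// notReadyTimeout mirrors the 5-minute default mentioned in the release note.
const notReadyTimeout = 5 * time.Minute

// shouldSkipDrain reports whether the drain can be skipped for the given node.
func shouldSkipDrain(node *corev1.Node, now time.Time) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type != corev1.NodeReady {
			continue
		}
		if cond.Status == corev1.ConditionTrue {
			return false // kubelet reports Ready; drain normally
		}
		// NotReady or Unknown: skip only if it has persisted for the timeout.
		return now.Sub(cond.LastTransitionTime.Time) >= notReadyTimeout
	}
	// No Ready condition reported at all; be conservative and drain.
	return false
}
```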

@zuzzas zuzzas requested review from ggaurav10 and a team as code owners April 21, 2020 17:09
pkg/controller/drain.go: review thread (outdated, resolved)
@hardikdr (Member) left a comment

Thanks @zuzzas for the PR.

Fine with the current approach. Not concerned, just curious: should we force-delete the machine if the kubelet is only temporarily unavailable and may come back soon? Force-deletion also implies violating the PDB during the roll-out.

Can we override the drain-timeout if NotReady is set during the drain? WDYT?

@rfranzke (Member)

> Can we override the drain-timeout if NotReady is set during the drain? WDYT?

Not sure, what would you set it to? If the kubelet is down or not responding, then the drain timeout will always elapse, I would assume. Does it make sense then?

@hardikdr (Member)

I would set it to a value equal to the health-check timeout [~10 mins] or even less [~5 mins].

The idea would be to consider the following possibilities:

  • the kubelet might not actually be completely unhealthy and may come back soon, hence giving the pods one chance to be evicted.
  • we do sometimes see nodes flipping between Ready and NotReady, possibly due to unknown networking issues.

@rfranzke (Member)

Hm okay, the second point is valid I guess, thanks for bringing it up. But then even a small timeout might not be a good idea, right? What about starting with a very small timeout, ~5 min, and if it elapses, reevaluating whether the node has meanwhile become ready again? Only if it hasn't, terminate it forcefully; otherwise drain again with the normal timeout?

pkg/controller/drain.go: review thread (outdated, resolved)
@hardikdr (Member)

> Hm okay, the second point is valid I guess, thanks for bringing it up. But then even a small timeout might not be a good idea, right? What about starting with a very small timeout, ~5 min, and if it elapses, reevaluating whether the node has meanwhile become ready again? Only if it hasn't, terminate it forcefully; otherwise drain again with the normal timeout?

That would be a pretty reliable solution.
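
For readers following the thread, a rough sketch of the two-phase flow proposed above might look like the following. This is purely illustrative and is not the code that was merged; the helper names (drainNode, nodeIsReady, forceDeleteMachine) and both timeout values are assumptions.

```go
// Hypothetical sketch of the two-phase idea discussed above: try a short drain
// first for a NotReady node, re-check readiness, then either force-delete or
// fall back to a normal drain. Not the merged implementation.
package drain

import (
	"context"
	"time"
)

const (
	shortDrainTimeout  = 5 * time.Minute // small first attempt, as proposed
	normalDrainTimeout = 2 * time.Hour   // regular drain timeout (placeholder value)
)

// drainFuncs abstracts the operations the flow needs; all of them are assumed helpers.
type drainFuncs struct {
	drainNode          func(ctx context.Context, nodeName string, timeout time.Duration) error
	nodeIsReady        func(ctx context.Context, nodeName string) (bool, error)
	forceDeleteMachine func(ctx context.Context, nodeName string) error
}

// handleNotReadyNode drains with a short timeout, re-evaluates readiness, and
// only force-deletes if the node is still NotReady afterwards.
func (f drainFuncs) handleNotReadyNode(ctx context.Context, nodeName string) error {
	if err := f.drainNode(ctx, nodeName, shortDrainTimeout); err == nil {
		return nil // drain finished within the short window
	}
	ready, err := f.nodeIsReady(ctx, nodeName)
	if err != nil {
		return err
	}
	if ready {
		// Node recovered in the meantime; retry with the normal timeout.
		return f.drainNode(ctx, nodeName, normalDrainTimeout)
	}
	// Still NotReady after the short window: terminate forcefully.
	return f.forceDeleteMachine(ctx, nodeName)
}
```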

@zuzzas (Contributor, Author) commented Apr 22, 2020

OK, let's try it from this angle. We don't really have to track (time) anything ourselves, since there is a handy lastTransitionTime field in the Conditions.

Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>
@hardikdr (Member)

> OK, let's try it from this angle. We don't really have to track (time) anything ourselves, since there is a handy lastTransitionTime field in the Conditions.

The approach looks good; I'll take a further look tomorrow. I mainly want to check what happens if the node has been NotReady for less than 5 mins and the drain starts. Would it then wait until the complete drain-timeout elapses?

@hardikdr (Member)

The drain-timeout still seems to be effective if the drain starts within 5 mins of the node becoming NotReady. I think more changes and complexity would be introduced if we targeted the perfect solution discussed above.

Still, the current solution is valuable for most cases, since NotReady nodes will be skipped during the drain.
Should we merge this as a short-term solution and keep the related issue open?

WDYS?

@hardikdr hardikdr added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Apr 26, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Apr 26, 2020
@zuzzas (Contributor, Author) commented Apr 26, 2020

@hardikdr
One of the most prominent cases where this issue has been hitting us is when a Node has a lot of Pods with badly configured resource requests scheduled on it. If they collectively allocate lots of memory, the kernel's memory management stalls, and the Node becomes completely stuck with no chance of recovery. It is imperative that MCM swiftly removes it and replaces it with a healthy one.

Let's call this PR "a band-aid" that fixes one of the ugliest problems right now. And merge it, of course. :)

@hardikdr hardikdr merged commit 8ec2a7f into gardener:master Apr 28, 2020
@zuzzas zuzzas deleted the skip-drain-if-not-ready branch April 29, 2020 07:44