A node lives forever if it failed to join the cluster #1014
Comments
This is a great suggestion and something I'd like to move forward with in our liveness controller. I think we should terminate after a specified period, regardless of status conditions. Further, I think we should allow the user control over how long to wait, or whether or not to terminate.
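A rough sketch of what such a user-facing knob might look like on the Provisioner; note that `livenessTimeoutSeconds` below is hypothetical and does not exist in Karpenter today, it only illustrates the kind of setting being proposed:

```yaml
# Hypothetical sketch only: livenessTimeoutSeconds is NOT an existing Karpenter
# field; it illustrates the proposed knob. ttlSecondsAfterEmpty is shown for
# comparison with the existing TTL-style settings on the v1alpha5 Provisioner.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsAfterEmpty: 30
  # Proposed (hypothetical): terminate a node that has not become Ready within
  # this period, regardless of which status condition is responsible; omitting
  # it (or some sentinel value) could mean "never terminate".
  # livenessTimeoutSeconds: 900
```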
If a kubelet cannot properly connect to the cluster, why would it be able to do so the second time?
Another thing we can do is look at the tolerations of pods on the node that is not ready. If pods tolerate NotReady for a long time, we shouldn't terminate the node. If pods don't tolerate NotReady, they will be evicted by the pod lifecycle controller, so terminating the node won't harm anything. As @olemarkus mentions, it should be possible to disable this feature.
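For reference, this is the kind of toleration being referred to: a pod like the one below tolerates a NotReady node for a bounded time (name, image, and values are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                              # placeholder name
spec:
  containers:
    - name: app
      image: public.ecr.aws/nginx/nginx:latest   # placeholder image
  tolerations:
    # Tolerate a NotReady node for up to an hour before the pod is evicted;
    # without this, the DefaultTolerationSeconds admission plugin adds a
    # 300-second toleration for the same taint.
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 3600
```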
Also, if I delete a pending pod that was bound to a new node that was stuck, the node will live forever in
Closing in favor of kubernetes-sigs/karpenter#750
Version
Karpenter: v0.5.2
Kubernetes: v1.21.1
Expected Behavior
A node that was stuck in NotReady state, but not because of NodeStatusNeverUpdated, will go away after the LivenessTimeout timeout.
Actual Behavior
A node that was stuck in NotReady state, but not because of NodeStatusNeverUpdated, will live forever despite the fact that it is empty.
Steps to Reproduce the Problem
Just use a 'bad' securityGroupSelector for a node, for example an SG without any inbound rules.
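For context, a minimal sketch of how the selector is set, assuming the v1alpha5 Provisioner API in use at this Karpenter version, where the AWS provider settings sit under spec.provider; the tag key and value are placeholders:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  provider:
    # Security groups are selected by tag. A selector that matches an SG with
    # no usable inbound rules (or omitting the selector so the wrong SG is
    # discovered) reproduces the stuck-NotReady node described here.
    securityGroupSelector:
      kubernetes.io/cluster/my-cluster: owned   # placeholder tag
```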
In my case, if I don't specify securityGroupSelector, Karpenter chooses a 'bad' SG from some AWS ELB that was created by Istio. This SG has only the following inbound rules:
So because of the 'bad' SG the node has the following condition:
So the aws-node daemonset is not ready, and Karpenter ignores this node after creation because it checks only this condition Reason:
So can you consider checking KubeletNotReady or something like that in addition to NodeStatusNeverUpdated?
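For illustration, the Ready condition on such a node looks roughly like this in status.conditions (the message is omitted here since it varies by CNI/runtime; the reason value is what matters):

```yaml
# Excerpt of `kubectl get node <name> -o yaml`; only the relevant condition shown.
status:
  conditions:
    - type: Ready
      status: "False"
      reason: KubeletNotReady        # set by the kubelet, not NodeStatusNeverUpdated
      lastHeartbeatTime: "2022-01-01T00:00:00Z"   # placeholder timestamp
      message: "..."                 # varies; typically a network/CNI-not-ready message
```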
Resource Specs and Logs