Karpenter should show `Disrupting` or `Terminating` through `kubectl get nodes` when it has tainted Nodes #1152
Comments
Also, as part of the alignment effort with Cluster Autoscaler, I imagine whatever change we suggest upstream would also apply to Cluster Autoscaler. Perhaps aligning on the taint that we both want to use, and proposing a way for that taint to have special logic built around it in the node printer columns, is a change we could try to get into upstream? cc: @MaciekPytel @towca
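To make that alignment idea a bit more concrete, here is a minimal Go sketch of the kind of check such printer logic could perform. Both taint keys below are written in for illustration only and are assumptions, not taken from either project's current code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Possible taint keys an aligned check could look for. Both keys are
// assumptions for illustration; ideally the two autoscalers would converge
// on a single shared key so kubectl only needs one special case.
var disruptionTaintKeys = []string{
	"karpenter.sh/disruption",        // Karpenter's disruption taint (assumed key)
	"ToBeDeletedByClusterAutoscaler", // Cluster Autoscaler's deletion taint (assumed key)
}

// beingDisrupted reports whether a node carries any of the disruption taints.
func beingDisrupted(node *corev1.Node) bool {
	for _, taint := range node.Spec.Taints {
		for _, key := range disruptionTaintKeys {
			if taint.Key == key {
				return true
			}
		}
	}
	return false
}

func main() {
	node := &corev1.Node{Spec: corev1.NodeSpec{Taints: []corev1.Taint{
		{Key: "karpenter.sh/disruption", Value: "disrupting", Effect: corev1.TaintEffectNoSchedule},
	}}}
	fmt.Println(beingDisrupted(node)) // true
}
```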
Also also, there was a discussion in the K8s Slack over whether Karpenter should be using the
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
I would like to note that other controllers rely on the SchedulingDisabled taint to see whether a node is shutting down or not. The CloudNativePG operator, for example, has a PDB that disallows deleting the pod. If a node with a CNPG pod receives the SchedulingDisabled taint, the operator will start to migrate that pod itself. Since Karpenter does not use that taint, the node is stuck.
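For context, a check along the following lines is roughly what such an operator can key off of today. This is a standalone sketch under that assumption, not CloudNativePG's actual code:

```go
package nodewatch

import corev1 "k8s.io/api/core/v1"

// nodeCordoned reports whether a node carries the standard cordon signal that
// an operator like the one described above watches: either the
// spec.unschedulable field or the node.kubernetes.io/unschedulable taint.
// Karpenter's own disruption taint would not match here, which is why the
// node can appear "stuck" to such operators.
func nodeCordoned(node *corev1.Node) bool {
	if node.Spec.Unschedulable {
		return true
	}
	for _, taint := range node.Spec.Taints {
		if taint.Key == corev1.TaintNodeUnschedulable {
			return true
		}
	}
	return false
}
```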
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I understand and support the reasons for using the Karpenter-specific taint, but it does break other things that look for the standard unschedulable taint. Would it be feasible to add both? Karpenter would then know it initiated the cordon, but other operators would also be aware that the node has been cordoned and could do what they need to do (failover, in the case of cloudnative-pg).
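A rough client-go sketch of that "add both" idea follows. The taint key and value are assumptions used only for illustration, and a real implementation would need conflict handling (for example retry-on-conflict or a patch) rather than a bare update:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// cordonAndTaint keeps an autoscaler-specific disruption taint on the node
// but also cordons it the standard way, so anything watching
// spec.unschedulable reacts as it would to `kubectl cordon`.
func cordonAndTaint(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Autoscaler-specific disruption taint (assumed key/value for illustration).
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    "karpenter.sh/disruption",
		Value:  "disrupting",
		Effect: corev1.TaintEffectNoSchedule,
	})

	// Standard cordon: this is what makes kubectl print SchedulingDisabled.
	node.Spec.Unschedulable = true

	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	if err := cordonAndTaint(context.Background(), client, "example-node"); err != nil {
		panic(err)
	}
}
```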
I think what we have seen in general is that these other operators support watching the taint that the autoscaler supports. The EBS CSI driver had done something similar, since it was also hooking into knowledge that Karpenter was deleting the NodeClaim. In general, I'm a little wary of things hooking into taints that can be added with a
Description
What problem are you trying to solve?
Currently, Kubernetes uses the `node.kubernetes.io/unschedulable` taint and the `spec.unschedulable` field on the node to mark that a node is cordoned and may be about to be drained for maintenance or removal. This is visible through the printer columns that you get when you call `kubectl get nodes`. The code for this handling can be seen in the printer columns logic for `kubectl` here.

This is nice visibility for users when Kubernetes is using this specific field; however, nothing is surfaced when Karpenter adds its taint and is actively draining the node, since Karpenter doesn't update the `spec.unschedulable` field that the printer relies on to add the `SchedulingDisabled` section to the node.

It would be a really nice UX if we could add something similar to `SchedulingDisabled` (perhaps something like `Disrupting` or `Terminating`) to the node so that users get visibility through the printer that Karpenter is acting on the node.
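As a rough illustration of the ask (not the actual kubectl printer code), the sketch below mirrors the existing `SchedulingDisabled` handling and shows where a `Disrupting` marker could be appended when a disruption taint is present. The `karpenter.sh/disruption` key is an assumption used only for the example:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// karpenterDisruptionTaint is an assumed taint key, used purely for illustration.
const karpenterDisruptionTaint = "karpenter.sh/disruption"

// statusColumn approximates what kubectl's node printer does today (the Ready
// condition plus a SchedulingDisabled suffix when spec.unschedulable is set)
// and adds a Disrupting suffix when the disruption taint is present.
func statusColumn(node *corev1.Node) string {
	status := "NotReady"
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
			status = "Ready"
		}
	}
	if node.Spec.Unschedulable {
		status += ",SchedulingDisabled" // existing behavior
	}
	for _, taint := range node.Spec.Taints {
		if taint.Key == karpenterDisruptionTaint {
			status += ",Disrupting" // the proposed addition
		}
	}
	return status
}

func main() {
	node := &corev1.Node{
		Spec: corev1.NodeSpec{Taints: []corev1.Taint{
			{Key: karpenterDisruptionTaint, Value: "disrupting", Effect: corev1.TaintEffectNoSchedule},
		}},
		Status: corev1.NodeStatus{Conditions: []corev1.NodeCondition{
			{Type: corev1.NodeReady, Status: corev1.ConditionTrue},
		}},
	}
	fmt.Println(statusColumn(node)) // Ready,Disrupting
}
```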