
EKS 1.26 Nodes are getting a taint even though they are healthy #1446

Closed
ardenercelik opened this issue Sep 29, 2023 · 6 comments

Comments

@ardenercelik

ardenercelik commented Sep 29, 2023

What happened:
We are trying to upgrade from EKS 1.25 to EKS 1.26. After upgrading the AMI from amazon-eks-node-1.25-v20230825 to amazon-eks-node-1.26-v20230919, the instances receive a taint even though the nodes appear to be healthy. I also noticed that the instance type shown in the console is empty, because some of the labels are not added to the node. We also verified from the system log that IMDS can be reached, and bootstrap.sh does not throw any errors.

In the attachments below you can see some relevant outputs.

Conditions and labels from Node - amazon-eks-node-1.26-v20230919

    "labels": {
      "beta.kubernetes.io/arch": "amd64",
      "beta.kubernetes.io/os": "linux",
      "eks.amazonaws.com/capacityType": "ON_DEMAND",
      "eks.amazonaws.com/nodegroup": "eks-st-managed-backend",
      "eks.amazonaws.com/nodegroup-image": "ami-03825bd685c9d66bb",
      "eks.amazonaws.com/sourceLaunchTemplateId": "lt-099bbf51078483fc8",
      "eks.amazonaws.com/sourceLaunchTemplateVersion": "26",
      "faro.com/backend": "1",
      "faro.com/frontend": "1",
      "faro.com/ingress": "1",
      "faro.com/monitoring": "1",
      "faro.com/worker": "1",
      "k8s.io/cloud-provider-aws": "355d0ea6f0ac02b37d5ad83235a4f0f2",
      "kubernetes.io/arch": "amd64",
      "kubernetes.io/hostname": "ip-10-0-33-238.eu-west-1.compute.internal",
      "kubernetes.io/os": "linux"
    }
---------
"conditions": [
      {
        "type": "MemoryPressure",
        "status": "False",
        "lastHeartbeatTime": "2023-09-27T12:49:00Z",
        "lastTransitionTime": "2023-09-27T12:47:57Z",
        "reason": "KubeletHasSufficientMemory",
        "message": "kubelet has sufficient memory available"
      },
      {
        "type": "DiskPressure",
        "status": "False",
        "lastHeartbeatTime": "2023-09-27T12:49:00Z",
        "lastTransitionTime": "2023-09-27T12:47:57Z",
        "reason": "KubeletHasNoDiskPressure",
        "message": "kubelet has no disk pressure"
      },
      {
        "type": "PIDPressure",
        "status": "False",
        "lastHeartbeatTime": "2023-09-27T12:49:00Z",
        "lastTransitionTime": "2023-09-27T12:47:57Z",
        "reason": "KubeletHasSufficientPID",
        "message": "kubelet has sufficient PID available"
      },
      {
        "type": "Ready",
        "status": "True",
        "lastHeartbeatTime": "2023-09-27T12:49:00Z",
        "lastTransitionTime": "2023-09-27T12:48:14Z",
        "reason": "KubeletReady",
        "message": "kubelet is posting ready status"
      }
    ],

Conditions and labels from Node - amazon-eks-node-1.25-v20230825

    "labels": {
      "beta.kubernetes.io/arch": "amd64",
      "beta.kubernetes.io/instance-type": "t3.medium",
      "beta.kubernetes.io/os": "linux",
      "eks.amazonaws.com/capacityType": "ON_DEMAND",
      "eks.amazonaws.com/nodegroup": "eks-st-managed-backend",
      "eks.amazonaws.com/nodegroup-image": "ami-03ed1b0118ecc804f",
      "eks.amazonaws.com/sourceLaunchTemplateId": "lt-099bbf51078483fc8",
      "eks.amazonaws.com/sourceLaunchTemplateVersion": "25",
      "failure-domain.beta.kubernetes.io/region": "eu-west-1",
      "failure-domain.beta.kubernetes.io/zone": "eu-west-1b",
      "faro.com/backend": "1",
      "faro.com/frontend": "1",
      "faro.com/ingress": "1",
      "faro.com/monitoring": "1",
      "faro.com/worker": "1",
      "k8s.io/cloud-provider-aws": "355d0ea6f0ac02b37d5ad83235a4f0f2",
      "kubernetes.io/arch": "amd64",
      "kubernetes.io/hostname": "ip-10-0-37-208.eu-west-1.compute.internal",
      "kubernetes.io/os": "linux",
      "node.kubernetes.io/instance-type": "t3.medium",
      "topology.ebs.csi.aws.com/zone": "eu-west-1b",
      "topology.kubernetes.io/region": "eu-west-1",
      "topology.kubernetes.io/zone": "eu-west-1b"
    }
----------------
    "conditions": [
      {
        "type": "MemoryPressure",
        "status": "False",
        "lastHeartbeatTime": "2023-09-27T12:50:41Z",
        "lastTransitionTime": "2023-09-27T10:03:53Z",
        "reason": "KubeletHasSufficientMemory",
        "message": "kubelet has sufficient memory available"
      },
      {
        "type": "DiskPressure",
        "status": "False",
        "lastHeartbeatTime": "2023-09-27T12:50:41Z",
        "lastTransitionTime": "2023-09-27T10:03:53Z",
        "reason": "KubeletHasNoDiskPressure",
        "message": "kubelet has no disk pressure"
      },
      {
        "type": "PIDPressure",
        "status": "False",
        "lastHeartbeatTime": "2023-09-27T12:50:41Z",
        "lastTransitionTime": "2023-09-27T10:03:53Z",
        "reason": "KubeletHasSufficientPID",
        "message": "kubelet has sufficient PID available"
      },
      {
        "type": "Ready",
        "status": "True",
        "lastHeartbeatTime": "2023-09-27T12:50:41Z",
        "lastTransitionTime": "2023-09-27T10:04:08Z",
        "reason": "KubeletReady",
        "message": "kubelet is posting ready status"
      }
    ],

How to reproduce it (as minimally and precisely as possible):
Change the 1.25 AMI to the 1.26 one.
Anything else we need to know?:

Environment:

  • AWS Region: us-east-1
  • Instance Type(s): t3.medium
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): "eks.7"
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): "1.25"
  • AMI Version: amazon-eks-node-1.26-v20230919
  • Kernel (e.g. uname -a): Linux ip-10-0-32-107.ec2.internal 5.10.192-183.736.amzn2.x86_64 #1 SMP Wed Sep 6 21:15:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-0963f2c76238b64d5"
BUILD_TIME="Tue Sep 19 17:51:08 UTC 2023"
BUILD_KERNEL="5.10.192-183.736.amzn2.x86_64"
ARCH="x86_64"

eks-command-outputs.txt
instance-type-blank
taints
ebs-csi-logs.txt

@cartermckinnon
Member

What taint is being applied?

@ardenercelik
Author

ardenercelik commented Sep 29, 2023

> What taint is being applied?
This is the taint in the console.

   "taints": [
      {
        "key": "node.cloudprovider.kubernetes.io/uninitialized",
        "value": "true",
        "effect": "NoSchedule"
      }
    ]

@cartermckinnon
Member

cartermckinnon commented Sep 29, 2023

TLDR: The taint is expected behavior when --cloud-provider=external is used for kubelet.

Some more info in the k8s docs: https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/#running-cloud-controller-manager

In the past, kubelet called cloud-provider APIs directly and had a bunch of cloud-provider-specific code compiled into it as a result. There has been an effort over many Kubernetes releases to remove this logic from kubelet, moving it to a control plane component (cloud-controller-manager) as needed. The kubelet applies this taint before joining the cluster, and cloud-controller-manager removes it once it fulfills its duties. This happens very quickly in most cases.
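
As a quick sanity check, you can confirm whether the taint is still on a node with kubectl; this is just a generic sketch, and the node name below is taken from the labels posted above:

    # Show any remaining taint keys across all nodes
    kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

    # Inspect the taints on a single node
    kubectl get node ip-10-0-33-238.eu-west-1.compute.internal -o jsonpath='{.spec.taints}'

If node.cloudprovider.kubernetes.io/uninitialized is still listed more than a few minutes after the node joins, the cloud-controller-manager has not finished initializing it.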

Are you seeing that the taint is not removed?

@cartermckinnon closed this as not planned on Oct 3, 2023
@ardenercelik
Author

Hello, yes, even after 24 hours the taint does not get removed, even though the instances are healthy and can reach the IMDS.

@cartermckinnon
Member

Please open a ticket with AWS Support; we'll have to look into your specific environment. 👍

@akshaypatidar1999

I was getting the same error. It turns out the issue was with the IAM role permissions: the cluster role did not have the DescribeAvailabilityZones permission.
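
In case it helps others: the action in question is ec2:DescribeAvailabilityZones, which the AWS-managed AmazonEKSClusterPolicy normally grants. A minimal sketch of attaching that policy to the cluster IAM role (the role name here is only a placeholder):

    # Attach the managed policy that includes ec2:DescribeAvailabilityZones
    # to the EKS cluster IAM role; replace the role name with your own.
    aws iam attach-role-policy \
      --role-name my-eks-cluster-role \
      --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy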
