kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-snlbt 0/1 CrashLoopBackOff 39 (4m5s ago) 156m
coredns-57ff979f67-87f65 0/1 ContainerCreating 0 153m
coredns-57ff979f67-8tlgg 0/1 ContainerCreating 0 153m
kube-proxy-wxk5n 1/1 Running 1 (21h ago) 21h
aws-node pod logs:
{"level":"info","ts":"2022-11-09T08:51:47.548Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-11-09T08:51:47.549Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-11-09T08:51:47.568Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-11-09T08:51:47.569Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2022-11-09T08:51:49.576Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-11-09T08:51:51.582Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
[... the same "Retrying waiting for IPAM-D" log line repeats every ~2 seconds until 08:53:07 ...]
{"level":"info","ts":"2022-11-09T08:53:09.834Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
While debugging the node, we observed that port 50051 was being listened on by the aws-k8s-agent process before the reboot; after the reboot, nothing is listening on port 50051.
In entrypoint.sh, the script tries to reach port 50051 and, since nothing is listening there, fails after the timeout:
# Check for ipamd connectivity on localhost port 50051
wait_for_ipam() {
    while :
    do
        if ./grpc-health-probe -addr 127.0.0.1:50051 >/dev/null 2>&1; then
            return 0
        fi
        log_in_json info "Retrying waiting for IPAM-D"
    done
}
...
if ! wait_for_ipam; then
    log_in_json error "Timed out waiting for IPAM daemon to start:"
    cat "$AGENT_LOG_PATH" >&2
    exit 1
fi
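The loop above, as quoted, never gives up on its own (the timeout handling lives in the elided lines of entrypoint.sh). A self-contained sketch of the same idea with an explicit retry bound, substituting a plain TCP probe for grpc-health-probe (the function name and retry count are illustrative, not taken from entrypoint.sh):

```shell
# Bounded variant of the ipamd wait loop: probe the port a fixed number
# of times, then give up so the caller can log a timeout and exit.
wait_for_port() {
  addr=$1; port=$2; retries=$3
  i=0
  while [ "$i" -lt "$retries" ]; do
    # bash's /dev/tcp pseudo-device: the redirection succeeds only if
    # something is accepting connections on addr:port.
    if (exec 3<>"/dev/tcp/$addr/$port") 2>/dev/null; then
      return 0
    fi
    echo "Retrying waiting for IPAM-D"
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

On a node in the state described here, `wait_for_port 127.0.0.1 50051 30` would exhaust its retries and return non-zero, matching the repeated retry lines in the logs.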
Kindly help us identify which part of the configuration needs to be corrected.
I see the sock is not reachable. Do you have a dockershim symlink pointing to /var/run/containerd/containerd.sock?
{"level":"info","ts":"2022-11-09T07:10:08.333Z","caller":"ipamd/ipamd.go:509","msg":"Reading ipam state from CRI"}
{"level":"debug","ts":"2022-11-09T07:10:08.333Z","caller":"datastore/data_store.go:374","msg":"Getting running pod sandboxes from \"unix:///var/run/dockershim.sock\""}
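Since ipamd is reading pod sandboxes from unix:///var/run/dockershim.sock, a containerd-only node needs that path to exist. A hedged sketch of the usual remediation (the helper name is made up, and the two socket paths are common defaults that may differ on your AMI):

```shell
# Create the dockershim -> containerd symlink that this CNI version
# expects when reading pod sandboxes from unix:///var/run/dockershim.sock.
ensure_dockershim_symlink() {
  containerd_sock=$1
  dockershim_sock=$2
  # Only link if containerd's socket exists and nothing occupies the
  # dockershim path yet.
  if [ -e "$containerd_sock" ] && [ ! -e "$dockershim_sock" ]; then
    ln -s "$containerd_sock" "$dockershim_sock"
  fi
}

# On a node this would typically be run as root, e.g.:
#   ensure_dockershim_symlink /run/containerd/containerd.sock /var/run/dockershim.sock
```

Because reboots recreate /var/run, a link created this way would need to be reapplied on boot (e.g. from userdata) to survive the reboot described in this issue.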
Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
After EKS node reboot we have observed that the aws-node is not coming up (continuously crashing):
We have tried troubleshooting by following this user guide, but it did not resolve the issue:
https://aws.amazon.com/premiumsupport/knowledge-center/eks-cni-plugin-troubleshooting/
Please find the attached logs
eks_i-0eb7ccae605e1767c_2022-11-09_0712-UTC_0.7.2.tar.gz
Environment:
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.6-eks-7d68063", GitCommit:"f24e667e49fb137336f7b064dba897beed639bad", GitTreeState:"clean", BuildDate:"2022-02-23T19:32:14Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.13-eks-fb459a0", GitCommit:"55bd5d5cb7d32bc35e4e050f536181196fb8c6f7", GitTreeState:"clean", BuildDate:"2022-10-24T20:35:40Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
CNI Version: v1.10.4-eksbuild.1
OS:
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Kernel:
Linux ip-10-0-100-90 5.15.0-1022-aws #26~20.04.1-Ubuntu SMP Sat Oct 15 03:22:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Userdata:
#!/bin/bash
set -o xtrace
/etc/eks/bootstrap.sh
sysctl -w vm.nr_hugepages=4000
echo "vm.nr_hugepages=4000" >> /etc/sysctl.conf
echo 4000 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
cp /etc/kubernetes/kubelet/kubelet-config.json /etc/kubernetes/kubelet/kubelet-config.json.back
jq '. += { "cpuManagerPolicy":"static"}' /etc/kubernetes/kubelet/kubelet-config.json.back > /etc/kubernetes/kubelet/kubelet-config.json
jq '. += { "reservedCpus": "0-1"}' /etc/kubernetes/kubelet/kubelet-config.json.back > /etc/kubernetes/kubelet/kubelet-config.json
rm /var/lib/kubelet/cpu_manager_state
systemctl restart snap.kubelet-eks.daemon.service
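As a side note on the userdata above: both jq invocations read from the .back copy and write to kubelet-config.json, so the reservedCpus pass overwrites the file produced by the cpuManagerPolicy pass, and only the second setting survives. A single-pass sketch against a throwaway stand-in config (the sample JSON below is illustrative, not the real kubelet config):

```shell
# Apply both kubelet settings in one jq pass so neither edit clobbers
# the other; a temp file stands in for kubelet-config.json here.
tmp=$(mktemp -d)
echo '{"kind":"KubeletConfiguration"}' > "$tmp/kubelet-config.json"
cp "$tmp/kubelet-config.json" "$tmp/kubelet-config.json.back"
jq '. += {"cpuManagerPolicy": "static", "reservedCpus": "0-1"}' \
  "$tmp/kubelet-config.json.back" > "$tmp/kubelet-config.json"
cat "$tmp/kubelet-config.json"
```

The same one-liner, pointed at /etc/kubernetes/kubelet/kubelet-config.json, would replace the two separate jq lines in the userdata.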
1.23
m5.4xlarge
default
1 volume(s) - 20 GiB
us-west-2
x86_64
Addons: