containerd support not working #699
Hi @8enmann,
The problem could be related to the CNI DaemonSet's default mount path, which points at the dockershim socket.
Fixed!!

```yaml
- mountPath: /var/run/cri.sock
  name: dockershim
...
- hostPath:
    path: /run/containerd/containerd.sock
```

I also added a nodeSelector for my new daemonset that's a strict complement of the old one, so that both node types could continue to coexist in the cluster. I added the new label to my ASG template so new nodes would get the new label.
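Roughly, the coexistence part looks like this (the DaemonSet name `aws-node-containerd` and the `cri` label are placeholders I'm using here, not names from the AMI or the VPC CNI project):

```sh
# Hypothetical sketch: run two aws-node DaemonSets side by side, split by a node label.
# New nodes get the label via the ASG launch template / user data; old nodes keep the default.
kubectl label node <new-node-name> cri=containerd

# The new DaemonSet (with the containerd socket hostPath shown above) selects only the new nodes:
kubectl -n kube-system patch daemonset aws-node-containerd --type=merge -p '
spec:
  template:
    spec:
      nodeSelector:
        cri: containerd
'
# The original DaemonSet keeps the complementary selector (e.g. cri=dockerd) so the two
# never land on the same node.
```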
@8enmann Thanks for sharing this.
Hi @ravisinha0506,
I can confirm that the kubelet service and containerd config are configured like that. But do we need to use dockershim even after moving to containerd? The changes proposed by @8enmann solve the issue, so should they be included in the AMI? Thanks,
The suggestion from @ravisinha0506 is not to keep using Dockershim, but to keep using the socket named /run/dockershim.sock, with containerd listening on it instead.
@kpanic9 We don't need to use dockershim with the latest AMI. We are re-using the socket name /run/dockershim.sock, with containerd configured to listen on it.
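Roughly, the node ends up wired like this (file locations are illustrative and may differ between AMI releases):

```sh
# Sketch only: containerd's CRI socket is exposed at the dockershim-named path,
# and the kubelet keeps pointing at that same path. Paths/files are assumptions.

# containerd listens on the dockershim-named socket:
grep -A 2 '\[grpc\]' /etc/containerd/config.toml
#   [grpc]
#   address = "/run/dockershim.sock"

# ...and the kubelet talks to that same socket:
grep -r -o -- '--container-runtime-endpoint[^" ]*' /etc/systemd/system/kubelet.service*
#   --container-runtime-endpoint=unix:///run/dockershim.sock
```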
Hi @TBBle and @ravisinha0506, thank you for your quick response on the issue.
Further investigation showed that kubelet.service is configured to use /run/dockershim.sock for talking to the container runtime, but this /run/dockershim.sock socket is not present on the worker node.
Also, on the new EKS worker nodes the containerd config file is configured as below.
But if I replace the /run/dockershim.sock in both config files and restart containerd and kubelet, the worker node joins the EKS cluster. Please let me know if I have done anything wrong, or what needs to be changed on my side so that workers with containerd can join the cluster without manual intervention. Thanks,
Hi @kpanic9,
Is containerd itself starting up successfully on those nodes? Could you check `systemctl status containerd` and share the output?
Hi @ravisinha0506, containerd starts without any issues on the nodes failing to join the cluster. Below is the output from `systemctl status containerd` on one of the nodes.
Also, I tried the latest EKS AMI from AWS (ID: ami-0718ef1c4a20afb11), but it doesn't solve the issue and the workers are still failing to join the cluster, with the same error message in the kubelet logs. Thanks,
This is probably the problem. The config.toml applied by bootstrap.sh is what points containerd at the expected socket. Is it possible that containerd was started before bootstrap.sh had run, and so was running without the updated config? If you restart containerd, does it then report the correct socket path?
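Something like this should show what containerd is actually using after a restart (the dump output format may differ between containerd versions):

```sh
# Restart containerd so it re-reads /etc/containerd/config.toml, then confirm which
# socket address it is actually configured with.
sudo systemctl restart containerd
sudo containerd config dump | grep -A 2 '\[grpc\]'
```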
Looking at the containerd logs, it looks like it's using the default containerd socket path:
This can happen if one of the tools you have installed has a dependency on containerd being up and running. Could you check /var/log/cloud-init-output.log and see whether the log timestamps indicate that containerd was launched prior to the bootstrap.sh invocation? Also, could you please share the error log from the EKS-optimized AMI?
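For example, something along these lines should show the ordering (exact log wording will vary by AMI release):

```sh
# When did containerd (re)start, and when did the user data run bootstrap.sh?
systemctl show containerd --property=ActiveEnterTimestamp
sudo grep -n 'bootstrap.sh' /var/log/cloud-init-output.log
# The first lines containerd logged show which config and socket it loaded at startup:
sudo journalctl -u containerd --no-pager | head -n 20
```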
Hi @ravisinha0506 and @TBBle, you guys were correct. In the user data, containerd was starting before the bootstrap script executed. I have updated the user data script for the worker nodes, and after that the containerd worker nodes start up without any issues. Thanks,
I don't know systemd off-hand, but maybe the containerd unit could be ordered to start only after the bootstrap script has run, so that a pre-started containerd doesn't come up with the default config.
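One possible shape for that, assuming the user data runs in cloud-init's final stage (this is just a sketch, not something the AMI ships):

```sh
# Hypothetical systemd drop-in: hold containerd back until cloud-init's final stage
# (which runs the user data / bootstrap.sh) has finished, so it starts with the updated config.
sudo mkdir -p /etc/systemd/system/containerd.service.d
sudo tee /etc/systemd/system/containerd.service.d/10-after-bootstrap.conf >/dev/null <<'EOF'
[Unit]
After=cloud-final.service
EOF
sudo systemctl daemon-reload
# A simpler alternative is to just `systemctl restart containerd` at the end of the user data.
```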
All the issues discussed in this thread should be fixed with the latest EKS AMI post v20211013 release. Closing this issue. Please feel free to create a new one or re-open if there are any other concerns.
What happened:

I built an AMI with `make 1.20`, based on Containerd runtime support #698 by @ravisinha0506, and bootstrapped the node with `--container-runtime containerd`. I manually patched 3 files on the node, which can be seen here.

Summary (a rough sketch of these edits follows the list):
- Changed the kubelet to `--container-runtime-endpoint "unix:///run/containerd/containerd.sock"` (from dockershim).
- Copied `/etc/cni/net.d/10-aws.conflist` from a dockerd node, since the directory was empty otherwise.
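Roughly, these amount to something like the following on the node (an illustrative sketch, not the exact commands; paths may differ):

```sh
# Illustrative sketch of the two edits summarized above; file paths and the sed
# expression are assumptions, not the exact changes from the patched node.
# 1. Point the kubelet at containerd's CRI socket instead of dockershim:
sudo sed -i 's|--container-runtime-endpoint[^" ]*|--container-runtime-endpoint=unix:///run/containerd/containerd.sock|' \
  /etc/systemd/system/kubelet.service
# 2. Restore the CNI config (copied from a working dockerd node, since /etc/cni/net.d was empty):
sudo cp /tmp/10-aws.conflist /etc/cni/net.d/10-aws.conflist
sudo systemctl daemon-reload && sudo systemctl restart kubelet
```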
After making these changes I restarted the kubelet and everything worked (the pod came up and was usable). However, when I baked the image with these changes, I got the following error:
When I SSH into the node, `/opt/cni/bin` has all the binaries you'd expect, including `aws-cni`. I ran `/opt/cni/bin/aws-cni-support.sh` and can make the output available privately upon request.

Environment:
- Platform version (`aws eks describe-cluster --name <name> --query cluster.platformVersion`): "eks.1"
- Kubernetes version (`aws eks describe-cluster --name <name> --query cluster.version`): 1.20
- Kernel (`uname -a`): Linux ip-10-0-192-37.ec2.internal 5.4.129-62.227.amzn2.x86_64 #1 SMP Wed Jul 7 00:08:43 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Release information (`cat /etc/eks/release` on a node):