Using custom ami with containerd as container run time without docker installed not working #961

Closed
ashuec90 opened this issue Jul 4, 2022 · 13 comments


ashuec90 commented Jul 4, 2022

Environment:

  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.6
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.20
  • AMI Version: Custom AMI in which docker-ce and docker-cli are removed when containerd is used as the container runtime
  • Kernel (e.g. uname -a): 3.10.0-1160.66.1.el7

Hi Team,

I am trying to use an AMI where Docker is not installed and containerd is used as the container runtime.

I made a small change to install-worker.sh: after Docker is installed (https://github.com/awslabs/amazon-eks-ami/blob/v20220629/scripts/install-worker.sh#L121), if containerd is selected as the container runtime, I remove docker-ce and docker-cli, as described in the Kubernetes docs: https://kubernetes.io/docs/tasks/administer-cluster/migrating-from-dockershim/change-runtime-containerd/#remove-docker-engine

When I use this AMI to create worker nodes and pass the flag --container-runtime containerd to bootstrap.sh, the nodes join the EKS cluster and become Ready. However, pods that need to connect to other pods do not come up and go into CrashLoopBackOff, while pods that do not need to connect to other pods come up fine.

I can see the following error in /var/log/messages:

Jul  4 08:58:58 ip-10-0-0-87 containerd: time="2022-07-04T08:58:58.803607000Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
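
For anyone debugging the same message: it says containerd found no usable network config in /etc/cni/net.d when it started. A minimal sketch of that check in Python follows. The helper name `find_cni_configs` is hypothetical, and the extension list (.conf, .conflist, .json) is an assumption about what the CNI loader accepts, so treat it as a rough diagnostic rather than containerd's actual logic.

```python
from pathlib import Path

# Extensions treated as CNI network configs (assumption, see note above).
CNI_CONFIG_EXTENSIONS = {".conf", ".conflist", ".json"}


def find_cni_configs(conf_dir="/etc/cni/net.d"):
    """Return the sorted names of candidate CNI config files in conf_dir.

    An empty list corresponds to the "no network config found" situation.
    """
    directory = Path(conf_dir)
    if not directory.is_dir():
        # A missing directory is treated the same as an empty one.
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix in CNI_CONFIG_EXTENSIONS
    )
```

Calling `find_cni_configs()` on the node and getting an empty list would suggest that the CNI plugin has not (yet) written its config, which matches the error above.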

Can someone help me figure out whether I am missing something?

When I keep both Docker and containerd from the AMI and use containerd as the container runtime, it works fine.

This is the same issue as #911, but I only hit it when I remove Docker.

@ashuec90 ashuec90 changed the title Using custom ami with containerd as container run time without docker installed. Using custom ami with containerd as container run time without docker installednot working Jul 4, 2022
@ashuec90 ashuec90 changed the title Using custom ami with containerd as container run time without docker installednot working Using custom ami with containerd as container run time without docker installed not working Jul 4, 2022

sbocinec commented Jul 4, 2022

I am experiencing a similar issue. Enabling the containerd container runtime with any amazon-eks-ami-1.21 release newer than v20220526 leaves the nodes unable to join the EKS cluster, with the CNI plugin failing with the error above.

Server: v1.21.13-eks-84b4fe6 or 1.21 platform version eks.8.

Using the same provisioning scripts and amazon-eks-ami-1.21-v20220526 works just fine.


sbocinec commented Jul 4, 2022

@ashuec90 can you please check whether explicitly using the v20220526 version of the AMI works for you? It helped me.

The v20220610 release of the image updated containerd to 1.4.13-3.amzn2, which, at least in my tests, is what causes the issues:

* Containerd version upgraded to 1.4.13-3.amzn2 for [CVE-2022-31030](https://alas.aws.amazon.com/cve/html/CVE-2022-31030.html).

Author

ashuec90 commented Jul 4, 2022

OK, sure @sbocinec. So you mean the containerd version could be the problem here? I was using the latest containerd version.

I can see the CNI plugin errors too, but in my case the worker nodes do join the EKS cluster; only a few pods that need to communicate with other pods fail to come up.


sbocinec commented Jul 4, 2022

According to my own experiments, yes, it might be the problem.

  • amazon-eks-node-1.21-v20220629 AMI with containerd: 1.4.13-3.amzn2 ❌ doesn't work
  • amazon-eks-node-1.21-v20220526 AMI with containerd: 1.4.13-2.amzn2.0.1 ✔️ works

So I'm curious whether, in your specific case with the custom AMI, using the older version might fix the issue.

Author

ashuec90 commented Jul 4, 2022

OK, thanks @sbocinec, I will try that.


sbocinec commented Jul 4, 2022

Oh, @ashuec90, please disregard all my previous comments. I found that in my case the culprit was setting the container runtime the wrong way: --container-runtime=containerd instead of --container-runtime containerd. I fixed this at the same time as the AMI change and did not notice that it was the argument, not the AMI image, that fixed the issue. Sorry for misleading you; I hope you can solve your issue.
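
To illustrate why the two spellings can behave differently, here is a simplified Python model of a shell-style option loop that only recognises the exact token `--container-runtime` and reads the value from the next argument. This is a sketch under assumptions, not the real bootstrap.sh: the function name `parse_bootstrap_args` and the `dockerd` default are placeholders chosen for illustration.

```python
def parse_bootstrap_args(argv):
    """Simplified model of a shell while/case option loop.

    Only the exact token "--container-runtime" is recognised, and its value
    is taken from the NEXT token. Everything else is collected as positional.
    Illustrative only; this is not the real bootstrap.sh logic.
    """
    options = {"container_runtime": "dockerd"}  # assumed default, for illustration
    positional = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token == "--container-runtime" and i + 1 < len(argv):
            options["container_runtime"] = argv[i + 1]
            i += 2  # consume the flag and its value
        else:
            # "--container-runtime=containerd" lands here: it is not an exact
            # match, so the runtime option is never updated.
            positional.append(token)
            i += 1
    return options, positional
```

In this model, `["my-cluster", "--container-runtime", "containerd"]` selects containerd, while `["my-cluster", "--container-runtime=containerd"]` silently keeps the default and treats the whole token as a positional argument.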

Author

ashuec90 commented Jul 4, 2022

Oh, OK. So this is set when running bootstrap.sh? If so, I am passing it correctly.
I am checking with the old version of containerd.

Author

ashuec90 commented Jul 4, 2022

It did not work with containerd 1.4 either.

@ashuec90
Author

Any update on this issue? We are using EKS 1.21 and AWS VPC CNI 1.11. Nodes are able to join the EKS cluster, but the pods that need to connect to other pods in the cluster are not coming up and fail with connection timeouts.

In /var/log/messages I can see the following error:

Sep 20 08:30:36 ip-10-0-0-46 containerd: time="2022-09-20T08:30:36.525282503Z" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0} UntrustedWorkloadRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0} Runtimes:map[runc:{Type:io.containerd.runc.v2 Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[SystemdCgroup:true] PrivilegedWithoutHostDevices:false BaseRuntimeSpec: NetworkPluginConfDir: NetworkPluginMaxConfNum:0}] NoPivot:false DisableSnapshotAnnotations:true DiscardUnpackedLayers:false IgnoreRdtNotEnabledErrors:false} CniConfig:{NetworkPluginBinDir:/opt/cni/bin NetworkPluginConfDir:/etc/cni/net.d NetworkPluginMaxConfNum:1 NetworkPluginConfTemplate: IPPreference:} Registry:{ConfigPath: Mirrors:map[] Configs:map[] Auths:map[] Headers:map[]} ImageDecryption:{KeyModel:node} DisableTCPService:true StreamServerAddress:127.0.0.1 StreamServerPort:0 StreamIdleTimeout:4h0m0s EnableSelinux:false SelinuxCategoryRange:1024 SandboxImage:602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5 StatsCollectPeriod:10 SystemdCgroup:false EnableTLSStreaming:false X509KeyPairStreaming:{TLSCertFile: TLSKeyFile:} MaxContainerLogLineSize:16384 DisableCgroup:false DisableApparmor:false RestrictOOMScoreAdj:false MaxConcurrentDownloads:3 DisableProcMount:false UnsetSeccompProfile: TolerateMissingHugetlbController:true DisableHugetlbController:true DeviceOwnershipFromSecurityContext:false IgnoreImageDefinedVolumes:false NetNSMountsUnderStateDir:false EnableUnprivilegedPorts:false EnableUnprivilegedICMP:false} ContainerdRootDir:/var/lib/containerd ContainerdEndpoint:/run/containerd/containerd.sock RootDir:/var/lib/containerd/io.containerd.grpc.v1.cri StateDir:/run/containerd/io.containerd.grpc.v1.cri}"
Sep 20 08:30:36 ip-10-0-0-46 containerd: time="2022-09-20T08:30:36.525355446Z" level=info msg="Connect containerd service"
Sep 20 08:30:36 ip-10-0-0-46 containerd: time="2022-09-20T08:30:36.525412588Z" level=info msg="Get image filesystem path \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\""
Sep 20 08:30:36 ip-10-0-0-46 containerd: time="2022-09-20T08:30:36.528322628Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

Thanks

@cartermckinnon
Member

Update your CNI version to 1.12+ and let us know if that doesn't resolve your issue.

More info: https://github.com/aws/amazon-vpc-cni-k8s#container-runtime
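
If it helps to confirm which version is actually running, the image tag of the VPC CNI daemonset (commonly `aws-node` in `kube-system`, visible via `kubectl describe daemonset aws-node -n kube-system`) can be compared against the 1.12 minimum. A minimal Python sketch follows; the function names and the example image strings are hypothetical, and it assumes the tag looks like `:v1.12.6`.

```python
import re


def cni_version_from_image(image):
    """Extract (major, minor, patch) from an image reference tagged like ':v1.12.6'."""
    match = re.search(r":v?(\d+)\.(\d+)\.(\d+)", image)
    if match is None:
        raise ValueError(f"no semantic version tag found in {image!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(image, minimum=(1, 12, 0)):
    """Return True if the image tag is at least the given minimum version."""
    return cni_version_from_image(image) >= minimum
```

For example, `meets_minimum("example.invalid/amazon-k8s-cni:v1.11.4")` is False, while the same call with a `v1.12.6` tag is True.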

@ashuec90
Author

@cartermckinnon it is still not working after upgrading the VPC CNI to 1.12+. We are also using EKS 1.23 now.

Author

ashuec90 commented May 15, 2023

Can we consider reopening this issue? We are still seeing this error with the latest AMI Packer code and VPC CNI plugin version 1.12.6.
cc: @cartermckinnon

@cartermckinnon
Member

If you're able to reproduce the issue on an official build of the AMI, or with the AMI template from HEAD, feel free to add those details, but this sounds like an issue with your custom AMI. Our 1.25 and 1.26 AMIs don't have Docker installed, and this issue doesn't occur there, so you may want to diff those builds against your own.
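
One possible way to do such a diff is to save a package listing from each node (for example the output of `rpm -qa`, one package per line) and compare the two. A minimal Python sketch, with a hypothetical helper name and no claim about which packages will actually differ:

```python
def diff_package_lists(official, custom):
    """Compare two package listings given as text, one package per line.

    Returns the packages that appear in only one of the two listings.
    """
    official_set = {line.strip() for line in official.splitlines() if line.strip()}
    custom_set = {line.strip() for line in custom.splitlines() if line.strip()}
    return {
        "only_in_official": sorted(official_set - custom_set),
        "only_in_custom": sorted(custom_set - official_set),
    }
```

Packages listed under `only_in_official` are a reasonable first place to look, since the custom build may have removed something alongside Docker that the container networking setup depends on.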
