Setting up containerd with eksctl (with preBootstrapCommands) #3572
Comments
Hi @RobertLucian thanks for asking and providing lots of good detail. This does seem like a matter of ordering and what-exists-where-at-what-time, although other vague threads on the interwebz do indicate other "things" 🤷‍♀️. I have not tried to do this personally so have no idea 😄. I will hopefully have time to play around with this soon, I am interested to see what is going on. Anyone else is free to jump in here!
In the meantime, according to this AWS thread on making containerd a legit option on AL2/Ubuntu EKS nodes, it seems Bottlerocket was added for this purpose, so if you are interested you could try that. The conversation around making containerd easily usable on AL2/Ubuntu has kind of stalled, so I don't know what the plan is there.
Hi @Callisto13 thanks a lot for coming back with a reply! This is much appreciated!
With my limited context, yes, I'd say it does feel like that would be the case. Hmm, yes, Bottlerocket might be an alternative. Do you know if there are any disadvantages to using Bottlerocket as opposed to going with the AL2/Ubuntu EKS-optimized AMI images? Are GPU/Inferentia nodes (and workloads) supported? And do you know if their AMIs are available in all regions?
Thanks for your response again! I'll be waiting for an update from you. Let me know if there's anything I can do here.
Off the top of my head I am not sure, @aclevername did you come across anything while you were working on that volume thing?
From the looks of things this is something they are still working out 😞.
The list of supported regions can be seen here, coverage seems to be pretty decent. I'll start running some hacky things on this today and see what I come up with 👍. Can you confirm what AMI or instance type you are using?
Initial notes after some quick poking around:
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
We need change After making this setting, it worked with
The configuration file of the kubelet service has changed since 0.47.0.
Thanks @cw-sakamoto! More general info: containerd will be the default runtime in EKS from 1.21, which should be available in July.
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd
I also figured out
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity. |
I already went through most of the tickets that touch on the following approach tangentially, but couldn't find anything helpful.
High level
version: 0.40.0
cni: 1.7.2
We're looking to use containerd as the container runtime on our clusters. The way we thought of doing that is by adding the following commands to the `preBootstrapCommands` section:
Spinning up a new cluster ends up timing out because the workers of the said node group are never ready. I can see the nodes in `kubectl get nodes`, but they are marked as `NotReady`. Describing them shows the `containerd` runtime at least. This is the error message that I get from the describe:
Further looking at the pods, this is the list of running pods that I get (notice how they are not live):
Inspecting `aws-node-298mn` further gets me these logs:
Describing `aws-node-298mn` gets me this error:
Warning FailedCreatePodSandBox 37s (x17 over 4m4s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "60925fddf3daa50de73d593f885102ab3cc707b0e6839eef4c44a363316fa166": add cmd: Error received from AddNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:50051: connect: connection refused"
Individual node inspection
SSHing into a node at random, I see that the `containerd` and `kubelet` services are active. Running `systemctl cat kubelet` gets me:
Looking at the logs of the `kubelet` service using `journalctl -lfu kubelet` gets me these errors:
And then looking at `/etc/cni/net.d`, I see that the directory is empty, which I think shouldn't be the case.
Interesting reveal
If I don't add those commands to the `preBootstrapCommands` section, then the cluster gets provisioned successfully. SSHing into each instance and making the exact same modifications as those in the `preBootstrapCommands` section, followed by `systemctl daemon-reload && systemctl restart kubelet`, gives me a fully functional cluster that's using containerd as the runtime.
Does anyone know what's wrong with this picture? Help on this would be greatly appreciated!
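
For anyone trying to reproduce this, the two symptoms described in this report (nodes stuck in `NotReady` and an empty `/etc/cni/net.d`) can be checked with a couple of small shell helpers. This is only a minimal diagnostic sketch, not a fix: the function names `cni_config_present` and `not_ready_nodes` are hypothetical, and the file extensions checked are an assumption about how CNI config files are usually named.

```shell
# Hypothetical helpers; names and defaults are illustrative only.

# cni_config_present [DIR]: succeeds if DIR (default /etc/cni/net.d) holds at
# least one file ending in .conf, .conflist, or .json; fails if DIR is missing
# or contains no such file.
cni_config_present() {
  dir="${1:-/etc/cni/net.d}"
  if [ ! -d "$dir" ]; then
    return 1
  fi
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    # An unmatched glob stays literal, so test that the path really exists.
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}

# not_ready_nodes: reads `kubectl get nodes --no-headers` output on stdin and
# prints the name of every node whose STATUS column does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

On a worker node, `cni_config_present || echo "no CNI config yet"` reports whether the directory is still empty, and from a machine with cluster access, `kubectl get nodes --no-headers | not_ready_nodes` lists the nodes that have not become ready.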