-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modifying the containerd
config file at runtime is unreliable
#1628
Comments
Another user has posted some logs here: spegel-org/spegel#217 (comment) |
From those logs:
That is the output from these two lines: amazon-eks-ami/files/bootstrap.sh Lines 632 to 633 in 107df3f
Because So this error:
Is probably referring to one of the dependent services, my guess would be amazon-eks-ami/files/bootstrap.sh Lines 575 to 582 in 107df3f
When you create your custom
And your custom amazon-eks-ami/files/bootstrap.sh Lines 560 to 562 in 107df3f
So this comparison will never succeed: amazon-eks-ami/files/bootstrap.sh Line 575 in 107df3f
Which agrees with that log file. The creation of the Is there a specific AMI release this worked on? |
Anything prior to this one. It has been working for months without issue. To give a specific answer, I have clusters running |
Hm okay, I think I see the problem. We build the official AMI's with the So the line you're looking for ( |
I'll see what we can do to unblock your use case, but in general I don't think we can guarantee that the |
containerd
config file at runtime is unreliable
Thanks for this. I can probably inject the entire config file in the userdata if that's the recommended approach. |
I know there's the risk of configuration sprawl but a flag/argument that could be passed to the bootstrap script to keep the container layers (and thus reverting the behaviour to how it was before the AMI defaulted to discarding them end of last year) would be good too. |
@cartermckinnon @danielloader @tico24 we could look at using
|
I think that using imports would be a great idea. Talos does a pretty good job using imports to customize Containerd configuration. https://www.talos.dev/v1.6/talos-guides/configuration/containerd/ Importing from a directory would allow overrides to easily be written with minimal impact on existing solutions. |
@danielloader What’s the context here? |
Using spegel to do intra node container image sharing. It depends on the layers being retained after pull. |
Interesting, thanks! I think #1630 would cover that case, and my understanding of the merge behavior is that whatever is defined in the imports will override what EKS writes to |
@cartermckinnon yes you are right with your assumption that imports override existing configuration. I will add this to the documentation in Spegel as soon as we get a new AMI release with #1630, that way hopefully we wont get other users raising similar issues. |
Hey guys any-update on the #1630? |
This can now be closed and #1630 is merged and part of a release. |
I updated the config.toml using below command as eks bootstrap script is using this file as the base file.
Once you update this, you can verify this using below command from your worker node.
|
@zaidmnsr please check the thread above, we explicitly recommend against that approach, because it's very fragile |
@cartermckinnon I agree with you that my approach is fragile. The userdata provided at the top of the thread as problem statement will fail as default file in eks ami /etc/containerd/config.toml is all commented . So eventually using sed on that and explicitly setting conatinerd file (export CONTAINERD_CONFIG_FILE=/etc/containerd/config-foo.toml )will fail in joining nodes as all contents are hashed. In the eks bootstrap script, it overwrites the file from location /etc/eks/containerd/containerd-config.toml. So, I decided to go with that approach on directly updating /etc/eks/containerd/containerd-config.toml . |
@zaidmnsr here is an example of how you can override values without using https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#eks Any toml file that you place in the directory |
Our docs touch on this as well: https://awslabs.github.io/amazon-eks-ami/usage/al2/#customizing-containerd-config |
As containerd patches entire dictionaries, rather than merging keys, we are also opting to go with editing the EKS template config as part of our pre-bootstrap user-data:
While still fragile, this is the best solution we've got until merging of imported dicts is supported: containerd/containerd#5837. We cannot rely on customizing-containerd-config as there are tools (like nvidia-device-plugin) which modify the containerd configuration as well. If merging keys is required, I found success following containerd/containerd#5837 (comment) (note that here we're patching the |
What happened:
When providing a basic userdata configuration to the EKS node on startup, the node fails to start and join the cluster. If I remove the userdata, it works.
In versions prior to amazon-eks-node-1.29-v20240202, this userdata worked flawlessly.
What you expected to happen: I would expect breaking changes to be documented.
How to reproduce it (as minimally and precisely as possible):
Here is my userdata:
This is applied to an EKS node on startup. The node no longer connects to the EKS cluster.
EKS 1.29, although similar reports in this issue on EKS 1.28 - the AMI is the common demoninator here.
Anything else we need to know?:
Environment:
aws eks describe-cluster --name <name> --query cluster.platformVersion
): "eks.1"aws eks describe-cluster --name <name> --query cluster.version
): "1.29"uname -a
): Linux ip-10-205-184-145.us-east-2.compute.internal 5.10.205-195.807.amzn2.aarch64 Template is missing source_ami_id in the variables section #1 SMP Tue Jan 16 18:29:00 UTC 2024 aarch64 aarch64 aarch64 GNU/Linuxcat /etc/eks/release
on a node):(x64 also affected)
The text was updated successfully, but these errors were encountered: