DO NOT MERGE: patches runc for al2023 kind image running on al2 host #2821
Issue #, if available:
Description of changes:
Background
Our kind image is based on AL23. When we GA'd eks-a, it was based on AL2 to ensure we were shipping Amazon-built components. AL2 is old, and its version of systemd only supports cgroup v1. A couple of years ago this started to be an issue as newer OSs, such as Ubuntu 22.04, switched to cgroup v2. When users tried to use admin machines running an OS with cgroup v2, our kind image would stall out because its systemd did not support v2.
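For anyone reproducing this, a quick way to tell which cgroup version a host is using is to look for the `cgroup.controllers` file that a cgroup v2 (unified) mount exposes at its root. The snippet below is only a minimal sketch: the function name `cgroup_version` and the `root` parameter are made up for illustration and are not part of kind or our build. A host in hybrid mode would report "v1" here, since its root mount is not the unified hierarchy.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Best-effort guess at the cgroup version mounted at `root`.

    Hypothetical helper for illustration only.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # No cgroup filesystem at this path (e.g. not a Linux host).
        return "unknown"
    # A cgroup v2 unified hierarchy exposes cgroup.controllers at its root;
    # a v1 layout has per-controller subdirectories instead.
    if (root_path / "cgroup.controllers").is_file():
        return "v2"
    return "v1"
```

Calling `cgroup_version()` on an admin machine or inside the kind node container should show which hierarchy systemd is being asked to work with.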
Shortly after AL23 went GA, we switched our image to AL23, solving the v2 issue. This also continued to support older OSs like AL2, since newer systemd versions still support cgroup v1. It's important to note that we still use AL2 for our e2e test host machines. That said, it's rare in real-world usage for customers to be using AL2, since in most use cases they are on infrastructure providers outside of AWS and generally use Ubuntu or RHEL.
About the middle of last year there was the runc/kubelet issue caused by the runc 1.1.6 release. This required new releases of kubernetes and changes to kind to support it. At that time we introduced this patch locking the versions of containerd and runc, because we saw issues with our kind image on AL2 nodes. While debugging and not finding the answer, we assumed (incorrectly) that the issue had to do with the misc controller, which was the runc 1.1.6 issue. At the time this did not make sense, because that cgroup controller was not introduced until a later kernel that was not available on AL2. Since then we have kept runc at 1.1.5 in our kind image.
I was attempting to see whether we hit the same issue if we update to the latest runc today. We do in fact... I tried reverting the misc controller change, since I was still convinced this was related. That had no effect. I then found this; if you follow the original PR and discussions, it is mentioned that the code was originally added to support docker-in-docker workflows... It was also said that this predates the availability of cgroup namespaces.
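To rule the misc controller in or out on a given host, one option is to check whether the kernel lists it at all. `/proc/cgroups` has one row per controller with the columns `subsys_name`, `hierarchy`, `num_cgroups` and `enabled`. The sketch below parses that text; the helper names `enabled_controllers` and `has_misc_controller` are hypothetical and exist only for this example.

```python
from typing import Set


def enabled_controllers(proc_cgroups_text: str) -> Set[str]:
    """Return the names of controllers marked enabled in /proc/cgroups text.

    Hypothetical helper for illustration only.
    """
    controllers = set()
    for line in proc_cgroups_text.splitlines():
        # Skip blank lines and the "#subsys_name ..." header row.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Columns: subsys_name, hierarchy, num_cgroups, enabled
        if len(fields) >= 4 and fields[3] == "1":
            controllers.add(fields[0])
    return controllers


def has_misc_controller(proc_cgroups_text: str) -> bool:
    """True if the misc controller is listed and enabled."""
    return "misc" in enabled_controllers(proc_cgroups_text)
```

On a host, this could be used as `has_misc_controller(open("/proc/cgroups").read())`. If it returns False on an AL2 node, that is consistent with the misc controller not being the cause there.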
Around the time this runc change was made, the kind maintainers made the change to require cgroupns=private. When this is in use, the change made to runc is not a problem at all, and in almost all cases that matter it is likely totally fine, based on the discussion and research done by the runc maintainers.
Kind 0.20.0 is what included the private change, and unfortunately when we updated we ran into issues with, surprise surprise, AL2... The version of docker shipped in AL2 supports cgroupns=private; however, something about the kernel does not. See this issue for more info. To work around this, we patch this change out in our kind build.
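As a first diagnostic when looking at a host like this, it can help to confirm that the kernel exposes cgroup namespaces at all, which it does through a `cgroup` entry under `/proc/<pid>/ns/`. This is only a rough sketch under that assumption, with a hypothetical function name and a `proc_root` parameter added so it can be pointed at a test directory. It checks for the presence of the namespace entry only, so it would not necessarily catch the AL2 kernel problem described above.

```python
import os


def kernel_exposes_cgroup_ns(proc_root: str = "/proc") -> bool:
    """True if the current process has a cgroup namespace entry in procfs.

    Hypothetical helper for illustration only. It tests for presence of the
    entry, not for whether private cgroup namespaces behave correctly.
    """
    ns_entry = os.path.join(proc_root, "self", "ns", "cgroup")
    # lexists: the entry is a special symlink, so do not require that its
    # target resolves to a regular path.
    return os.path.lexists(ns_entry)
```

If this returns True on an AL2 host and `docker run --cgroupns=private` still misbehaves, the problem is more likely in how that kernel handles the namespace than in whether it has one.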
Not being able to use the private ns while using runc 1.1.6+ stops us from using AL2 nodes for our capd-based clusters. Oddly, using it for bootstrap clusters, which do not use cilium, or much of anything for that matter, ends up working fine with the new runc. I assume this is just luck... the issue must have something to do with the kinds of containers or the number of containers being launched.
What do we do?
I don't know... For now we can probably do nothing, since our kind image is not meant for production workloads; however, at some point we need to be able to update runc. For that I see a few options
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.