-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supporting hosts configured to use iptables-nft in kube-proxy #1729
Supporting hosts configured to use iptables-nft in kube-proxy #1729
Comments
If we're going to go to the Go entrypoint eventually anyway, do we believe that it'd be enough work and risk that we shouldn't just do it as the first solution?
this seems right, something like GNU Bash won't add significantly to our CVE load -- total of 17 CVEs since 1999 and only one in the last couple years. The other utils are also older and more stable and won't be subject to security fixes of the frequency of something under morea ctive development. It will however expand the surface of the
Wondering what you think the cons here are? The work required vs a bash script which is well-trodden and straight-forward?
Agreed; this doesn't seem like a good option when we have more stable alternatives. |
I prefer both the short-term and long-term solution proposed over the other solutions. While we are diverged, this allows us to stay in the same direction and sets us up better to handle when the IP Tables ownership change from kubelet to kube-proxy feature goes fully GA with Kubernetes 1.28.
When EKS-A supports 1.25 we should start using this feature gate for sure. But I don't think back-porting this would be enough to solve our issue. Even with this feature gate, kubelet will still create |
I would vote against this. It adds a fair amount of work since we would need to tinker with the default kubeadm installation, possibly having to handle it by ourselves. Also, it adds one more "OS dependent" knob, which is always hard in eks-a given we can't assume full control over the host's image and we can't inspect before the machines are created. It also means we would need to make this option available in our API, since users can have custom images.
As @vignesh-goutham was saying, that doesn't guarantee it will work. Upstream recommends using the same implementation for both rules set from the host and from containers. Although you can technically use both legacy and nft at the same time, they are two different kernel subsystems so the priority/precedence works independently in the two sets of rules. This means that unless you define the rules in both backends with this in mind, we can end up packets go through both chains and end up nowhere (which is what I saw for capd). I suspect that the fact that it worked for <1.24 was just pure luck in the way the two set of rules interacted with each other. And even if we are able to test all the possible network scenarios (also, how many CNI will we test?) to guarantee there isn't pattern that is broken, it could break again in a future k8s release. It seems this can increase the maintenance cost down the road. As for an alternative to the short term solution. What about just documenting that we only support hosts using iptables legacy? And for eks-a, we just ship all images configured with legacy for now (this is assuming all the OS's we build still support both implementations). |
I actually don't think its a huge left to jump to the golang based solution. At the end of the day the wrapper script is not doing too much and I suspect the upstream PR more of less already works. I also dont see going this route as a huge risk, considering right now nft does not work at all. I would support going straight to this option and as I said, I don't think we have to wait for upstream to adopt the golang approach. I agree with other comments around the other options not being very ideal. I would also vote against all of those.
That could certainly be an option. @vignesh-goutham would need to confirm that we could prebake our ubuntu 22.04 and rhel 8.7 images with legacy as the iptables version vs nft which is the default. I can't image we couldnt since thats basically what we do in the container image. |
Yes exactly! |
Background
We introduced our minimal base images when we released Kubernetes 1.22 support. We have since gone back and retrofitted previous versions to ship on the minimal bases. In EKS-D we use two minimal bases,
eks-distro-minimal-base
andeks-distro-minimal-base-iptables
. The iptables variant is only used by the kube-proxy image. We have since used this image elsewhere (the base for the kind image, base for the kindnetd image, EKS Managed kube-proxy addon has a minimal variant based on our iptables image). At the time when we introduced this image, upstream was using debian as the base image for their kube-proxy.A decision we made when creating the iptables minimal base was to avoid including bash and other "coreutils". This meant a divergent between our kube-proxy and upstreams. Upstream's image uses iptabels-wrapper.sh in place of iptables directly to be able to support iptables-legacy and iptables-nft. Based on what it detects the host using, update-alternatives is called within the container to ensure that if the host is using iptables-legacy, the container uses iptables-legacy and the same for iptables-nft. Since we do not include bash and this script in our kube-proxy, our image will only ever use iptables-legacy.
@M00nF1sh and I discussed this back when creating the iptables base image and this issue was logged. We decided that since our primary node images on EKS and EKS-Anywhere were AL2 and Ubuntu 20.04, both of which use iptables-legacy by default, we did not need to handle the case of hosts being configured to use iptables-nft. We did however decide to include iptables-nft in our base image.
Upstream has recently introduced a iptables base image based on distroless which I believe shipped as part of 1.24 or 1.25. They use a similar technique to build this image as we do for some of our minimal base image. They specify the binaries needed for that iptables-wrapper.sh script and using
ldd
determine which libraries also need to be included in the final image. This gives a pretty good best of both worlds where the final image is fairly minimal both in size and cve scope and their wrapper script is able to function to support both legacy and nft iptables.There has been some efforts upstream to go-ify this wrapper script which would remove the need for bash (or dash in the case of upstream's image) and the coreutils from the final image.
Ubuntu 22.04 and RHEL 8.7
We have recently begun testing EKS-A using Ubuntu 22.04 and RHEL 8.7, both of which default iptables to iptables-nft. Through this testing we have run into issues both in CAPD and CAPC clusters which appear to be nft vs legacy related.
An interesting bit is that, at least in terms of RHEL 8.7 on CAPC, 1.23 works totally fine, but when using 1.24 we see the issue. Our best guess, thanks @vignesh-goutham for tracking this down!, is that in 1.24 this was added to the kubelet. Since kubelet runs on the host when it makes that "hint" chain, the kube-proxy container which is still using iptables-legacy is not able to deal with the nft created chain, but since on 1.23 kube-proxy was the only actor interacting with iptables this was not an issue. From what I can gather from the following stackoverflow and stackoverflow, the main difference between nft and legacy is the frontend syntax and the resulting rules are "compatible", this could explain why we dont see issues with 1.23 but we do with 1.24.
Options
Recommended short-term
We modify our kube-proxy-base build to include bash and the other utils upstream includes in their distroless based iptables image. This is a pretty small lift which would look something like this. The downside here is slight bloat to the image and most importantly for us larger cve scope which would require more frequent releases. That said, I do not expect this to significant increase our need for security releases.
Alternative short-term
Documenting that we only support hosts using iptables legacy and for eks-a, we just ship node images (ova/ami/raw/etc) configured with legacy for now (this is assuming all the OS's we build still support both implementations).
Long-term
Get involved with the effort to produce an equivalent golang based version of iptables-wrapper.sh and bake that into our iptables base image instead of relying on bash and the wrapper script. This is something we could rollout in our images ahead of upstream since we have already created a divergent between ours and theres anyway.
Others
Testing
I was able to repro, I think, using the Ubuntu 22.04 ami with the EKS-D kops scripts. The conformance tests for 1.23 still pass and there are a few that fail on 1.24 which pass if I use upstream's kube-proxy image.
The text was updated successfully, but these errors were encountered: