kube-router does not work with iptables 1.8.8 (nf_tables) on host #112477
Comments
@ncopa: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/sig network |
The rule that gets mangled by older iptables is created here:
|
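A minimal sketch of what that rule looks like in practice, assuming it is kubelet's mark-drop firewall rule (the chain name and comment text below are assumptions inferred from the `0x8000/0x8000` mark discussed later in this thread, not quoted from the elided link):

```sh
# Hypothetical check on the host; chain and comment text are assumptions.
iptables -t filter -S KUBE-FIREWALL
# Expected to include something like:
#   -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
```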
/assign @danwinship |
Iptables rules must not be added one by one. The reason is that each update is: read everything to user space, update one rule, write everything back to the kernel. This was one reason nft was invented, so perhaps this is not true for nf_tables mode, but please check it. |
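For comparison, a minimal sketch of the batched alternative, where a whole set of rules is submitted in one `iptables-restore` transaction instead of one `iptables` invocation per rule (chain name and comments below are illustrative):

```sh
# Apply several rules in a single iptables-restore transaction.
iptables-restore --noflush <<'EOF'
*filter
:EXAMPLE-CHAIN - [0:0]
-A EXAMPLE-CHAIN -m comment --comment "example rule 1" -j ACCEPT
-A EXAMPLE-CHAIN -m comment --comment "example rule 2" -j RETURN
COMMIT
EOF
```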
kube-proxy re-creates this rule on purpose so it doesn't matter whether it's doing it one-by-one or via iptables-restore. |
This seems to be a bug in iptables and I don't think we can plausibly work around it. (Changing the version of iptables in the kube-proxy image would just introduce the bug in the opposite scenario, where kube-proxy has the newer version and kubelet has the older version.) The answer for now seems to be "don't use iptables 1.8.8, it's broken". |
Pasting the reply here for visibility
another |
I think the behavior is expected.
It is only a problem if kubelet reads/parses the iptables rules, which I don't think it does. I could not find any
It means, "don't use any iptables newer than whatever is shipped with kube-proxy". |
yeah, based on your comments it seems it is the
I think that is more "if you use a containerized kube-proxy, don't use a newer host iptables that is not compatible with the iptables version inside its containers" and "kubernetes 1.25 releases a kube-proxy image that uses iptables version 1.8.7, which is incompatible with iptables version 1.8.8 and will break your cluster if your versions are not in sync" 🙃 So, it is important to mention that the kube-proxy image generated as part of the release is based on
I don't know the commitment we have with the published images of the components, but I don't think it is going to be feasible to support all the combinations existing in the wild (the kube-proxy binary must be 100% compatible though)
/sig release |
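A quick way to see whether your own host and kube-proxy image disagree in the way described above (namespace and pod name below are illustrative, not from this issue):

```sh
# Compare the host's iptables with the one inside the kube-proxy image.
iptables --version
# e.g. "iptables v1.8.8 (nf_tables)" on the host
kubectl -n kube-system exec kube-proxy-xxxxx -- iptables --version
# e.g. "iptables v1.8.7 (nf_tables)" inside the container
```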
Yeah, I'm not sure exactly how this is failing for the OP... It seems to me that you should end up with two copies of the |
I'm arguing that the behavior in 1.8.8 is a bug, and that it should presumably be fixed in 1.8.9, after which you will again be able to use any combination of iptables versions 1.8.3-1.8.7 and 1.8.9+. (The iptables maintainers are currently arguing against this, but I'm going to continue arguing against them.) If the behavior in 1.8.8 is not declared a bug, then we probably need to accelerate (and possibly backport) KEP 3178 because the only plausible answer at that point is that we need to make sure there are never any cases where kubelet and kube-proxy look at / modify the same rules. |
@ncopa can you get a copy of " |
/assign @danwinship |
@danwinship I've been working with @ncopa debugging this. Here's what I can see with 1.8.8:
Dropping 1.8.7 into the host gets me this:
Looking directly with
Here's the full
|
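For anyone trying to make the same comparison, a rough sketch of the kind of commands involved (the exact commands and output used above were not captured in this copy of the thread; pod name is illustrative):

```sh
# What the host's iptables 1.8.8 prints for the rule:
iptables-save -t filter | grep KUBE-FIREWALL

# What the 1.8.7 binary inside the kube-proxy image prints for the same kernel
# ruleset (kube-proxy runs in the host network namespace; pod name illustrative):
kubectl -n kube-system exec kube-proxy-xxxxx -- iptables-save -t filter | grep KUBE-FIREWALL

# The kernel's own nftables representation, bypassing iptables user space:
nft list chain ip filter KUBE-FIREWALL
```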
So belatedly, it occurs to me that neither the iptables kube-proxy nor the ipvs kube-proxy ever refers to the
If packets are actually getting dropped, that implies that the kernel representation of the rule is wrong. But from the
Also relevant note: this is with the ipvs backend, not the iptables backend.
Did you upgrade to 1.25 at the same time as you upgraded from iptables 1.8.7 to iptables 1.8.8? If not, what order did things happen in? (ie, do we know that either "kube 1.24 with ipvs proxy and iptables 1.8.8" or "kube 1.25 with ipvs proxy and iptables 1.8.7" definitely works?) |
I'm not versed enough on nft rules to even start guessing... :) This is with fresh install of 1.25 when the host has 1.8.8 in nftables mode. It works perfectly with legacy mode, we've used that for a good while. Things started to break once we changed our embedded (k0s distro) iptables to nftables mode. We caught this in our smoke test on Alpine and have been testing on various OSes since. |
Can anyone confirm that
works correctly? |
I don't run the kube-proxy container, but start kube-proxy directly on the node with a script. I guess that's not what you want since the problem seems to be host-container version mismatch, but I can test any combination if my setup is ok. |
It works in my env. But I also have kernel linux-5.19.9, and built iptables myself, so sorry, I don't think I can run relevant tests. |
I'm not at all convinced that that's what the problem is. My worry was that this had nothing at all to do with iptables and the bug was just "ipvs mode in 1.25 is totally broken (for at least some users)". If it's not that, then my next two theories are "there's a second change in iptables 1.8.8 that is actually causing the problem, and the thing the OP noticed is completely irrelevant" and "iptables 1.8.8 is incompatible with certain kernel versions". The fact that 1.8.7 can't correctly read a certain rule created by 1.8.8 should not be causing any problems, because kube-proxy never looks at that rule. |
🍿 |
I may be completely wrong, but it awfully much looks like
kubernetes/pkg/proxy/iptables/proxier.go Line 393 in e11e226
and then sends it back to the kernel with iptables-restore here:
kubernetes/pkg/proxy/iptables/proxier.go Line 420 in e11e226
It also looks like
Even if it does not look at that rule, it certainly looks like it sends the output of |
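A sketch of the round trip being suspected here, done by hand rather than by kube-proxy (the paths and the reproduction itself are assumptions, not confirmed in this issue):

```sh
# Round-trip the filter table with the *older* 1.8.7 binary (i.e. from inside
# the kube-proxy container), which is roughly what is being suspected above:
iptables-save -t filter > /tmp/filter-dump.txt
iptables-restore -T filter < /tmp/filter-dump.txt
# If the theory is right, the host's iptables 1.8.8 afterwards shows the rule
# without the "-m mark --mark 0x8000/0x8000" match.
```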
That's the But ignoring that, even in the iptables proxier, no rules get copied from the
ah, I know nothing about kube-router, but yes, it might be some component other than kube-proxy that is breaking things... |
@ncopa what k0s version, kube-router image, etc.? AFAICT kube-router is used as the CNI in k0s when it is used, deployed with an image that probably has some other iptables version. Searching kube-router while thinking about:
At a glance, it looks like kube-router does |
OK, I commented more on the kube-router bug (cloudnativelabs/kube-router#1370) suggesting how to fix the problem. I don't think there's anything more we need to be tracking here...
FWIW, as a workaround until they fix it, you could try running kubelet with
/close |
@danwinship: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@danwinship Maybe it would be worth adding some general docs (not sure if anyone will actually read those though 😄) about possible iptables version inconsistencies. Yes, the issue is not triggered by kube-proxy, but if iptables provides no "backwards compatibility", I fear this is not the first time someone hits these sorts of issues with some networking components. Thanks anyway for looking into this even if it turned out to be outside of kube-proxy |
What happened?
Running kubelet on a host with iptables-1.8.8 (nf_tables mode) does not work because the kube-proxy image uses iptables-1.8.7. kube-proxy ends up replacing the rule
with
This leads to the network no longer working.
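The two rule bodies did not survive into this copy of the report; based on the comments further down, a plausible (unconfirmed) reconstruction is:

```sh
# Plausible reconstruction (rule text is an assumption, inferred from the
# "-m mark --mark 0x8000/0x8000 is lost" note further down):
iptables -S KUBE-FIREWALL
# Before: -A KUBE-FIREWALL -m comment --comment "..." -m mark --mark 0x8000/0x8000 -j DROP
# After:  -A KUBE-FIREWALL -m comment --comment "..." -j DROP   (now drops every packet)
```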
What did you expect to happen?
Network continues to work regardless of the version of iptables installed on the host.
How can we reproduce it (as minimally and precisely as possible)?
Try to join a worker with iptables 1.8.8 in nf_tables mode on the host.
Anything else we need to know?
Problem is that `iptables-save` with iptables 1.8.7 does not work with iptables rules created with iptables 1.8.8 (nf_tables).

If I on the host manually (using iptables 1.8.8) do:

It shows the `-m mark --mark 0x8000/0x8000`.

If I then use `nsenter` to the `kube-proxy` pod and do the same I get:

As you see, the `-m mark --mark 0x8000/0x8000` is lost and all packets are dropped, not only the marked ones.

Possible workarounds:

Possible fixes:
(`iptables-save | ... | iptables-restore`)

Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)