Calico networking broken when host OS uses iptables >= 1.8 #2322
Steps to reproduce

Get a Debian Buster machine. All this is going to be done on a single machine, so physical vs. virtual, specific platform etc. are irrelevant. Install k8s prereqs:
Add docker and k8s repos and install docker and kubeadm:
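The exact commands weren't preserved in this report; a rough sketch of the standard Docker/Kubernetes apt repo setup of that era (repo URLs and package names are the usual ones, not necessarily verbatim what was used):

```bash
apt-get update && apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add -
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb [arch=amd64] https://download.docker.com/linux/debian buster stable" \
  > /etc/apt/sources.list.d/docker.list
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" \
  > /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y docker-ce kubelet kubeadm kubectl
```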
Bootstrap a single-node k8s cluster with kubeadm, and remove the master taint:
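These are the same commands given in the issue description further down:

```bash
kubeadm init --pod-network-cidr=192.168.0.0/16
KUBECONFIG=/etc/kubernetes/admin.conf kubectl taint nodes --all node-role.kubernetes.io/master-
```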
Install Calico:
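The manifests came from the v3.3 Kubernetes-datastore instructions linked in the issue description; the apply step looks roughly like this (the exact manifest URLs are on that docs page, the ones below are illustrative):

```bash
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
```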
Wait a bit for Calico to start up and other pods to schedule. Notice that coredns is crashlooping. This is a symptom of network connectivity problems, because in coredns 1.2.2 the loop-detection plugin interprets "I can't talk to my upstream DNS server" as a fatal problem, and crashes. Start 2 playground pods:
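The exact pod spec wasn't preserved here; something like this works for two throwaway pods (names and image are arbitrary):

```bash
kubectl run pod1 --image=busybox --restart=Never -- sleep 3600
kubectl run pod2 --image=busybox --restart=Never -- sleep 3600
kubectl get pods -o wide   # note both pod IPs
```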
Run a shell in one of the pods and try to ping the other pod's IP. Back on the host node, use tcpdump to watch where the packets go.
Run tcpdump again, specifically on the interface that links to the source pod, then once more on the interface that links to the destination pod. Add an iptables trace rule (an illustrative version of these steps is sketched below):
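A sketch of these steps (pod names, the 192.168.0.6 destination IP that also appears in the routing output below, and the cali* interface names are examples, not the exact values used):

```bash
# From inside one pod, ping the other pod's IP
kubectl exec pod1 -- ping -c 3 192.168.0.6

# On the host, watch the veth interfaces that back each pod
tcpdump -ni caliAAAAAAAAAAA   # interface linking to the source pod
tcpdump -ni caliBBBBBBBBBBB   # interface linking to the destination pod

# Trace rule: the host's iptables 1.8 installs this into the nf_tables universe
iptables -t raw -A PREROUTING -s 192.168.0.0/16 -p icmp -j TRACE
```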
Now, the trick here is that the host's iptables 1.8 programs that trace rule into the nf_tables universe.
Note that the packet is not traversing any Calico rulesets, because those are getting programmed in the legacy iptables universe, so we only see Docker rules in this trace. Install a trace with the legacy iptables binary as well:
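Presumably the same trace rule installed through the legacy binary, so it lands in the universe where Calico's chains live (binary name assumed; Debian Buster ships it as iptables-legacy):

```bash
iptables-legacy -t raw -A PREROUTING -s 192.168.0.0/16 -p icmp -j TRACE
```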
This trace gets dumped into dmesg:
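One way to watch it, assuming the standard behaviour of the TRACE target (each traversed rule is logged to the kernel log):

```bash
dmesg -wT | grep 'TRACE:'
```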
I'll skip the blow-by-blow, but the last line shows we matched item 6 in chain cali-FORWARD, which is an ACCEPT for packets with mark 0x10000. So, both of these traces are ending with an ACCEPT... But somehow between the end of that ACCEPT and the transmission itself, the packet is getting dropped, because we don't see it getting transmitted in tcpdump. More net-tools outputs for my own test case:
Looks like normal Calico routing is getting programmed just fine.
Nothing weird in policy routing.
Note, no neighbor entry for 192.168.0.6, which indicates that the kernel never reached the point of "ok, I need to send to 192.168.0.6 on this interface, I need to do ARP resolution"; the packets are getting dropped before that point.

rp_filter=1 by default on this OS (aka strict reverse path filtering). Even though it shouldn't affect these packets (the reverse path is 100% correct even for strict mode), I turned rp_filter off and observed no change.

That basically exhausted my network debugging knowledge on Linux, so at that point I started varying k8s versions, Calico versions, switching out Calico for Weave... and eventually found that downgrading to Debian stable (with good old iptables 1.6) completely fixes things.
Oh, and I left out an important one, if we're thinking kernel weirdness: in all of the above, the kernel is
It's definitely iptables 1.8. After some friendly inspiration (hi @bradfitz!), I grabbed the iptables 1.6 package out of Debian stable, and hackily overwrote all the binaries on the host OS, so that the host node ends up using the same version of iptables as all the containers (1.6 from Debian stable).
I did a reboot to fully clear all the system state and go from a clean slate. After the reboot finishes and k8s comes back up, coredns is no longer crashlooping (meaning the pod is able to reach the upstream internet DNS resolver), and I can ping pod-to-pod just fine.

So, the root cause definitely seems to be mixing iptables 1.6 and iptables 1.8 against the same kernel. If you use all iptables 1.6, everything is fine. I'm guessing if you use only iptables 1.8 (which translates into nftables but faithfully emulates the userspace interfaces), everything would also work fine. But with the host OS using iptables 1.8 (which programs nftables) and containers like calico-node using iptables 1.6 (which programs legacy iptables), packet forwarding seems to break.

Given that, my guess as to a fix would be for calico-node to have both versions of iptables available, and pick which one to use based on what the host OS is doing, somehow (e.g. check via netlink if nftables are non-empty?). Either that, or spin separate containers and document that users have to be careful with which one they use.
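A quick way to check on the host whether anything has actually been programmed into nf_tables (my suggestion, not part of the original comment; requires the nft tool):

```bash
nft list ruleset | head -n 40
```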
@danderson Thanks for your report and analysis, which we think is spot on. We're hoping to be able to address this within the next couple of Calico releases, but can't promise that for sure yet.
We have also run into this issue when upgrading our Kubernetes nodes to Debian Buster with iptables 1.8. We were able to get around it by switching the hosts back to the legacy iptables backend.
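On Debian that switch is normally done with update-alternatives; a sketch of the likely commands (not necessarily verbatim what this commenter ran):

```bash
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
# then restart docker/kubelet (or reboot) so rules get reprogrammed in the legacy universe
```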
We're including support in Calico v3.8.1+ which will allow Calico to run on hosts which use iptables in NFT mode. Setting the FELIX_IPTABLESBACKEND environment variable to NFT enables it.
Thanks @caseydavenport. Do you mean that v3.8.1+ will automatically include support for NFT mode?
v3.8.1 requires that you set FELIX_IPTABLESBACKEND=NFT explicitly.
Just a note for anybody who hit the same problem and switched to iptables-legacy but it still doesn't help: don't forget to check the rules that Docker created via iptables-nft; they're still there if you don't manage them yourself.
This just solved two days of debugging when we tried to install Rancher on a Debian Buster cluster. Since Google did not provide any matches on the error we got, I will paste the error message below so that others googling this issue will find this thread:
Thanks, @mrak
Just spent 3 days debugging this issue as I'm using Debian Buster. Could we add a note to https://docs.projectcalico.org/getting-started/kubernetes/self-managed-public-cloud/gce to highlight this nuance? I believe that would save a lot of people some debugging time.
Sorry, I'm very new to this. Can anyone tell me how / where to set FELIX_IPTABLESBACKEND=NFT for the inbound/outbound traffic of the pod network? Detailed steps would be really appreciated.
It's noted here: https://docs.projectcalico.org/reference/felix/configuration. It depends on how you are deploying calico-node, but I guess the most common way is as a k8s daemonset; just set the ENV like this:
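For example, on a manifest-based install the variable can be added to the running daemonset like this (daemonset name and namespace assumed to match the standard calico.yaml):

```bash
kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=NFT
```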
I've never tried it before, as we switched back to legacy mode instead of nftables :D
I just use the standard k8s from kubernetes.io, 1.18.6, the current one in the git repo. Everything else follows the installation guide, nothing special. I would love to switch back to iptables-legacy as well, but on CentOS 8 there is no legacy backend any more; how would we get legacy back?
Maybe you misunderstood my comment. I mean we should set that ENV on the calico-node pod. We're deploying calico-node as a daemonset. An example of the ENV values looks like this:
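The original example wasn't preserved; an equivalent, expressed here as a JSON patch against the calico-node daemonset rather than raw YAML (container index 0 assumed to be the calico-node container):

```bash
kubectl -n kube-system patch daemonset calico-node --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/env/-",
   "value": {"name": "FELIX_IPTABLESBACKEND", "value": "NFT"}}
]'
```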
If you can provide your Calico version and how you are deploying it, that would be more useful for debugging. Maybe the latest Calico version can detect the iptables mode already; I'm not so sure about that. About reverting to "legacy" mode: it's really just pointing the iptables command at the legacy backend, as described earlier in this thread.
Hi tungdam, I'm sorry for not replying sooner, and thank you for your help. This is what I'm executing when building the fresh new k8s cluster: I'm done setting up the cluster, and now I'm planning to use Calico as the pod network, so pods can communicate in/out with external networks.
I have no experience with flannel so there's not much to tell here, though calico alone would be enough for pod-to-pod networking. If you need to make it available with the external network as well, consider this.
Thank you tungdam, I'm looking into it.
Hi tungdam, I deployed calico-node as a daemonset with the appended parameters, and I'm hitting an issue where calico-node-xxxxx is stuck at ContainerCreating forever. What could be the cause of that? Thank you.
Try to get more info from the pod with kubectl describe or kubectl logs, please.
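For example (placeholder pod name from the previous comment; the namespace depends on how Calico was installed — kube-system for the manifest install, calico-system for the operator install):

```bash
kubectl -n kube-system describe pod calico-node-xxxxx
kubectl -n kube-system logs calico-node-xxxxx
```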
hi all, |
hi all |
Hi tungdam, I think I have the Calico node for the pod network set up; all Calico pods are running. The detail of the calico node is as below:

[root@leean-k8s-master ~]# kubectl -n calico-system describe pod calico-node-dtx8s | fgrep FELIX

As you see, the resulting env is IPTABLESBACKEND = auto, which defaults to legacy (as documented in Calico v3.16, the latest). Thank you very much.
Show me the output, please. I highly recommend you read more about k8s basic operations.
@quangleehong with FELIX_IPTABLESBACKEND set to auto, calico-node should be detecting the backend to use. If you have it set to auto and still believe it is using the incorrect iptables backend, please open a new issue. In that issue please include logs from calico-node. I'd also suggest collecting output from the relevant iptables commands on the node.
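The exact commands were lost from this comment; dumps of both rule universes on the node are the kind of output usually asked for here (binary names as shipped with iptables 1.8 on Debian Buster):

```bash
iptables-legacy-save -c > /tmp/iptables-legacy.txt
iptables-nft-save -c > /tmp/iptables-nft.txt
```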
@tungdam, we are facing similar issues where our Fluent-bit daemonset is failing to deploy on master nodes with the following error (it works on our worker nodes with no issues):
One of the Calico node logs is below:
From your log I can't say the issue is related to iptables "mode" detection as described in this issue. Maybe you should create another issue with more info. Never mind if you've found the solution already.
I wonder, why doesn't it behave like that?
If you're still facing this issue, just change the felixconfiguration and remove the iptablesBackend field:

$ kubectl patch felixconfiguration default --type=json -p="[{'op': 'remove', 'path': '/spec/iptablesBackend'}]"
Pods cannot communicate with each other or the internet when running with Calico networking on Debian Testing (aka Buster)
Expected Behavior
Installing Calico using the getting started manifests (k8s datastore, not etcd) should result in a cluster where pods can talk to each other.
Current Behavior
I bootstrapped a single-node k8s cluster on a Debian Testing (Buster) machine, using
kubeadm init --pod-network-cidr=192.168.0.0/16
and
KUBECONFIG=/etc/kubernetes/admin.conf kubectl taint nodes --all node-role.kubernetes.io/master-
I then installed Calico using the instructions at: https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/calico#installing-with-the-kubernetes-api-datastore50-nodes-or-less .
Calico pods start, and once the CNI config is installed other pods start up as well.
However, no pods can talk to any other pods, or to the internet. Packets flow correctly out of the container and onto the host, but never flow back out from there.
Switching the OS back to Debian Stable (stretch), Calico works flawlessly again.
Possible Solution
I suspect, although I have no proof, that the root cause is the release of iptables 1.8. See related bug kubernetes/kubernetes#71305. iptables 1.8 switches to using nf_tables in the kernel, and splits the tooling into iptables (translation layer for nf_tables) and iptables-legacy (the "classic" iptables). So, you end up with nf_tables in the kernel, an nf_tables-aware iptables 1.8 on the host OS, but legacy iptables 1.6 in the networking containers (including calico-node).

A breakage in netfilter is consistent with the symptoms I've found in my debugging so far. I'm going to add the debugging I've done so far in a separate post, since it's a lot of data and I want to keep the initial report fairly crisp.
Steps to Reproduce (for bugs)
Create a trivial k8s cluster on Debian Buster machines using kubeadm, then install Calico. Observe that pod<>pod and pod<>internet routing is broken.
Context
I'm the main developer of MetalLB. I've been working on creating a VM-based test harness for MetalLB that cross-tests compatibility against a bunch of k8s network addons, including Calico. I've struggled for the past 2 days with bizarre "none of my network addons seem to work" issues, which I've just figured out is caused by "something that changed recently in Debian Buster" (because I have older Debian Buster clusters on which Calico worked fine).
Your Environment
Calico version