calico-node fails with KUBE-FIREWALL iptables rule #9752
Comments
@RonBarkan thanks for raising this! It's not obvious to me what's happening here without a full log. Can you post the full calico/node log so we can take a closer look?
Here are two logs from a single node, using calico-node-kube-firewall --previous
@caseydavenport let me know if you need anything else.
The logs say that
So both reporters are healthy, so calico-node should report healthy. The health probe is, AFAIK, on localhost:9099 by default. So if kubelet installs a rule that drops all traffic to localhost, that is a problem. That problem is with your k8s installation and not with calico. You can change
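(The comment above is truncated; assuming it refers to Felix's health endpoint settings, a minimal sketch using the documented healthHost and healthPort fields of FelixConfiguration might look like this. The values shown are just the defaults, not something from the original thread.)

```yaml
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  # Address and port Felix serves its health/liveness endpoint on.
  # Defaults are localhost:9099; change these if localhost traffic is blocked.
  healthHost: "localhost"
  healthPort: 9099
```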
Calico-node is a host networked pod. Do you have any other host networked pods that listen on localhost for their health probe? Why do you need kube-firewall? Calico is a "firewall", and it seems your k8s firewall is preventing calico from starting. Having a firewall and then installing another component with the same role, which the first component collides with and prevents from starting, is not the most common use case. So my suggestion is to turn off the firewall in your k8s install process. What is your k8s environment? Some public cloud? What is the distro? How did you install k8s?
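One way to check for other host-networked pods, assuming jq is available on the workstation, is something like:

```sh
# List every pod running with hostNetwork: true
kubectl get pods -A -o json \
  | jq -r '.items[] | select(.spec.hostNetwork == true)
           | "\(.metadata.namespace)/\(.metadata.name)"'
```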
I likely do not have any other host networked pods. I guess that explains the reason.

To be clear, the

We deployed Kubernetes to "bare metal / On-Prem", using the official instructions, with

I don't have visibility into why

Could it be that this is a defect in the prober code, or perhaps the default configuration is causing this and I could remedy it with a config change?
Do you have conntrack output from that machine that would show which IP calico uses as a source? Note that kube-proxy emits the rule only if NodePorts on localhost are enabled, see here. I just installed a fresh kubeadm cluster (1.30) with calico v3.29.2 and everything works out of the box, including with the firewall rule present. Connections to 9099 are all from the localhost IP:
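For reference, a couple of ways to see which source IPs are hitting the health port on a node (9099 is Felix's default health port; adjust if it was changed):

```sh
# Tracked connections involving port 9099
conntrack -L -p tcp 2>/dev/null | grep 9099

# Live sockets and the processes owning them
ss -tnp | grep 9099
```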
Calico does not force any source IP; that is selected by the system and depends on your routing. And even though it looks like a rational choice to select 127.0.0.1 as the source IP, that is not mandated by anything, and any local IP could work equally well. If your cluster/node setup requires, for some reason that cannot be fixed on your side, that calico enforce a source IP, you could either use
Thank you for confirming that the

There was no deliberate setup, and I don't even know how to, make

I will try getting a

Also, will using
That means that calico will accept a connection to any of the local IPs; that is, it will accept connections to 127.0.0.1 as well as, say, 192.168.0.1 if that is the address of any of the local interfaces. You can use this option to limit what calico accepts.
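A quick way to verify where the health endpoint ends up listening and that it answers, assuming the default port 9099 and that Felix serves a /liveness path on it:

```sh
# On the node: what address is the health server bound to?
ss -ltnp | grep 9099

# Probe the liveness path directly
curl -fsS http://localhost:9099/liveness && echo ok
```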
We created a standalone Kubernetes setup with 2 nodes: a control plane and a worker node, with Calico CNI using `VXLANCrossSubnet` encapsulation (AFAIK the default). Looks like Kubernetes generates the following iptables rule:
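(The rule itself was not reproduced in this thread; for reference, the localnet-blocking rule that kubelet installs typically looks like the following. This is illustrative, taken from kubelet's defaults rather than from the reporter's node.)

```
-A KUBE-FIREWALL -m comment --comment "block incoming localnet connections" -d 127.0.0.0/8 ! -s 127.0.0.0/8 -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
```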
I have looked at the rule generation, which happens in kubelet and in kube-proxy. It looks like the kube-proxy rule can be disabled, but the kubelet one cannot. To disable the kube-proxy rule, the kube-proxy ConfigMap field `data.config.conf` should have `iptables.localhostNodePorts: false`.
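As a sketch, assuming a kubeadm-style kube-proxy ConfigMap in kube-system, the relevant excerpt of the KubeProxyConfiguration stored under data."config.conf" would be:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
iptables:
  # Stop kube-proxy from exposing NodePorts on 127.0.0.1
  # (available since Kubernetes 1.26)
  localhostNodePorts: false
```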
The rule set by kubelet cannot be disabled, and it looks like it is there even on the main branch.

The trouble is that calico-node on the worker node fails when the rule above is present. If we delete the rule, the calico-node pod immediately becomes healthy. When the rule returns, the calico-node pod fails again. All other Calico pods are running fine. Now, if we delete the rule manually, it can make a comeback at some future point.
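A hedged sketch of deleting the rule by hand (the rule number is a placeholder; kubelet periodically re-syncs its iptables rules, so any manual deletion is temporary):

```sh
# Find the DROP rule's position in the chain, then remove it by number
iptables -L KUBE-FIREWALL --line-numbers -n
iptables -D KUBE-FIREWALL <rule-number>
```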
If I look at the failing calico-node logs, I don't find any ERROR or WARN lines except for this, which seems harmless since it also appears when the pod is healthy:
Errors are present in the kubelet log:
Describing the calico-node pod shows that the liveness check failed.
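A hedged example of how that can be checked (the calico-system namespace applies when calico-node is deployed by the Tigera operator; the pod name is a placeholder):

```sh
kubectl -n calico-system get pods -o wide | grep calico-node
kubectl -n calico-system describe pod <calico-node-pod> | grep -i -A5 liveness
```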
Why is this happening and how can I fix it?
Calico version: v3.29.1
Tigera operator: v1.36.2
Kubernetes: v1.29.6
Both nodes run custom Linux kernels. The worker node kernel is based on 5.10.198 (arm64); the control-plane node is Yocto-based 6.1.82 (amd64).
I noticed a similar issue, #7028, but the errors are not at all the same. In particular, we don't get any errors related to the execution of Linux binaries.
When the above-mentioned iptables rule is not present, the cluster works as expected, AFAIK, including running a demo service.