
Calico autoselects iptables-legacy mode if ip_tables.ko preloaded on CentOS 8 #3709

Closed
dmitry-irtegov opened this issue Jun 22, 2020 · 6 comments · Fixed by #7111
@dmitry-irtegov

Expected Behavior

With FELIX_IPTABLESBACKEND=auto, the autodetected iptables backend is consistent with the mode kube-proxy uses.

Current Behavior

If ip_tables.ko is preloaded on CentOS 8, Calico with FELIX_IPTABLESBACKEND=auto selects legacy mode, while kube-proxy selects nft (native) mode. This results in pod->service connectivity issues; most noticeably, coredns and calico-kube-controllers enter a CrashLoopBackOff cycle.
The calico-kube-controllers log contains the following:

2020-06-19 16:20:20.609 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0619 16:20:20.615416       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-06-19 16:20:20.616 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-06-19 16:20:30.617 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-06-19 16:20:30.617 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Either setting FELIX_IPTABLESBACKEND=nft, or running rmmod ip_tables and blacklisting the legacy iptables modules before starting the kubelet, solves the issue.
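
For reference, the two workarounds look roughly like this (a sketch only; the kube-system namespace, the calico-node DaemonSet name, and the exact legacy module list are assumptions that depend on the particular manifests in use):

    # Workaround 1: pin Felix to the nft backend instead of auto-detection.
    kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=nft

    # Workaround 2: unload the legacy iptables modules and keep them from
    # loading again, before kubelet starts.
    rmmod iptable_nat iptable_mangle iptable_filter ip_tables 2>/dev/null
    for m in ip_tables iptable_filter iptable_nat iptable_mangle; do
        echo "blacklist $m"
    done > /etc/modprobe.d/blacklist-iptables-legacy.conf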

Possible Solution

Steps to Reproduce (for bugs)

  1. Install a cluster with Calico 3.13.3 or 3.14.1 and FELIX_IPTABLESBACKEND=auto on AWS using the ami-01ca03df4a6012157 image. According to https://wiki.centos.org/Cloud/AWS this is the official CentOS image for AWS, and it definitely has ip_tables preloaded. Note that the issue is image dependent; on our own installations of CentOS 8 we did not observe it.
  2. Run iptables-save. Observe the following line at the end of the output:
    # Warning: iptables-legacy tables present, use iptables-legacy-save to see them
  3. Observe kube-proxy rules but no Calico rules in the output of iptables-save.
  4. Observe a non-empty /proc/net/ip_tables_names file.
  5. CentOS 8 ships no iptables-legacy binary. To view the actual content of the legacy tables, either dig through /proc or run a shell in the kube-proxy container, where iptables-legacy is available (see the sketch after this list). Observe Calico rules in the legacy tables.
  6. Observe no pod->service connectivity when the pod and the service backend are on different nodes.
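
A quick way to run the checks from steps 2-5 (a sketch; kube-system is the usual kube-proxy namespace, the pod name is a placeholder, and cali is Calico's chain prefix):

    # Step 2: the nft-backed view that CentOS 8's own iptables-save gives.
    iptables-save | tail -n 5

    # Step 4: tables registered with the legacy (x_tables) kernel backend.
    cat /proc/net/ip_tables_names

    # Step 5: CentOS 8 ships no iptables-legacy binary, but the kube-proxy
    # image does, so dump the legacy tables from inside that container.
    kubectl -n kube-system exec <kube-proxy-pod> -- iptables-legacy-save | grep -i cali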

Context

We are developing a Kubernetes installer that should work on both CentOS/RHEL 8 and earlier OS distributions, even in mixed clusters.

Your Environment

  • Calico version: 3.13.3 and 3.14.1
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.18.2
  • Operating System and version: CentOS 8 (AWS ami-01ca03df4a6012157), docker-ce
  • Link to your project (optional): https://www.kublr.com
@dmitry-irtegov
Author

We probably found the real culprit: localnodedns 1.15.10, which unconditionally uses legacy iptables. If it manages to start before Calico, it fills the legacy tables and triggers the condition at https://github.com/projectcalico/felix/blob/954cc9b5cf62e63583d404adc5e04dd0e5ad6be3/iptables/feature_detect.go#L180

We will try to fix localnodedns or simply delay its start, but I think this shows that the current autodetection logic is rather fragile.
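
For context, the general idea behind this kind of auto-detection is to compare how many rules each backend currently holds. A simplified shell paraphrase (not the actual Felix code at the linked line, just an illustration of why pre-existing legacy rules can flip the decision):

    # Count rules currently installed under each backend.
    legacy_rules=$(iptables-legacy-save 2>/dev/null | grep -c '^-')
    nft_rules=$(iptables-nft-save 2>/dev/null | grep -c '^-')

    # If the legacy tables already hold more rules than the nft tables,
    # an auto-detector of this kind concludes the host uses the legacy backend.
    if [ "$legacy_rules" -gt "$nft_rules" ]; then
        backend=legacy
    else
        backend=nft
    fi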

@caseydavenport
Member

it shows that the current autodetection logic is rather fragile.

@dmitry-irtegov yes, I agree, and this is why Felix supports explicit values as well. Unfortunately I am not aware of a better way to detect which mode is being used.

@champtar

nodelocaldns nftables => kubernetes/dns#367

panpan0000 added a commit to panpan0000/release that referenced this issue Jul 28, 2021
…hain

-----
This is a cherry-pick from kubernetes#1548.
-----

As we know, kube-proxy should run its iptables rules in the same mode as the host OS.
But the current logic (the iptables-wrapper) sometimes does not work well. For example, we saw kubernetes/kubernetes#80462 a couple of times on Oracle Linux 8.
I could not find the root cause at the time, but clearly the iptables-wrapper made the wrong choice at startup; perhaps some other iptables rules existed before kube-proxy ran.
Once the wrapper guesses wrong at the very beginning, the kube-proxy rules are created in legacy mode. From then on, even if kube-proxy restarts, the legacy rule count is always greater than the nft rule count, so the wrapper script is permanently misled into legacy mode.

Moreover, as stated in the original code, "This assumes that some non-containerized process (e.g. kubelet) has already created some iptables rules." My fix just makes this assumption more explicit: reading the kubelet code shows that the chains KUBE-MARK-DROP, KUBE-MARK-MASQ and KUBE-POSTROUTING are created when kubelet runs, so I added a new check for them at the top of the wrapper.
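
A minimal sketch of that check (a hypothetical fragment, not the merged wrapper code; it assumes the image ships both the iptables-nft and iptables-legacy binaries):

    # kubelet creates these chains in the nat table; whichever backend
    # actually contains them is the one the host is really using.
    for chain in KUBE-MARK-DROP KUBE-MARK-MASQ KUBE-POSTROUTING; do
        if iptables-nft -t nat -S "$chain" >/dev/null 2>&1; then
            mode=nft
            break
        elif iptables-legacy -t nat -S "$chain" >/dev/null 2>&1; then
            mode=legacy
            break
        fi
    done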

Last but not least, there is still a chance for the wrapper to fail: if the kube-proxy pod starts right after the kubelet service, there is a window of a few seconds before kubelet has walked through its code and created those chains, and within that window kube-proxy can still be misled. This failure case also applies to the wrapper's original logic.

And another solution is not auto-detection but specifying the mode manually, i.e. providing the information explicitly:

    if [ "${IPTABLE_MODE}" != "" ]; then
            mode="${IPTABLE_MODE}"
    else
            # ... the original auto-detection logic follows here
    fi

But that requires adding an environment variable to kube-proxy for each environment, which is not very user friendly.

Which issue(s) this PR fixes:
kubernetes/kubernetes#80462

Moreover, Calico suffers from the same kind of issue: it is unable to auto-detect the correct iptables mode.
projectcalico/calico#3709

When the localnodedns pod starts before the calico-node pod, the legacy rules win (localnodedns always uses legacy mode), so when calico-node starts it is misled.
@panpan0000

I attempted to enhance this detection logic in kubernetes-sigs/iptables-wrappers#2; hope it helps.

@panpan0000

kubelet, as a host service/binary, uses the host's iptables utilities to create some rules.
kubelet is the very first such component on a Kubernetes node, starting before any of the containers (nodelocaldns, kube-proxy, Calico) that create rules later.
So containers without host filesystem access (if they can access the host fs, they can easily judge by iptables -V :-) ) can detect the rules that kubelet created and use them as the baseline of the detection logic.
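
As an aside, the host-side check mentioned above is trivial with iptables >= 1.8, which reports its backend in the version string:

    # On the host, or from a container that can execute the host's binary:
    iptables -V
    # prints e.g. "iptables v1.8.4 (nf_tables)" or "iptables v1.8.2 (legacy)"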

@yankay
Contributor

yankay commented Jun 16, 2022

it shows that the current autodetection logic is rather fragile.

@dmitry-irtegov yes, I agree, and this is why Felix supports explicit values as well. Unfortunately I am not aware of a better way to detect which mode is being used.

Hi @caseydavenport, kubernetes-sigs/iptables-wrappers#3 has been updated and solves the issue. :-)
