
Calico autoselects iptables-legacy mode if ip_tables.ko preloaded on CentOS 8 #3709

Closed
dmitry-irtegov opened this issue Jun 22, 2020 · 6 comments · Fixed by #7111
@dmitry-irtegov

Expected Behavior

With FELIX_IPTABLESBACKEND=auto, the autodetected iptables backend is consistent with the mode kube-proxy uses.

Current Behavior

If ip_tables.ko is preloaded on CentOS 8, Calico with FELIX_IPTABLESBACKEND=auto selects legacy mode, while kube-proxy selects nft (native) mode. This results in pod->service connectivity issues; most noticeably, coredns and calico-kube-controllers enter a CrashLoopBackOff cycle.
The calico-kube-controllers log contains the following:

2020-06-19 16:20:20.609 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0619 16:20:20.615416       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2020-06-19 16:20:20.616 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-06-19 16:20:30.617 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-06-19 16:20:30.617 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://100.64.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded

Either setting FELIX_IPTABLESBACKEND=nft, or running rmmod ip_tables and blacklisting the legacy iptables modules before starting the kubelet, solves the issue.
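
For reference, the two workarounds look roughly like this (a sketch only; the kube-system namespace, the calico-node DaemonSet name, and the exact legacy module list are assumptions that depend on the particular manifests in use):

    # Workaround 1: pin Felix to the nft backend instead of auto-detection.
    kubectl -n kube-system set env daemonset/calico-node FELIX_IPTABLESBACKEND=nft

    # Workaround 2: unload the legacy iptables modules and keep them from
    # loading again, before kubelet starts.
    rmmod iptable_nat iptable_mangle iptable_filter ip_tables 2>/dev/null
    for m in ip_tables iptable_filter iptable_nat iptable_mangle; do
        echo "blacklist $m"
    done > /etc/modprobe.d/blacklist-iptables-legacy.conf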

Possible Solution

Steps to Reproduce (for bugs)

  1. Install a cluster with Calico 3.13.3 or 3.14.1 and FELIX_IPTABLESBACKEND=auto on AWS using the ami-01ca03df4a6012157 image. According to https://wiki.centos.org/Cloud/AWS this is the official CentOS image for AWS, and it definitely has ip_tables preloaded. Note that the issue is image dependent; on our own installations of CentOS 8 we did not observe it.
  2. Run iptables-save. Observe the following line at the end of the output:
    # Warning: iptables-legacy tables present, use iptables-legacy-save to see them
  3. Observe kube-proxy rules but no Calico rules in the output of iptables-save.
  4. Observe a non-empty /proc/net/ip_tables_names file.
  5. CentOS 8 ships no iptables-legacy binary. To view the actual content of the legacy tables, either dig through /proc or run a shell in the kube-proxy container, where iptables-legacy is available (see the sketch after this list). Observe Calico rules in the legacy tables.
  6. Observe no pod->service connectivity when the pod and the service backend are on different nodes.
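
A quick way to run the checks from steps 2-5 (a sketch; kube-system is the usual kube-proxy namespace, the pod name is a placeholder, and cali is Calico's chain prefix):

    # Step 2: the nft-backed view that CentOS 8's own iptables-save gives.
    iptables-save | tail -n 5

    # Step 4: tables registered with the legacy (x_tables) kernel backend.
    cat /proc/net/ip_tables_names

    # Step 5: CentOS 8 ships no iptables-legacy binary, but the kube-proxy
    # image does, so dump the legacy tables from inside that container.
    kubectl -n kube-system exec <kube-proxy-pod> -- iptables-legacy-save | grep -i cali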

Context

We are developing a Kubernetes installer that should work on both CentOS/RHEL 8 and earlier OS distributions, even in mixed clusters.

Your Environment

  • Calico version: 3.13.3 and 3.14.1
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.18.2
  • Operating System and version: CentOS 8 (AWS ami-01ca03df4a6012157), docker-ce
  • Link to your project (optional): https://www.kublr.com
@dmitry-irtegov
Author

We probably found the real culprit: localnodedns 1.15.10, which unconditionally uses legacy iptables. If it manages to start before Calico, it fills the legacy tables and triggers the condition at https://github.com/projectcalico/felix/blob/954cc9b5cf62e63583d404adc5e04dd0e5ad6be3/iptables/feature_detect.go#L180

We will try to fix localnodedns or simply delay its start, but I think this shows that the current autodetection logic is rather fragile.
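
For context, the general idea behind this kind of auto-detection is to compare how many rules each backend currently holds. A simplified shell paraphrase (not the actual Felix code at the linked line, just an illustration of why pre-existing legacy rules can flip the decision):

    # Count rules currently installed under each backend.
    legacy_rules=$(iptables-legacy-save 2>/dev/null | grep -c '^-')
    nft_rules=$(iptables-nft-save 2>/dev/null | grep -c '^-')

    # If the legacy tables already hold more rules than the nft tables,
    # an auto-detector of this kind concludes the host uses the legacy backend.
    if [ "$legacy_rules" -gt "$nft_rules" ]; then
        backend=legacy
    else
        backend=nft
    fi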

@caseydavenport
Member

it shows that the current autodetection logic is rather fragile.

@dmitry-irtegov yes, I agree, and this is why Felix supports explicit values as well. Unfortunately I am not aware of a better way to detect which mode is being used.

@champtar

nodelocaldns nftables => kubernetes/dns#367

panpan0000 added a commit to panpan0000/release that referenced this issue Jul 28, 2021
…hain

-----
This is a cherry-pick from kubernetes#1548.
-----

As we know, kube-proxy should run its iptables rules in the same mode as the host OS.
But the current logic (the iptables-wrapper) sometimes does not work well. For example, we saw kubernetes/kubernetes#80462 a couple of times on Oracle Linux 8.
I could not find the root cause at the time, but clearly the iptables-wrapper made the wrong choice at startup; perhaps some other iptables rules existed before kube-proxy ran.
Once the wrapper guesses wrong at the very beginning, the kube-proxy rules are created in legacy mode. From then on, even if kube-proxy restarts, the legacy rule count is always greater than the nft rule count, so the wrapper script is permanently misled into legacy mode.

Moreover, as stated in the original code, "This assumes that some non-containerized process (e.g. kubelet) has already created some iptables rules." My fix just makes this assumption more explicit: reading the kubelet code shows that the chains KUBE-MARK-DROP, KUBE-MARK-MASQ and KUBE-POSTROUTING are created when kubelet runs, so I added a new check for them at the top of the wrapper.
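
A minimal sketch of that check (a hypothetical fragment, not the merged wrapper code; it assumes the image ships both the iptables-nft and iptables-legacy binaries):

    # kubelet creates these chains in the nat table; whichever backend
    # actually contains them is the one the host is really using.
    for chain in KUBE-MARK-DROP KUBE-MARK-MASQ KUBE-POSTROUTING; do
        if iptables-nft -t nat -S "$chain" >/dev/null 2>&1; then
            mode=nft
            break
        elif iptables-legacy -t nat -S "$chain" >/dev/null 2>&1; then
            mode=legacy
            break
        fi
    done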

Last but not least, there is still a chance for the wrapper to fail: if the kube-proxy pod starts right after the kubelet service, there is a window of a few seconds before kubelet has walked through its code and created those chains, and within that window kube-proxy can still be misled. This failure case also applies to the wrapper's original logic.

And another solution is not auto-detection but specifying the mode manually, i.e. providing the information explicitly:

    if [ "${IPTABLE_MODE}" != "" ]; then
            mode="${IPTABLE_MODE}"
    else
            # ... the original auto-detection logic follows here
    fi

But that requires adding an environment variable to kube-proxy for each environment, which is not very user friendly.

Which issue(s) this PR fixes:
kubernetes/kubernetes#80462

Moreover, Calico suffers from the same kind of issue: it is unable to auto-detect the correct iptables mode.
projectcalico/calico#3709

When the localnodedns pod starts before the calico-node pod, the legacy rules win (localnodedns always uses legacy mode), so when calico-node starts it is misled.
@panpan0000

I attempted to enhance this detection logic in kubernetes-sigs/iptables-wrappers#2; hope it helps.

@panpan0000

kubelet, as a host service/binary, uses the host's iptables utilities to create some rules.
kubelet is the very first such component on a Kubernetes node, starting before any of the containers (nodelocaldns, kube-proxy, Calico) that create rules later.
So containers without host filesystem access (if they can access the host fs, they can easily judge by iptables -V :-) ) can detect the rules that kubelet created and use them as the baseline of the detection logic.
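
As an aside, the host-side check mentioned above is trivial with iptables >= 1.8, which reports its backend in the version string:

    # On the host, or from a container that can execute the host's binary:
    iptables -V
    # prints e.g. "iptables v1.8.4 (nf_tables)" or "iptables v1.8.2 (legacy)"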

@yankay
Contributor

yankay commented Jun 16, 2022

it shows that the current autodetection logic is rather fragile.

@dmitry-irtegov yes, I agree, and this is why Felix supports explicit values as well. Unfortunately I am not aware of a better way to detect which mode is being used.

Hi @caseydavenport, kubernetes-sigs/iptables-wrappers#3 has been updated and solves the issue. :-)
