Race condition in pod start order results in non-working clusterIPs when nf_tables must be used #1037

Closed
ffuerste opened this issue Aug 13, 2020 · 1 comment · Fixed by #1058
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@ffuerste
Contributor

What happened:

Current major versions of Linux distributions don't support iptables-legacy anymore; instead, nf_tables is used (e.g. RHEL8 or Debian Buster).
Having only nf_tables available leads to a race condition around starting kube-proxy and node-local-dns in the correct order after a node is started.

Currently node-local-dns supports only iptables-legacy; for nf_tables support an open/blocked issue exists.
kube-proxy supports both iptables modes (see here) and determines during its startup phase which one to use (see here).
This means that if the node-local-dns pod on a node starts first and creates its iptables-legacy rules, kube-proxy finds these legacy rules and starts using the legacy mode too. Unfortunately, kube-proxy uses some chains which the kubelet creates when it starts, e.g. the chain KUBE-MARK-DROP. Because the OS offers nf_tables only, the kubelet creates these chains with nf_tables and not with iptables-legacy.
If kube-proxy now starts in iptables-legacy mode, it tries to write to the kubelet's chains and fails, because it cannot find the nf_tables chains:

kubectl -n kube-system logs $(kubectl -n kube-system get pods -o wide | grep proxy | grep cp-0 | awk '{print $1}') -f
W0813 15:17:48.190861       1 server_others.go:559] Unknown proxy mode "", assuming iptables proxy
I0813 15:17:48.203720       1 node.go:136] Successfully retrieved node IP: 172.16.10.5
I0813 15:17:48.203751       1 server_others.go:186] Using iptables Proxier.
I0813 15:17:48.205552       1 server.go:583] Version: v1.18.6
I0813 15:17:48.206007       1 conntrack.go:52] Setting nf_conntrack_max to 131072
I0813 15:17:48.206845       1 config.go:315] Starting service config controller
I0813 15:17:48.206863       1 shared_informer.go:223] Waiting for caches to sync for service config
I0813 15:17:48.206886       1 config.go:133] Starting endpoints config controller
I0813 15:17:48.206898       1 shared_informer.go:223] Waiting for caches to sync for endpoints config
I0813 15:17:48.307068       1 shared_informer.go:230] Caches are synced for endpoints config 
I0813 15:17:48.307209       1 shared_informer.go:230] Caches are synced for service config 
E0813 15:17:48.347064       1 proxier.go:1555] Failed to execute iptables-restore: exit status 2 (iptables-restore v1.8.3 (legacy): Couldn't load target `KUBE-MARK-DROP':No such file or directory

Error occurred at line: 84
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
)
I0813 15:17:48.347141       1 proxier.go:825] Sync failed; retrying in 30s

For reference, see

Because of this, kube-proxy enters an endless loop, retrying to write to the chain KUBE-MARK-DROP, and never creates the iptables rules for clusterIPs.
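
The missing chain can be confirmed directly on the affected node; a minimal sketch, assuming both the iptables-legacy and iptables-nft front-ends are available on the host:

# The kubelet created its chains via nf_tables, so only the nft backend lists them:
iptables-nft -t nat -L KUBE-MARK-DROP -n
# ...while the legacy backend, which kube-proxy picked, cannot see the chain:
iptables-legacy -t nat -L KUBE-MARK-DROP -n    # "No chain/target/match by that name"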

On the other hand, if kube-proxy starts before node-local-dns creates its iptables-legacy rules, kube-proxy creates all rules using nf_tables. In this case the chain KUBE-MARK-DROP exists and everything works as expected.
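
For context, which backend kube-proxy ends up using essentially comes down to checking which backend already contains rules and picking that one. A simplified sketch of the idea, not the exact detection script shipped in the kube-proxy image:

# Count existing rules per backend and pick the busier one (simplified sketch).
legacy_rules=$(iptables-legacy-save 2>/dev/null | grep -c '^-')
nft_rules=$(iptables-nft-save 2>/dev/null | grep -c '^-')
if [ "${legacy_rules:-0}" -gt "${nft_rules:-0}" ]; then
    echo "kube-proxy would pick legacy mode"
else
    echo "kube-proxy would pick nft mode"
fi
# If node-local-dns has already written its iptables-legacy rules, this check
# lands on legacy mode, even though the kubelet's chains live in nf_tables.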

What is the expected behavior:
Working clusterIPs after each node start.

How to reproduce the issue:
Reboot nodes to test the race condition.

Alternatively, for a running node, delete the running kube-proxy pod (as sketched after the commands below) and flush the nf_tables rules on the host:

#remove/flush all rules & delete chains
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
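
Deleting the kube-proxy pod on that node can look like this (the grep pattern and the node name cp-0 mirror the log command above and are environment-specific); the recreated pod then re-runs its mode detection against the flushed host rules:

kubectl -n kube-system delete pod $(kubectl -n kube-system get pods -o wide | grep proxy | grep cp-0 | awk '{print $1}')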

Anything else we need to know?
The same root cause is also hitting Calico, see here.

I tested using kube-proxy in IPVS mode. Unfortunately, here I hit another blocking issue for Azure (most likely for other public cloud providers as well) regarding services of type LoadBalancer. Hence, using kube-proxy in IPVS mode is not an option.

Because the root cause is the unpatched node-local-dns pod, I think a good option could be to disable its deployment for now, maybe by introducing node-local-dns as a feature in KubeOne which can be deactivated (e.g. like PodSecurityPolicies)?
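
Purely as an illustration of that suggestion: such a switch could sit next to the existing feature toggles in the KubeOne manifest. The nodeLocalDNS field below is hypothetical and does not exist in KubeOne today:

# Hypothetical kubeone.yaml fragment (field name made up for illustration),
# modeled after existing feature toggles such as podSecurityPolicy:
cat >> kubeone.yaml <<'EOF'
features:
  nodeLocalDNS:      # hypothetical, not an existing KubeOne field
    enable: false
EOF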

Information about the environment:
KubeOne version: 1.0.0-beta.2
Operating system: RHEL8
Provider you're deploying cluster on: Azure
Operating system you're deploying on: CentOS8

ffuerste added the kind/bug label Aug 13, 2020
kron4eg removed their assignment Aug 14, 2020
@kron4eg
Member

kron4eg commented Aug 14, 2020

Sorry, I wasn't able to reproduce this problem no matter how many times I rebooted the VMs. Which Kubernetes version do you use?
