This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave Net breaks when host OS uses iptables 1.8 #3465

Closed
danderson opened this issue Dec 1, 2018 · 28 comments

Comments

@danderson

What happened?

I installed Weave Net on a test cluster where the host OS is Debian Buster (aka Debian testing, rolling distro with the ~latest version of everything). After installing Weave, pod<>pod and pod<>internet communication is completely broken, no traffic flows at all, even between pods on the same machine.

I root-caused it to an incompatibility in iptables versions between the weave pods and the host OS. Debian Buster now ships iptables 1.8. It has a major change, which is that the iptables command is now a translating facade on top of nftables, i.e. creating rules with iptables or iptables-restore actually programs nf_tables in the kernel.

OTOH, the weave pod contains iptables 1.6, the previous stable release which programs the "classic" iptables netfilter stack. So, docker on the host OS ends up programming nf_tables rules (because it uses the host iptables 1.8), and weave ends up programming legacy iptables rules (because it uses iptables 1.6). For some reason I don't fully understand, having both programmed causes packets to get dropped instead of forwarded on the host, before the packets get transmitted to the target container.
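A quick way to see the mismatch (assuming the router container is named "weave", as it is when launched with the weave script; with weave-kube the container name will differ) is to compare what each side reports:

# On the host (Debian Buster), the tool reports the nf_tables backend
iptables -V
# e.g. iptables v1.8.2 (nf_tables)

# Inside the weave container, 1.6.x always uses the legacy backend
# (and does not print a backend suffix at all)
docker exec weave iptables -V
# e.g. iptables v1.6.2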

How to reproduce it?

I filed extensive reproduction steps in a sibling bug with the Calico folks; please refer to projectcalico/calico#2322 (comment). The only changes for weave are to use Weave's pod-network-cidr in kubeadm and to install weave instead of calico (obviously); some of the output also differs slightly, e.g. different interface names. Everything else, including the core failure mode and the fix, plays out the same.

Additionally, the hacky steps to verify that iptables 1.8 is the problem are at projectcalico/calico#2322 (comment): overwrite the iptables binaries with the ones from Debian stable (which still ships 1.6), reboot the machine, and Weave starts working perfectly again.

Anything else we need to know?

There's also a bug tracking similar problems in core k8s, at kubernetes/kubernetes#71305 . In core k8s this mismatch breaks kube-proxy, but it's the exact same root cause, mismatched iptables versions.

Versions:

$ weave version

Whichever the latest one is - my test harness is running calico atm, can't look it up.

$ docker version

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:43 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:06 2018
  OS/Arch:          linux/amd64
  Experimental:     false

$ uname -a

Linux cluster1-controller 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-11-02) x86_64 GNU/Linux

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:57:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:46:57Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
@murali-reddy
Contributor

thanks for reporting this issue @danderson

So, the root cause definitely seems to be mixing iptables 1.6 and iptables 1.8 against the same kernel. If you use all iptables 1.6, everything is fine. I'm guessing if you use only iptables 1.8 (which translates into nftables but faithfully emulates the userspace interfaces), everything would also work fine. But with the host OS using iptables 1.8 (which programs nftables) and containers like calico-node using iptables 1.6 (which programs legacy iptables), packet forwarding seems to break.

The latest Alpine image that weave uses still has iptables 1.6.1, so aligning the host iptables binaries with that version should be a way to work around this issue.

@bboreham
Contributor

Just had another user hit this.
I see that Calico added an explicit setting to request NFT mode.

@cy8aer

cy8aer commented Sep 1, 2019

Just updated to buster and have the same problem with buster's docker.io. Status shows everything working fine, but pinging is impossible.

Because buster is stable now, it would be necessary to have a weave version built against at least alpine 3.10.

@cy8aer

cy8aer commented Sep 1, 2019

Built it on my own against alpine 3.10 with iptables 1.8 - it does not work. We need non-legacy iptables rule sets for buster.

@praseodym

kubernetes/kubernetes#71305 describes the same issue with kube-proxy. The comments there also include some workarounds, e.g. setting the iptables tool to legacy mode (update-alternatives --set iptables /usr/sbin/iptables-legacy).
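For reference, a minimal version of that workaround on a Debian Buster host (assuming the usual Debian alternatives names) looks like:

# switch the host tools to the legacy iptables backend
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy

# reboot (or flush the nft rule set and restart docker/kubelet) so that no
# stale nf_tables rules are left behind
reboot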

@cy8aer

cy8aer commented Sep 1, 2019

Yes, I now also run the machines with iptables-legacy via update-alternatives and this works for me. This is a problem on systems other than docker/weave too.

It would probably make sense to add a warning and a description of update-alternatives to the installation documentation.

@HaveFun83

> Yes, I now also run the machines with iptables-legacy via update-alternatives and this works for me. This is a problem on systems other than docker/weave too.
>
> It would probably make sense to add a warning and a description of update-alternatives to the installation documentation.

Definitely - I wasted half a day on this issue.
https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-things-to-watch-out-for

@bboreham
Contributor

bboreham commented Oct 7, 2019

@HaveFun83 would you like to make a PR which presents the information in a way that would have worked better for you?

@murali-reddy
Contributor

Fixed in kube-proxy to auto-detect the mode and invoke update-alternatives accordingly

kubernetes/kubernetes#82966
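For context, the approach used there (and by the iptables-wrapper scripts) is roughly to look at which backend already has rules programmed and pick that one. A rough sketch of the idea, not the actual kube-proxy code:

# count the rules visible to each backend and prefer the busier one
legacy_rules=$(iptables-legacy-save 2>/dev/null | grep -c '^-')
nft_rules=$(iptables-nft-save 2>/dev/null | grep -c '^-')

if [ "${nft_rules:-0}" -gt "${legacy_rules:-0}" ]; then
    update-alternatives --set iptables /usr/sbin/iptables-nft
else
    update-alternatives --set iptables /usr/sbin/iptables-legacy
fi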

@drigz

drigz commented May 8, 2020

@murali-reddy was this fixed by #3747, or should we expect further changes?

@bboreham
Contributor

bboreham commented May 8, 2020

It's believed to be fixed in release 2.6.1; I don't know why this issue didn't auto-close.

@bboreham bboreham closed this as completed May 8, 2020
@gregfr

gregfr commented May 17, 2020

I've installed several hosts mixing Alpine and Debian.
All hosts using iptables legacy work; all using nf_tables don't.
I'm using weavenet 2.6.2 downloaded today.

@bboreham
Contributor

@gregfr please open a new issue and supply the requested information.

@gregfr

gregfr commented May 17, 2020

Switching to legacy iptables with update-alternatives --set iptables /usr/sbin/iptables-legacy worked, but I had to reboot the host.

@bboreham
Contributor

@gregfr are you using Kubernetes?

@gregfr

gregfr commented Jun 2, 2020

@bboreham no, not Kubernetes, just plain Docker.

@bboreham
Contributor

bboreham commented Jun 2, 2020

@gregfr sorry that case wasn't covered by #3747 - #3747 (comment)

@ensonic

ensonic commented Jul 29, 2020

@kedare

kedare commented Aug 25, 2020

I still have this issue when deploying with RKE 1.1.4, Debian 10, Kubernetes 1.17.5, iptables 1.8.2

Without setting iptables to legacy, I can't get anything to be forwarded either between pods or to outside the cluster.

@bboreham
Contributor

@kedare please open a new issue.

@tahamr83

We are also having this issue with weave 2.7.0

@gregfr

gregfr commented Sep 22, 2021

BTW I wasn't able to get it working, so I switched to tinc and it works wonderfully... :-/

@cucker0

cucker0 commented Sep 29, 2021

Solution for CentOS 8

Cause of this problem

CentOS 8 does not support iptables-legacy; it uses nf_tables-based iptables by default.

The weave 2.8.1 image (Alpine-based) uses iptables-legacy by default.

So the docker host and the weave container use different iptables modes.

$ iptables -V
iptables v1.8.4 (nf_tables)

$ docker exec weave iptables -V
iptables v1.8.3 (legacy)

Solutions

Switch from iptables-legacy to nf_tables in the weave container

# switch iptables-legacy to nf_tables for the weave container
docker exec -it weave sh
cd /sbin
ln -f -s xtables-nft-multi iptables
ln -f -s xtables-nft-multi ip6tables
ln -f -s xtables-nft-multi iptables-save
ln -f -s xtables-nft-multi iptables-restore
exit

# restart iptables and docker
systemctl restart iptables
systemctl restart docker
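
To confirm the switch took effect, the container should now report the nf_tables backend:

# should now print something like: iptables v1.8.3 (nf_tables)
docker exec weave iptables -V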

Use a new weave image with iptables in nf_tables mode

reference

curl -L git.io/weave -o /usr/local/bin/weave
chmod a+x /usr/local/bin/weave

export weaver_version=`weave version |tail -n 1 |awk '{print $2}'`
docker pull cucker/weave:${weaver_version}
docker tag cucker/weave:${weaver_version} weaveworks/weave:${weaver_version}

@horacimacias

According to #3465 (comment) this has been closed and apparently was fixed, but https://github.com/weaveworks/weave/issues/3465#issuecomment-625752150 still mentions that Weave Net does not work on hosts running iptables 1.8 or above, only with 1.6.

Is the documentation still accurate? If it is, should this issue still be open?

In my case, I'm using weave-kube 2.8.1. Weave reports iptables v1.8.3 (nf_tables) and the host reports iptables v1.8.4 (nf_tables).
I have several pods running and things seem to be working well enough so far, but for a reason I'm still trying to understand, some pods are not able to talk to others, and I see the NPC logging blocked connections:

WARN: 2023/11/17 11:03:17.984797 TCP connection from 10.32.0.14:46464 to 10.32.0.9:8080 blocked by Weave NPC.
WARN: 2023/11/17 11:03:19.007947 TCP connection from 10.32.0.14:46464 to 10.32.0.9:8080 blocked by Weave NPC.
WARN: 2023/11/17 11:03:21.055931 TCP connection from 10.32.0.14:46464 to 10.32.0.9:8080 blocked by Weave NPC.
WARN: 2023/11/17 11:03:25.087942 TCP connection from 10.32.0.14:46464 to 10.32.0.9:8080 blocked by Weave NPC.
WARN: 2023/11/17 11:03:33.537345 TCP connection from 10.32.0.14:46464 to 10.32.0.9:8080 blocked by Weave NPC.

So I'm trying to understand whether I'm affected by this or not. The network is definitely not 100% "broken", so perhaps the problem is due to something else.
Still, the fact that this issue is closed while the documentation still states that Weave Net does not work on hosts running iptables 1.8 or above, only with 1.6, is troubling me.
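
For what it's worth, "blocked by Weave NPC" messages normally mean a Kubernetes NetworkPolicy is denying that traffic rather than an iptables backend mismatch; a quick way to check is to look at the policies selecting the destination pod (plain kubectl, nothing weave-specific; the namespace below is a placeholder):

# list the NetworkPolicies and inspect the ones in the destination pod's namespace
kubectl get networkpolicy --all-namespaces
kubectl describe networkpolicy -n <destination-namespace>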

@StCyr

StCyr commented Jan 10, 2024

same remark as #3465 (comment)

@hakman

hakman commented Jan 10, 2024

Last commit in this repo was 2 years ago. I doubt there will be any activity going forward.

@kingdonb
Contributor

Our friend @rajch has been maintaining a fork of weave net at https://github.com/rajch/weave/tree/reweave

I'm not sure if he has seen the iptables 1.8 issue (or if it's already been addressed in the fork)

It has legs, we could do a new release, but perhaps not as Weaveworks

@kingdonb
Contributor

Raj wrote back in reply (I'm not sure why GitHub did not post it):

In my fork, the alpine base used to create the weave-kube image uses iptables v1.8.9, and therefore the problem should be solved. You could try using my released image by replacing weaveworks/weave-kube:latest with rajchaudhuri/weave-kube:latest in the weave manifest.
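
For anyone wanting to try that on a running cluster, one way to swap the image (assuming the standard weave manifest, i.e. a DaemonSet named weave-net in kube-system with a container named weave; adjust the names if your manifest differs) would be:

# point the weave container at the fork's image
kubectl -n kube-system set image daemonset/weave-net weave=rajchaudhuri/weave-kube:latest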
