Weave Net breaks when host OS uses iptables 1.8 #3465
Comments
thanks for reporting this issue @danderson
The latest Alpine image that weave uses still has iptables 1.6.1, so updating the host iptables binaries should be one way to work around this issue. |
Just had another user hit this. |
Just updated to buster and have the same problem with buster's docker.io. Status looks fine but pinging is impossible. Because buster is stable now, weave would need to be built against at least alpine 3.10. |
Built it on my own against alpine 3.10 with iptables 1.8 - does not work. We need non-legacy iptables rule sets for buster. |
kubernetes/kubernetes#71305 describes the same issue with kube-proxy. The comments there also include some workarounds, e.g. setting the |
Yes I now also run the machines with iptables-legacy via update-alternatives and this works for me. This is a problem on other systems than docker/weave too. Probably it would make sense to write a warning and description of update-alternatives in the installation documentation. |
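For readers who land here, the update-alternatives workaround mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming a Debian Buster host where the legacy binaries are installed at /usr/sbin/iptables-legacy and /usr/sbin/ip6tables-legacy. The function name legacy_switch_commands and the APPLY variable are made up for illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: print the update-alternatives commands that switch
# the host to the legacy iptables backend.
# Assumption: paths are the Debian Buster defaults.
legacy_switch_commands() {
    for tool in iptables ip6tables; do
        echo "update-alternatives --set ${tool} /usr/sbin/${tool}-legacy"
    done
}

# Dry run by default; set APPLY=1 (as root) to actually run the commands.
if [ "${APPLY:-0}" = "1" ]; then
    legacy_switch_commands | sh -e
else
    legacy_switch_commands
fi
```

Running it without APPLY set is a safe way to review the commands first. Docker and weave probably need a restart (or the host a reboot) afterwards so that rules are re-created under the legacy backend.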
Definitely. I wasted half a day on this issue. |
@HaveFun83 would you like to make a PR which presents the information in a way that would have worked better for you? |
Fixed in kube-proxy to auto-detect the mode and invoke |
@murali-reddy was this fixed by #3747, or should we expect further changes? |
It's believed to be fixed in release 2.6.1; I don't know why this issue didn't auto-close. |
I've installed several hosts mixing Alpine and Debian. |
@gregfr please open a new issue and supply the requested information. |
Switching to legacy iptables with |
@gregfr are you using Kubernetes? |
@bboreham no, not Kubernetes, just plain Docker. |
@gregfr sorry that case wasn't covered by #3747 - #3747 (comment) |
The docs still reference this bug |
I still have this issue when deploying with RKE 1.1.4, Debian 10, Kubernetes 1.17.5, iptables 1.8.2. Without setting iptables to legacy, I can't get anything to be forwarded, either between pods or to outside the cluster. |
@kedare please open a new issue. |
We are also having this issue with weave 2.7.0 |
BTW I wasn't able to have it working, so I switched to tinc and it works wonderfully... :-/ |
Solution for CentOS 8
Cause of this problem
CentOS 8 runs iptables in nf_tables mode, while the weave 2.8.1 image (Alpine) uses iptables-legacy by default. The Docker host and the weave container therefore use different iptables backends.
$ iptables -V
iptables v1.8.4 (nf_tables)
$ docker exec weave iptables -V
iptables v1.8.3 (legacy)
Solutions
Switch iptables-legacy to nf_tables for the weave container
# switch iptables-legacy to nf_tables for the weave container
docker exec -it weave sh
cd /sbin
ln -f -s xtables-nft-multi iptables
ln -f -s xtables-nft-multi ip6tables
ln -f -s xtables-nft-multi iptables-save
ln -f -s xtables-nft-multi iptables-restore
exit
# restart iptables and docker
systemctl restart iptables
systemctl restart docker
Use a new image of weave with iptables nf_tables
curl -L git.io/weave -o /usr/local/bin/weave
chmod a+x /usr/local/bin/weave
export weaver_version=`weave version |tail -n 1 |awk '{print $2}'`
docker pull cucker/weave:${weaver_version}
docker tag cucker/weave:${weaver_version} weaveworks/weave:${weaver_version} |
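The host-versus-container comparison in the CentOS 8 comment above can also be scripted. The following is a minimal sketch, not part of weave. The helper names iptables_backend and backends_match are hypothetical. It assumes that iptables 1.8 prints "(nf_tables)" or "(legacy)" in its version line, as in the two outputs quoted above, and that older releases print no suffix and only have the legacy backend.

```shell
#!/bin/sh
# Hypothetical helper: extract the backend from one line of `iptables -V` output.
# A missing suffix (iptables before 1.8) is treated as legacy.
iptables_backend() {
    case "$1" in
        *"(nf_tables)"*) echo "nf_tables" ;;
        *) echo "legacy" ;;
    esac
}

# Hypothetical helper: succeed only if both version strings use the same backend.
backends_match() {
    [ "$(iptables_backend "$1")" = "$(iptables_backend "$2")" ]
}

# Example using the two version strings quoted in the comment above.
host='iptables v1.8.4 (nf_tables)'
container='iptables v1.8.3 (legacy)'
if backends_match "$host" "$container"; then
    echo "backends match"
else
    echo "backend mismatch: host=$(iptables_backend "$host") container=$(iptables_backend "$container")"
fi
```

On a live system the two strings could be taken from the same commands shown in the comment, for example backends_match "$(iptables -V)" "$(docker exec weave iptables -V)".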
According to #3465 (comment) this has been closed and was apparently fixed, but https://github.com/weaveworks/weave/issues/3465#issuecomment-625752150 still mentions
Is the documentation still accurate? If it is, should this issue still be open? In my case, I'm using weave-kube 2.8.1. Weave is reporting
so I'm trying to understand if I'm being affected by this or not. The network is definitely not 100% "broken" so perhaps the problem is due to something else. |
same remark as #3465 (comment) |
Last commit in this repo was 2 years ago. I doubt there will be any activity going forward. |
Our friend @rajch has been maintaining a fork of weave net at https://github.com/rajch/weave/tree/reweave. I'm not sure if he has seen the iptables 1.8 issue (or if it's already been addressed in the fork). It has legs, and we could do a new release, but perhaps not as Weaveworks. |
Raj wrote back in reply (I'm not sure why GitHub did not post it):
|
What happened?
I installed Weave Net on a test cluster where the host OS is Debian Buster (aka Debian testing, rolling distro with the ~latest version of everything). After installing Weave, pod<>pod and pod<>internet communication is completely broken, no traffic flows at all, even between pods on the same machine.
I root-caused it to an incompatibility in iptables versions between the weave pods and the host OS. Debian Buster now ships iptables 1.8. It has a major change, which is that the iptables command is now a translating facade on top of nftables, i.e. creating rules with iptables or iptables-restore actually programs nf_tables in the kernel.
OTOH, the weave pod contains iptables 1.6, the previous stable release, which programs the "classic" iptables netfilter stack. So, docker on the host OS ends up programming nf_tables rules (because it uses the host iptables 1.8), and weave ends up programming legacy iptables rules (because it uses iptables 1.6). For some reason I don't fully understand, having both programmed causes packets to get dropped instead of forwarded on the host, before the packets get transmitted to the target container.
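One way to check whether a host has ended up with rules in both stacks is to count the rules each backend reports. This is a rough sketch under assumptions: it relies on the iptables-legacy-save and iptables-nft-save commands that iptables 1.8 packages provide on Debian. The helper names count_rules and describe_backends are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count rule lines ("-A ...") in iptables-save style
# output read from stdin. grep -c exits non-zero when the count is 0, so
# the exit status is forced to success; the count is still printed.
count_rules() {
    grep -c '^-A ' || true
}

# Hypothetical helper: given the legacy and nf_tables rule counts, report
# which backends currently hold rules.
describe_backends() {
    legacy="$1"
    nft="$2"
    if [ "$legacy" -gt 0 ] && [ "$nft" -gt 0 ]; then
        echo "both"
    elif [ "$legacy" -gt 0 ]; then
        echo "legacy"
    elif [ "$nft" -gt 0 ]; then
        echo "nf_tables"
    else
        echo "none"
    fi
}

# Usage on a real host (needs root and iptables 1.8), left commented out:
# describe_backends "$(iptables-legacy-save | count_rules)" \
#                   "$(iptables-nft-save | count_rules)"
```

A result of "both" would be consistent with the mixed situation described above, although it does not by itself prove that packets are being dropped for this reason.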
How to reproduce it?
I filed extensive reproduction steps in a sibling bug with the Calico folks, please refer to projectcalico/calico#2322 (comment) . The only changes for weave are to use Weave's pod-network-cidr in kubeadm, obviously install weave instead of calico, and then some of the output changes slightly like different interface names. Everything else, including the core failure mode and the fix, plays out the same.
Additionally, the hacky steps to verify that iptables 1.8 is the problem are at projectcalico/calico#2322 (comment) - basically hackily overwrite the iptables binaries with the ones from debian stable (which still uses 1.6), reboot the machine, and Weave starts working perfectly again.
Anything else we need to know?
There's also a bug tracking similar problems in core k8s, at kubernetes/kubernetes#71305 . In core k8s this mismatch breaks kube-proxy, but it's the exact same root cause, mismatched iptables versions.
Versions:
$ weave version
Whichever the latest one is - my test harness is running calico atm, can't look it up.
$ docker version
$ uname -a
$ kubectl version