Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Weave-Net on Kubernetes should support externalTrafficPolicy: Local #2924

Closed
tomte76 opened this issue May 2, 2017 · 16 comments
Closed

Weave-Net on Kubernetes should support externalTrafficPolicy: Local #2924

tomte76 opened this issue May 2, 2017 · 16 comments
Assignees
Labels
Milestone

Comments

@tomte76
Copy link

tomte76 commented May 2, 2017

Hello,

I opened the issue #44963 on kubernetes/kubernetes, because the OnlyLocal annotation to preserve the external client IP seems not to work on (at least) Kubernetes 1.6.1 and 1.6.2.

kubernetes/kubernetes#44963

As we are using the weave-net addon @MrHohn from Kubernetes asked me to also file the issue here to maybe get some input. Is it sufficient to point to the kubernetes issue or should I copy the debug information and output here? Thank you.

@bboreham
Copy link
Contributor

bboreham commented May 3, 2017

Thanks for the report @tomte76. At first glance OnlyLocal is not something that would work with an overlay network like Weave Net. But I have asked for clarification.

@bboreham bboreham changed the title Maybe Weave-Net on Kubernetes interferes with OnlyLocal annotation Weave-Net on Kubernetes should support OnlyLocal annotation May 5, 2017
@bboreham
Copy link
Contributor

bboreham commented May 5, 2017

This comment from the Kubernetes issue seems pertinent:

the original destination IP is the node IP instead of service IP. I'd expect the return packets' source address got rewritten to the node IP.

So the kube-proxy DNAT rule is all that is desired, and we would need some way to say that Weave Net should not do its masquerading in that case.
Maybe try to get consistent with kube-proxy's --cluster-cidr setting at the same time?

@kachkaev
Copy link

I just faced the same issue guys, details here: kubernetes/kubernetes#51014

@bboreham what are your thoughts on a potential fix after three months? Just curious.

@bboreham
Copy link
Contributor

Sorry, haven't thought about this much.

The Weave Net masquerade rules are applied by this line which calls through to here and they are unconditional - everything coming in or out of the weave address range is masqueraded.

Since OnlyLocal is applied on a per-service basis we either need kube-proxy to do more or we need a much more sophisticated implementation in Weave Net.

The existence of ip-masq-agent suggests that people need something more than a blanket policy.

Given it's still impossible to query for the --cluster-cidr value, I think maybe an option to turn off Weave Net's masquerading, combined with a user-configured ip-masq-agent might work.

Happy to consider a PR that does any or all of the above.

@mrdima
Copy link

mrdima commented Jan 19, 2018

Currently we're going to solve this with 2 things for our nginx-ingress nodePort service:

  1. a bash script/systemd unit which inserts iptables rule(s) to prevent SNAT for the node's local weave IP ranges (run on each Kubernetes node). See https://gist.github.com/mrdima/e0098308f2351e90132c7cf0e4fa460c
  2. Set service.spec.externalTrafficPolicy to the value Local for the ingress service as stated in https://v1-9.docs.kubernetes.io/docs/tutorials/services/source-ip/#

This won't add new iptables rules when weave allocates a new range (or update on any modifications on the local weave ranges)
I think this should be in the weave go code ofcourse, but I don't have time for that. I don't think there would be a need for using the ip-masq-agent.

I hope this is of use for others. Feedback always welcome

@bboreham
Copy link
Contributor

Thanks for the tips @mrdima!

Since your shell script is a bit dense, could you summarise what it does? Or maybe give a before/after example to give the idea?

@mrdima
Copy link

mrdima commented Jan 19, 2018

@bboreham sure.
The script accepts parameters, start, which is the default and stop. Stop removes the iptables rules it inserts.
The script figures out the weave range and local node IP weave ranges of a node using the "weave report" command and then parsing the IPAM output using the jq command (so that's an external utility this script requires!). It matches the node using the local hostname and the nickname in weave.

The problem is in the nat table (iptables -L -n -t nat) WEAVE chain, this contains normally something like:
Chain WEAVE (1 references)
target prot opt source destination
Chain WEAVE (1 references)
target prot opt source destination
RETURN all -- 10.32.0.0/12 224.0.0.0/4
MASQUERADE all -- !10.32.0.0/12 10.32.0.0/12
MASQUERADE all -- 10.32.0.0/12 !10.32.0.0/12

What we want is RETURN rules before this 2nd line for each local weave IP range. So we insert rules for this on line 2 with "-I 2" using iptables
There's a log2 calculation function which I grabbed of the net to determine the cidr for the network size which is in the IPAM output as I need the cidr to pass to iptables

So an example part of the weave report output (IPAM snippet only, there's more nodes in the entries normally ofcourse) might be:
"IPAM": {
"Paxos": null,
"Range": "10.32.0.0/12",
"RangeNumIPs": 1048576,
"ActiveIPs": 16,
"DefaultSubnet": "10.32.0.0/12",
"Entries": [
{
"Token": "10.40.0.0",
"Size": 262144,
"Peer": "7a:75:3e:5f:2e:7a",
"Nickname": "mynodename",
"IsKnownPeer": true,
"Version": 18
}
],
"PendingClaims": null,
"PendingAllocates": null
}
}

I'm using a systemd unit to start this script on each node, example output:
Jan 19 10:36:23 mynodename systemd[1]: Starting fix-weave-snat.service...
Jan 19 10:36:23 mynodename fix-weave-snat.sh[7973]: running /opt/bin/fix-weave-snat.sh start
Jan 19 10:36:23 mynodename fix-weave-snat.sh[7973]: action: start
Jan 19 10:36:24 mynodename fix-weave-snat.sh[7973]: weave range is: 10.32.0.0/12
Jan 19 10:36:24 mynodename fix-weave-snat.sh[7973]: local weave ip networks for host mynodename is/are: 10.40.0.0
Jan 19 10:36:24 mynodename fix-weave-snat.sh[7973]: number of hosts (size) in those networks are: 262144
Jan 19 10:36:24 mynodename fix-weave-snat.sh[7973]: The calculated cidrs are: 18
Jan 19 10:36:24 mynodename fix-weave-snat.sh[7973]: weave snat rules exist, inserting or removing workaround rules in nat table, chain WEAVE at line 2
Jan 19 10:36:24 mynodename fix-weave-snat.sh[7973]: inserting (iptables -t nat -I WEAVE 2 ! -s 10.32.0.0/12 -d 10.40.0.0/18 -j RETURN)
Jan 19 10:36:24 mynodename systemd[1]: Started fix-weave-snat.service.

So the iptables nat WEAVE chain now looks like:
Chain WEAVE (1 references)
target prot opt source destination
RETURN all -- 10.32.0.0/12 224.0.0.0/4
RETURN all -- !10.32.0.0/12 10.40.0.0/18
MASQUERADE all -- !10.32.0.0/12 10.32.0.0/12
MASQUERADE all -- 10.32.0.0/12 !10.32.0.0/12

So the 2nd line is added. It works if the node has multiple weave IP ranges, but as said, this doesn't update it...could add a timer to the systemd unit, but you really want this event based within weave when it allocates a new range.
Hope this is understandable, it's quite a lot of info :).

@KresoDenis
Copy link

Im surprised that this isnt bigger issue since we cant know users IP visiting our API via LoadBalancer.

During our development, we found WeaveNet to be our favourite because of encryption and its stable but this is showstopper for us. We could use HostPort and manual LB setup but its not the right way.

Looking forward to solution for this. This would make it 10 out of 10 perfect solution for us.

@KresoDenis
Copy link

KresoDenis commented Feb 1, 2018

P.S. mrdimas script works. i just had to modify regexes to

if grep -qe "MASQUERADE.*all.*[ !]${weaverange}.*[ !]${weaverange}" /tmp/weave-iptables.txt; then
if grep -qe "RETURN.*all.*!${weaverange}.*$iprange/${cidrsa[$i]}" /tmp/weave-iptables.txt; then

to make it more universal, since iptables in my shell returns a lot of whitespaces (not just one)

Only thing i will have to test later is to see if this only does implement for services with "LOCAL" or for all services.

@KresoDenis
Copy link

By the way i think in script where you calculate cidr from size 32768 could be incorrect. Currently its
cidrs="$cidrs $(log2 $size)"

shouldnt it be like
32-log2 size

@mrdima
Copy link

mrdima commented Feb 6, 2018

Yes, you're totally right, I need to fix that, I'll update the gist and also the regexes after retesting (when I have a moment, hopefully tomorrow). Yet I now see something interesting in weave report on an environment, sizes like: 98304 and 196608. These do not fit into a normal netmask...that's annoying.

@mrdima
Copy link

mrdima commented Mar 19, 2018

Ok, more than a month later I updated my gist. It now uses the ip ranges instead of (sometimes impossible to match) cidr ranges equal to the ones weave has. (-m iprange --dst-range).

@brb
Copy link
Contributor

brb commented Apr 16, 2018

@mrdima Have you considered removing MASQUERADE all -- !10.32.0.0/12 10.32.0.0/12 instead of adding RETURN all -- !10.32.0.0/12 10.40.0.0/18?

@brb brb self-assigned this May 1, 2018
@brb brb added this to the 2.4 milestone May 1, 2018
@bboreham bboreham changed the title Weave-Net on Kubernetes should support OnlyLocal annotation Weave-Net on Kubernetes should support externalTrafficPolicy: Local May 9, 2018
bboreham added a commit that referenced this issue Jun 11, 2018
@hswong3i
Copy link

hswong3i commented Jul 16, 2018

Could we have a new stable release for this issue, e.g. v2.3.1? According to my understanding, roadmap issues for v2.3.1 are all merged (https://github.com/weaveworks/weave/milestone/72)?

The support of externalTrafficPolicy: Local is very useful, e.g. ingress-nginx implementation with Deploy + Service + type: LoadBalancer.

Thank you for your kindly consideration.

@brb
Copy link
Contributor

brb commented Jul 16, 2018

@hswong3i Unfortunately, this is not a bugfix, so we cannot include it in the minor v2.3.1 release.

While waiting for v2.4.0 and if you want to test the feature, you can use the weaveworks/weave-kube:egress-test-v2 and weaveworks/weave-npc:egress-test-v2 images. To enable the feature, you need to set the NO_MASQ_LOCAL env var of weave-net to 1 in the DaemonSet definition file.

@brb
Copy link
Contributor

brb commented Jul 25, 2018

Released: https://github.com/weaveworks/weave/releases/tag/v2.4.0!

To use the feature, you need to pass NO_MASQ_LOCAL=1 to cloud.weave.works when installing Weave Net as it is described https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#configuration-options.

hswong3i added a commit to alvistack/kubernetes-sigs-kubespray that referenced this issue Aug 7, 2018
Upstream Changes:

-   weave 2.4.0 (https://github.com/weaveworks/weave/releases/tag/v2.4.0)
-   Support `externalTrafficPolicy: Local` (weaveworks/weave#2924)
-   Make the ipset list size bigger (weaveworks/weave#3305)
-   Break out of kube rm-peers loop if nothing changes (weaveworks/weave#3317)

Our Changes:

-   Revamp weave-net.yml.j2 with upstream changes
-   Add more variables for customization
-   Replace WEAVE_PASSWORD with k8s secret
-   Remove hard-corded seed mode support, in favor of variables customization
rguichard pushed a commit to rguichard/kubespray that referenced this issue Oct 5, 2018
Upstream Changes:

-   weave 2.4.0 (https://github.com/weaveworks/weave/releases/tag/v2.4.0)
-   Support `externalTrafficPolicy: Local` (weaveworks/weave#2924)
-   Make the ipset list size bigger (weaveworks/weave#3305)
-   Break out of kube rm-peers loop if nothing changes (weaveworks/weave#3317)

Our Changes:

-   Revamp weave-net.yml.j2 with upstream changes
-   Add more variables for customization
-   Replace WEAVE_PASSWORD with k8s secret
-   Remove hard-corded seed mode support, in favor of variables customization
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

7 participants