Routing Issue Outside VPC #53

Closed
incognick opened this issue Mar 29, 2018 · 26 comments

@incognick

I'm experiencing a routing issue from outside of the VPC where my EKS cluster is located. My setup is as follows:

  • VPC A with 3 private subnets; a fourth subnet is public with a NAT gateway.
  • VPC B with VPN access.
  • A peering connection between the two.

VPC A houses my EKS cluster, with 3 worker nodes each in a different subnet. VPC B is our existing infrastructure (in a different region) with VPN access.

Sometimes (not always), I'll have trouble reaching a pod from VPC B: the connection times out, and ping doesn't work either. If I SSH into one of the worker nodes in VPC A, I can reach the pod just fine.

  • I have confirmed this is not an ACL or SG issue, as I can reach other pods on the same node.
  • This is not confined to a single subnet.

Let me know if you need more information, as I can reproduce this pretty easily. I posted this question in the AWS EKS Slack channel and they directed me to create an issue here.

Thank you!

@incognick
Author

FYI @bchav

@lbernail
Contributor

lbernail commented Mar 29, 2018

@incognick: this will happen when the pod IP is on a secondary ENI.

The plugin (version 0.1.4) cannot work across VPC peering (see issue #44).
We are looking at using this plugin, not in EKS but on our own cluster. Here are a few more details on the issue with the plugin today:

  • it SNATs all traffic sent outside the VPC CIDR block with the main interface IP
sudo iptables -t nat -nL POSTROUTING
SNAT all -- 0.0.0.0/0 !172.30.0.0/16 /* AWS, SNAT */ ADDRTYPE match dst-type !LOCAL to:172.30.70.171

=> it works for incoming traffic thanks to conntrack, but you lose the pod IP when traffic is sent from a pod

  • it uses an ip rule to force all traffic sent to IP addresses outside the VPC to use the primary interface:
ip rule
1024:	not from all to 172.16.0.0/16 lookup main

=> so incoming traffic is dropped by the reverse path filter (example with a pod with IP 172.16.0.100 on ENI 2 (ens6) and an instance in the second VPC with IP 172.17.0.200):

ip route get 172.16.0.100 from 172.17.0.200 iif ens6
RTNETLINK answers: Invalid cross-device link
  • disabling rp_filter is not enough, because the traffic is then dropped by the AWS source/dest check (traffic from a pod IP associated with ENI 2 is sent via ENI 1)
    echo 0 | sudo tee /proc/sys/net/ipv4/conf/{all,ens5,ens6}/rp_filter
  • it works if you disable the source/dest check, but it does not make sense for this traffic to go through the primary interface anyway

Today, we use a patched version of the image (not really something I can include in a PR, because I simply removed the call to the function setting up the rule and NAT, but I'm happy to discuss it).
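
For reference, the same checks can be reproduced on any affected node (a sketch; the addresses and interface name are the example values above, not fixed values):

# show the SNAT rule installed by the plugin (tagged with the "AWS, SNAT" comment)
sudo iptables -t nat -S POSTROUTING | grep 'AWS, SNAT'
# show the 1024-priority rule sending all off-VPC traffic through the main table
ip rule show
# simulate the return path for a pod on a secondary ENI; an "Invalid cross-device link"
# error means the reverse path filter would drop the packet
ip route get 172.16.0.100 from 172.17.0.200 iif ens6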

@incognick
Author

incognick commented Mar 30, 2018

@lbernail Thanks for the response. Hopefully this can be addressed soon!

@edwize

edwize commented Mar 30, 2018

@lbernail, I've been dealing with a similar issue with routing over my VPN from the VPC. Pods with IPs from the primary eth0 pool work fine to the office network (172.33.x.x <-> 10.10.x.x). Traffic from pods using a secondary interface works fine pod->office, but not office->pod: the office->pod traffic comes in on eth1 but goes out eth0. I'd like to discuss "I simply removed the call to the function setting up the rule and NAT".
Since this project is rapidly evolving, I don't mind having my own patched version until additional CIDR routing is standardized.
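For anyone else debugging this, the in-on-eth1 / out-on-eth0 asymmetry described above is easy to see with tcpdump (a sketch, using a placeholder office IP):

# office -> pod requests show up here (secondary interface)
sudo tcpdump -ni eth1 host 10.10.0.50
# pod -> office replies go out here (primary interface)
sudo tcpdump -ni eth0 host 10.10.0.50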

@edwize

edwize commented Mar 30, 2018

@lbernail I got my POC working by doing the following directly on my node, but I would like the plugin fixed so this is applied automatically to new nodes, and in a flexible way:

SOURCE/DEST check on ENIs:
Set Source/Destination check to "false" on eth0-2 via the console

REVERSE PATH FILTERING was already off (zero):
sysctl -a | grep rp_filter | grep -v arp_filter

DELETE SNAT RULE:
iptables -t nat -L POSTROUTING
iptables -t nat -D POSTROUTING ! -d 172.33.0.0/16 -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 172.33.16.129
iptables -t nat -L POSTROUTING

DELETE IP RULE:
sudo ip rule show
sudo ip rule del prio 1024 (the "not from all to 172.16.0.0/16 lookup main" rule )
sudo ip route flush cache
sudo ip rule show

I was then able to curl IPs on both the eth0 and eth1 ENIs, and pod<->office over the VPN worked.
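
The same node-level changes can be scripted (a rough sketch; the CIDR and SNAT source address are the values from my node above, so adjust them to yours; the source/dest check change was done separately via the console as noted):

VPC_CIDR="172.33.0.0/16"        # your VPC CIDR
SNAT_SRC="172.33.16.129"        # the node's primary private IP used in the SNAT rule
# remove the SNAT rule added by the plugin
sudo iptables -t nat -D POSTROUTING ! -d "$VPC_CIDR" -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source "$SNAT_SRC"
# remove the 1024-priority rule forcing off-VPC traffic out the primary ENI
sudo ip rule del prio 1024
sudo ip route flush cache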

@lbernail
Contributor

lbernail commented Mar 31, 2018

@edwize: yes, this current limitation applies to any traffic outside of the VPC CIDR (so peered VPCs, VPN connections, or Direct Connect links).

A quick note on what you need: once you remove the IP rule (ip rule del prio 1024), you don't need to disable rp_filter (if it is enabled) or the source/dest check, because traffic from pods with an IP on a secondary ENI will use the proper ENI, thanks to the 1536-priority rules added by the plugin, such as:

ip rule 
1536:	from 172.16.0.100 lookup 2
1536:	from 172.16.0.101 lookup 3

With route table 2 forcing traffic through ENI 2 and route table 3 forcing traffic through ENI 3 (in my case the primary ENI is ens5, and ens6 and ens7 are ENIs 2 and 3):

ip route show table 2
default via 172.16.0.1 dev ens6
172.16.0.1 dev ens6  scope link
ip route show table 3
default via 172.16.0.1 dev ens7
172.16.0.1 dev ens7  scope link

Bear in mind that if you do this, you can't use nodes in public subnets (with public IP addresses on the primary interface), because there are no public IPs associated with the pod IPs, so pod traffic will not be NATed to a public IP by the Internet Gateway. It is not an issue in our case because we run our cluster in private subnets only.
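
For reference, creating the equivalent per-ENI table and per-pod rule by hand would look roughly like this (a sketch, using the example addresses and devices above):

# table 2: traffic leaves via ENI 2 (ens6)
sudo ip route add 172.16.0.1 dev ens6 scope link table 2
sudo ip route add default via 172.16.0.1 dev ens6 table 2
# per-pod rule: traffic sourced from this pod IP uses table 2
sudo ip rule add from 172.16.0.100 lookup 2 prio 1536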

I'll create a quick branch with the fix we currently use so you can test it if you want.

@lbernail
Contributor

@edwize: you can build a custom image from this branch: https://github.com/lbernail/amazon-vpc-cni-k8s/tree/lbernail/disable-nat-rule

It contains 2 additional commits compared to master:

  • one ensuring that logs are flushed properly (I created a PR for this on the main repo), which makes debugging a lot easier
  • one removing the call to the function c.networkClient.SetupHostNetwork, which configures the 1024-priority rule forcing traffic exiting the VPC through the main interface and creates the SNAT rule that changes the source address to the node's main IP address (the other commented-out lines are just so the code builds without errors about unused variables or imports)

@edwize

edwize commented Mar 31, 2018

@lbernail Thanks for the branch and the advice. I assume this project lags behind internal EKS work, because the lack of VPN and VPC-to-VPC support is surprising.

@eswarbala

Really appreciate the discussion here. We are planning to add a flag that disables the NATing to support the scenarios discussed in this thread.

@edwize

edwize commented Apr 5, 2018

@eswarbala That would be appreciated. I was able to use @lbernail's branch with the single NAT change to build a custom container, and it deployed successfully. I'm using KOPS and found that the "amazon-k8s-cni:0.1.1" container image was hard-coded, so now I have a custom build of that project too. Whee!

@Dieler

Dieler commented Apr 19, 2018

@edwize I haven't worked with Go yet but would really like to build my own custom container from that branch. Can you please provide some guidance on how to build this project?

@edwize

edwize commented Apr 23, 2018

@Dieler I hadn't used Go either, and I found that setting up Go on my Mac may not have created the exact environment needed for recompiling AWS's CNI. However, I did find that the Kubernetes project includes a Dockerized build environment with all the tools and libraries, so I hacked together this not-so-pretty method on an Ubuntu 16.04 server:

  1. Get the Kubernetes project: git clone https://github.com/kubernetes/kubernetes
  2. Enter the build container: cd kubernetes; build/shell.sh
  3. Add AWS's CNI to test clean build
    go get github.com/aws/amazon-vpc-cni-k8s
    go get -u github.com/golang/dep/cmd/dep
    cd $GOPATH/src/github.com/aws/amazon-vpc-cni-k8s
    git rm --cached vendor/k8s.io/kubernetes
    dep status (the Go dependency tool installed above with "go get -u github.com/golang/dep/cmd/dep")
    dep ensure
    make (successfully built stuff)
  4. Add modified CNI inside Kubernetes project, Docker container
    go get github.com/lbernail/amazon-vpc-cni-k8s
    cd $GOPATH/src/github.com/lbernail/amazon-vpc-cni-k8s
    git rm --cached vendor/k8s.io/kubernetes
    git checkout origin/lbernail/disable-nat-rule
  5. Go back to the AWS directory
    cd $GOPATH/src/github.com/aws/amazon-vpc-cni-k8s
    cp ../../lbernail/amazon-vpc-cni-k8s/ipamd/ipamd.go ipamd/ipamd.go
    rm verify-network verify-aws aws-cni aws-k8s-agent
    make
  6. Get CNI out of the build container:
    from another terminal to Ubuntu server
    docker ps ==> 4f54652ae68f ( new amazon CNI container )
    docker cp 4f54652ae68f:/go/src/github.com/aws/amazon-vpc-cni-k8s/aws-k8s-agent ~/eddie/
  7. From terminal outside of build environment, run Docker
    cd ~/eddie/
    git clone https://github.com/lbernail/amazon-vpc-cni-k8s
    cd amazon-vpc-cni-k8s/
    cp ../aws-* .
    docker build -f scripts/dockerfiles/Dockerfile.release -t "amazon/amazon-k8s-cni:latest" .
    docker images ==> "amazon/amazon-k8s-cni"
  8. TAG image and put in your private repo
    docker images
    docker tag a9f6a99f9ccc yourcompany/amazon-k8s-cni:0.0.1
    docker push YOURCOMPANY/amazon-k8s-cni:0.0.1
  9. Now, if you are using KOPS, you have to rebuild it because the CNI image is hard-coded to the AWS 0.1.1 release
    ( the essential change is below, the KOPS project has build info which is useful )
    vi upup/models/cloudup/resources/addons/networking.amazon-vpc-routed-eni/0.1.1-kops.1.yaml.template
    (was) image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:0.1.1
    (now) image: YOURCOMPANY/amazon-k8s-cni:0.0.1
    git commit -am "swap routed-ini to YOURCOMPANY/amazon-k8s-cni:0.0.1"

@robbrockbank
Contributor

robbrockbank commented May 9, 2018

I hit this same issue today where I've set up a customer gateway/VPN to another network. I'm using BGP to advertise routes between AWS and my other network.

Whilst I am able to successfully route from my AWS EKS pods to my remote network, the SNATing of the pod IP is causing other issues on my remote node (in particular when setting up policy rules using Calico, where I am assuming the source address to be the pod IP).

@eswarbala: You mention adding a flag to disable the NATing. I was wondering what that might look like in terms of API and behavior; would it be a simple "disable always" which would omit adding the SNAT rules?

I've also hit the RPF check issue, seemingly for some secondary ENIs but not for others.

@robbrockbank
Contributor

robbrockbank commented May 14, 2018

@edwize: Regarding building on your Mac: I was able to get this working without any additional changes. You'll need to tell the compiler to generate a Linux binary; setting GOOS=linux before calling the make target did the trick for me:

rob$ GOOS=linux make static
go build -o aws-k8s-agent main.go
go build -o aws-cni plugins/routed-eni/cni.go
go build verify-aws.go
go build verify-network.go

@robbrockbank
Contributor

@eswarbala: I was playing around with removing the SNAT iptables rule and the VPC ip routing rule. That works great for my cluster-to-cluster communication but, as discussed in this thread, means I can't access the internet from my AWS EKS pods. I thought it might be sufficient to add a NAT gateway to my subnets, but I couldn't seem to get that working.

I was wondering - would you expect that configuring a NAT gateway would cover the case where we are disabling the SNAT, and if so, are there any pointers you could give on how to set it up?

@lbernail
Contributor

@robbrockbank It should work with NAT gateways (this is what we did). Maybe you are missing a default route to your NAT gateways?
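
For reference, the routes involved would look roughly like this with the AWS CLI (a sketch; all IDs are placeholders):

# route table used by the worker-node subnets: default route to the NAT gateway
aws ec2 create-route --route-table-id rtb-PRIVATE --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-EXAMPLE
# route table used by the NAT gateway's (public) subnet: default route to the internet gateway
aws ec2 create-route --route-table-id rtb-PUBLIC --destination-cidr-block 0.0.0.0/0 --gateway-id igw-EXAMPLE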

@robbrockbank
Contributor

@lbernail: Thanks for the follow-up, I'll give it another try. I figured it might just be a misconfiguration on my part, so it's good to know that it should work! I had a default route to my NAT gateway, but presumably I hadn't set up my internet gateway correctly.

I have another follow up, which may not really belong here, but I think is useful to overall discussion of off-VPC routing.

I was looking into service discovery options to allow me to access a service via a local IP rather than routing over the public internet. To this end, I configured my service to use an internal Network Load Balancer (which I realize is only supposed to be beta at the moment, and possibly not even beta for an EKS cluster):

- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
    labels:
      app: nginx
      run: nginx
    name: nginx-internal
    namespace: rlb-cloud
  spec:
    externalTrafficPolicy: Local
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
    selector:
      run: nginx
    type: LoadBalancer

To get this working I made a modification to the IAM permissions for the cluster (copied from https://gist.github.com/micahhausler/4f3a2ee540f5714e6dd91b4bacace3ae#file-create-cluster-sh-L30).

This created the NLB and a DNS entry that points to the internal (VPC) address of the NLB. The DNS entry is globally distributed, so I'm able to resolve the internal address of the NLB from my peered network, which is promising.

Unfortunately this didn't work for a couple of reasons:

  • From within the VPC, it wasn't possible to use the NLB address, as we'd hit the Reverse Path Filtering issue for some secondary ENIs. With the patched CNI from @lbernail's branch, this scenario works.
  • From outside the VPC (my peered network), this doesn't work because I can't seem to access the NLB address, and there doesn't seem to be a security group associated with it that I can modify. If I were able to access the NLB, then I believe I'd be able to hit the pod addresses using the NLB's internal address without any source NATing.

Anyway, I'm sharing this here in case it's a useful thing to consider.

@lbernail
Contributor

@robbrockbank maybe you were missing routes to the IGW in the subnets where you have NAT gateways?

Regarding the NLB, I'm really not a specialist, but it seems it is not possible to access an NLB across peerings: "Connectivity from clients to your load balancer is not supported over AWS managed VPN connections or VPC peering." (from https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html)
You should be able to achieve this using PrivateLink: https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/endpoint-service.html

@robbrockbank
Contributor

@lbernail: thanks for your follow-up, greatly appreciated. I'd misconfigured my routing tables, so once I figured that out it all seems to be OK now; thanks for the push. Regarding the NLB, good grief, I just didn't see that note in the docs - thanks for pointing that out :-)

@robbrockbank
Contributor

robbrockbank commented May 23, 2018

I put up a PR to make it configurable via an environment variable whether the aws-node image installs the SNAT and off-VPC rules (which seem to be the cause of the off-VPC routing issues). The idea is that SNAT for the containers would instead be handled by an explicitly configured NAT gateway. Not sure the approach I've taken is sensible, but I'm happy to iterate on it.

IIUC, one thing that would make it more useful, though, is being able to allocate the node IPs from a different subnet than the secondary (container) IPs. That would allow the nodes to use a routing table with a default route to an IGW, and the containers to use a routing table with a default route to a NAT gateway. As it stands, configuring the EKS subnets with a default route to a NAT gateway means you have to configure specific routes to an internet gateway to allow traffic to hit the node's public IP (e.g. for SSH). (Please let me know if my thinking is wrong here, though.)

@lbernail
Contributor

@robbrockbank I like the idea of using different subnets for the main host interface and the additional ENIs (we use this feature with another CNI plugin: https://github.com/lyft/cni-ipvlan-vpc-k8s). But this requires modifying the logic of the plugin to identify the secondary ENIs' subnet (probably using tags) and to avoid adding secondary IPs to the main interface.

@robbrockbank
Contributor

@lbernail - I was thinking of going further and having, I guess, 4 subnets, so that you have two for primary and two for secondary; that way you still have subnets split across availability zones. I'm assuming at that point the tagging would be done as part of the CloudFormation templating? Apologies if I'm talking rubbish - I'm rather new to all this, so it's a bit of a steep and slow learning curve.

@liwenwu-amazon liwenwu-amazon added this to the v1.1 milestone Jun 22, 2018
@isz-paul

isz-paul commented Jun 26, 2018

Something that's missing from several of these discussions: Why was the SNAT iptables entry introduced in the first place?

I'd like to get away from iptables connection tracking completely if at all possible. Ever tried a SYN flood on an iptables machine?

@liwenwu-amazon
Contributor

This issue should be addressed by PR #81.

@rishabh1635

@incognick Did you get any solution for this? I'm currently facing the same issue.

@mogren
Contributor

mogren commented Mar 10, 2020

@rishabh1635 Which CNI version? For older versions, you need to disable SNAT completely by setting AWS_VPC_K8S_CNI_EXTERNALSNAT=true. Starting with v1.6.0, you can instead set AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS=<Peered VPC CIDRs> so the pods can still reach addresses outside of the VPC.
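
For example, on the aws-node daemonset (a sketch; the peered VPC CIDR is a placeholder):

# older versions: disable SNAT entirely (pods then need private subnets with a NAT gateway for internet access)
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true
# v1.6.0 and later: keep SNAT but exclude the peered VPC CIDR from it
kubectl -n kube-system set env daemonset/aws-node AWS_VPC_K8S_CNI_EXCLUDE_SNAT_CIDRS=172.17.0.0/16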
