Routing Issue Outside VPC #53
Comments
FYI @bchav |
@incognick: this will happen when the pod IP is on a secondary ENI. The plugin (version 0.1.4) cannot work across VPC peering (see issue #44).
=> it works for incoming traffic thanks to conntrack, but you lose the pod IP when traffic is sent from a pod
=> So incoming traffic is dropped by the reverse path filter (example with a pod with IP 172.16.0.100 on ENI 2 (ens6) and an instance in the second VPC with IP 172.17.0.200):
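A rough illustration of that drop, using the addresses from the example (interface names, routes, and sysctl output here are illustrative and vary by instance and plugin version):

```sh
# Traffic from the pod to 172.17.0.200 is forced out the primary ENI (and SNATed),
# so the kernel's best route back to 172.17.0.200 is via ens5, not ens6:
ip route get 172.17.0.200
# 172.17.0.200 via 172.16.0.1 dev ens5 ...

# With strict reverse path filtering on ens6, packets from 172.17.0.200 arriving
# on ens6 (addressed to the pod IP 172.16.0.100) fail the RPF check and are dropped:
sysctl net.ipv4.conf.ens6.rp_filter
# net.ipv4.conf.ens6.rp_filter = 1

# The drops show up as martian packets when logging is enabled:
sysctl -w net.ipv4.conf.all.log_martians=1
dmesg | grep -i martian
```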
Today, we use a patched version of the image (not really something that I can include in a PR because I simply removed the call to the function setting up the rule and NAT, but I'm happy to discuss it) |
@lbernail Thanks for the response. Hopefully this can be addressed soon! |
@lbernail, I've been dealing with a similar issue with routing over my VPN from the VPC. Pods with IPs from the primary eth0 pool work fine to the office network (172.33.x.x <-> 10.10.x.x). Traffic from pods using a secondary interface works fine pod->office, but not office->pod. The issue is that office->pod traffic comes in on eth1 but goes out eth0. I'd like to discuss "I simply removed the call to the function setting up the rule and NAT". |
@lbernail I got my POC working by doing the following directly on my node, but I would like the plugin fixed so this is applied automatically (and flexibly) to new nodes: I disabled the source/dest check on the ENIs, confirmed reverse path filtering was already off (zero), deleted the SNAT rule, and deleted the IP rule (roughly as in the sketch below). I was then able to curl IPs on both the eth0 and eth1 ENIs, and pod<->office over the VPN worked. |
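Roughly, those steps as shell commands (a sketch: the ENI ID, interface name, and iptables rule position are placeholders and will differ per node and plugin version):

```sh
# Disable the source/dest check on the secondary ENI (repeat per ENI):
aws ec2 modify-network-interface-attribute \
    --network-interface-id eni-0123456789abcdef0 --no-source-dest-check

# Confirm reverse path filtering is already off (0) on the secondary interface:
sysctl net.ipv4.conf.eth1.rp_filter

# Find and delete the SNAT rule installed by the plugin (rule text varies by version):
iptables -t nat -L POSTROUTING -n --line-numbers
iptables -t nat -D POSTROUTING <rule-number>

# Delete the IP rule that forces off-VPC traffic through the primary ENI:
ip rule del prio 1024
```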
@edwize: yes, this current limitation applies to any traffic outside of the VPC CIDR (so peered VPCs, VPN connections, or Direct Connect links). A quick note on what you need: once you remove the IP rule (ip rule del prio 1024) you don't need to disable rp_filter (if it is enabled) or the source/dest check, because traffic from pods with an IP on a secondary ENI will use the proper ENI, thanks to priority-1536 rules added by the plugin such as:
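For illustration, those per-pod rules look something like this (the pod IPs and table numbers are examples):

```sh
ip rule show
# ...
# 1536:  from 172.16.0.100 lookup 2    <- pod on ENI 2: egress uses route table 2
# 1536:  from 172.16.0.150 lookup 3    <- pod on ENI 3: egress uses route table 3
```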
With route tables 2 and 3 forcing traffic through ENI 2 and ENI 3 respectively (in my case the primary ENI is ens5, and ens6 and ens7 are ENI 2 and 3):
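Something like this (the gateway address and device names are examples):

```sh
ip route show table 2
# default via 172.16.0.1 dev ens6
# 172.16.0.1 dev ens6 scope link

ip route show table 3
# default via 172.16.0.1 dev ens7
# 172.16.0.1 dev ens7 scope link
```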
Bear in mind that if you do this, you can't use nodes in public subnets (with public IP addresses on the primary interface), because the pods won't have public IPs associated with their pod IPs, so traffic will not be NATed to a public IP by the Internet Gateway. It is not an issue in our case because we run our cluster in private subnets only. I'll create a quick branch with the fix we currently use so you can test it if you want. |
@edwize: you can build a custom image from this branch: https://github.com/lbernail/amazon-vpc-cni-k8s/tree/lbernail/disable-nat-rule. It contains 2 additional commits compared to master:
|
@lbernail Thanks for the branch, and advisement. I assume this project lags behind internal EKS work, because the lack of VPN and VPC-VPC support is surprising. |
Really appreciate the discussion here. We are planning to add a flag that disables the NATing to support the scenarios discussed here. |
@eswarbala That would be appreciated. I was able to use @lbernail's branch with the single NAT change to build a custom container, and it deployed successfully. I'm using kops, and found that the "amazon-k8s-cni:0.1.1" container was hard-coded, so now I have a custom build of that project too. Whee! |
@edwize I haven't worked with Go yet but would really like to build my own custom container from that branch. Can you please provide some guidance on how to build this project? |
@Dieler I hadn't used Go either, and I found that setting up Go on my Mac may not create the exact environment needed to recompile AWS's CNI. However, the Kubernetes project includes a Dockerized build environment with all the tools and libraries, so I hacked together this not-so-pretty method on an Ubuntu 16.04 server:
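For reference, a build along those lines can be sketched as follows (assumes Docker is installed; the Go image version and the make invocation are guesses, so check the branch's Makefile and Dockerfile):

```sh
# Grab the branch with the NAT change (URL from the comment above):
git clone -b lbernail/disable-nat-rule https://github.com/lbernail/amazon-vpc-cni-k8s.git
cd amazon-vpc-cni-k8s

# Build inside an official golang container so no local Go setup is needed:
docker run --rm \
  -v "$PWD":/go/src/github.com/aws/amazon-vpc-cni-k8s \
  -w /go/src/github.com/aws/amazon-vpc-cni-k8s \
  golang:1.10 make

# Package the result with the repo's Dockerfile and push it somewhere your nodes can pull from:
docker build -t <your-registry>/amazon-k8s-cni:custom .
docker push <your-registry>/amazon-k8s-cni:custom
```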
|
I hit this same issue today, where I've set up a customer gateway/VPN to another network. I'm using BGP to advertise routes between AWS and my other network. While I am able to successfully route from my AWS EKS pods to my remote network, the SNAT-ing of the pod IP is causing other issues on my remote node (in particular setting up policy rules using Calico, where I am assuming the source address to be the pod IP). @eswarbala: You mention adding a flag to disable the NAT-ing. I was wondering what that might look like in terms of API and behavior; would it be a simple "disable always" which would omit adding the SNAT rules? I've also hit the RPF check issue, seemingly for some secondary ENIs but not for others. |
@edwize: Regarding building on your Mac: I was able to get this working without any additional changes. You'll need to tell the compiler to generate a Linux binary by setting the target OS, for example:
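A minimal sketch of that cross-compile (the output binary name and package path are illustrative):

```sh
# From a checkout of the plugin on macOS, produce a Linux/amd64 binary:
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o aws-k8s-agent .
file aws-k8s-agent   # should report an ELF 64-bit executable
```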
|
@eswarbala: I was playing around with removing the SNAT iptables rule and the VPC IP routing rule. That works great for my cluster-to-cluster communication, but as discussed in this thread it means I can't access the internet from my AWS EKS pods. I thought it might be sufficient to add a NAT gateway to my subnets, but couldn't seem to get that working. I was wondering: would you expect that configuring a NAT gateway would cover the case where we are disabling the SNAT, and if so, are there any pointers you could give on how to set it up? |
@robbrockbank It should work with NAT gateways (this is what we did). Maybe you are missing a default route to your NAT gateways? |
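For reference, that default route can be added like this (a sketch with placeholder IDs, applied to the route table used by the private subnets):

```sh
aws ec2 create-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 0.0.0.0/0 \
    --nat-gateway-id nat-0123456789abcdef0
```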
@lbernail: Thanks for the follow-up, I'll give it another try. I figured it might just be a misconfiguration on my part; it's good to know that it should work! I had a default route to my NAT gateway, but presumably I hadn't set up my internet gateway correctly. I have another follow-up, which may not really belong here but I think is useful to the overall discussion of off-VPC routing. I was looking into service discovery options to allow me to access a service via a local IP rather than routing over the public internet. To this end I configured my service to use an internal Network Load Balancer (which I realize is only supposed to be beta at the moment, and possibly not even beta for an EKS cluster):
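The Service was along these lines (a sketch; the name, selector, and ports are placeholders, and the internal-NLB annotation values can vary by Kubernetes version):

```sh
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "0.0.0.0/0"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
EOF
```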
To get this working I made a modification to the IAM permissions for the cluster (copied from https://gist.github.com/micahhausler/4f3a2ee540f5714e6dd91b4bacace3ae#file-create-cluster-sh-L30). This created the NLB and the DNS entry, which points to the internal (VPC) address of the NLB. This DNS entry is globally distributed, so I'm able to resolve the internal address of the NLB from my peered network, which is promising. Unfortunately this didn't work for a couple of reasons:
Anyway, I'm sharing this here in case it's a useful thing to consider. |
@robbrockbank maybe you were missing routes to the IGW on the subnets where you have NAT gateways? Regarding the NLB, I'm really not a specialist, but it seems that it is not possible to access an NLB across peerings: "Connectivity from clients to your load balancer is not supported over AWS managed VPN connections or VPC peering." (from https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html) |
@lbernail: thanks for your follow-up, greatly appreciated. I'd misconfigured my routing tables, so once I figured that out it all seems to be OK now; thanks for the push. Regarding the NLB, good grief, I just didn't see that comment in the docs - thanks for pointing that out :-) |
I put up a PR to make it configurable via an environment variable whether the aws-node image installs the SNAT and off-VPC rules (which seem to be the cause of the off-VPC routing issues). The idea is that SNAT for the containers would be handled by an explicitly configured NAT gateway. Not sure the approach I've taken is sensible, but I'm happy to iterate on it. IIUC, one thing that would make it more useful is being able to allocate the node IPs from a different subnet than the secondary (container) IPs. That would allow the nodes to use a routing table with a default route to an IGW, and the containers to use a routing table with a default route to a NAT gateway (roughly as in the sketch below). As it stands, configuring the EKS subnets with a default route to a NAT gateway means you have to configure specific routes to an internet gateway to allow traffic to reach the nodes' public IPs (e.g. for SSH). (Please let me know if my thinking is wrong here though.) |
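A sketch of that split routing (placeholder IDs; one route table for the node subnets with a route to the IGW, one for the pod subnets with a route to the NAT gateway):

```sh
# Node subnets: default route straight to the internet gateway (public node IPs keep working):
aws ec2 create-route --route-table-id <node-route-table-id> \
    --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0123456789abcdef0

# Pod (secondary-ENI) subnets: default route through the NAT gateway instead:
aws ec2 create-route --route-table-id <pod-route-table-id> \
    --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0123456789abcdef0
```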
@robbrockbank I like the idea of using different subnets for the main host interface and the additional ENIs (we use this feature with another CNI plugin: https://github.com/lyft/cni-ipvlan-vpc-k8s). But this requires modifying the logic of the plugin to identify the secondary ENIs' subnet (probably using tags) and to avoid adding secondary IPs to the main interface. |
@lbernail - I was thinking of going further and having, I guess, 4 subnets, so that you have two for primary and two for secondary - that way you still have subnets split across availability zones. I'm assuming at that point the tagging would be done as part of the CloudFormation templating? Apologies if I'm talking rubbish - I'm rather new to all this, so it's a bit of a steep and slow learning curve. |
Something that's missing from several of these discussions: why was the SNAT iptables entry introduced in the first place? I'd like to get away from iptables connection tracking completely if at all possible. Ever tried a SYN flood on an iptables machine?
This issue should be addressed by PR #81.
|
@rishabh1635 Which CNI version? For older versions, you need to disable SNAT completely by setting |
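The comment above appears to be truncated; the setting it most likely refers to is the plugin's external SNAT switch. A sketch, assuming the documented AWS_VPC_K8S_CNI_EXTERNALSNAT environment variable on the aws-node DaemonSet:

```sh
# Tell the CNI not to SNAT pod traffic leaving the VPC (external NAT, e.g. a NAT gateway,
# is then expected to handle internet-bound traffic):
kubectl set env daemonset/aws-node -n kube-system AWS_VPC_K8S_CNI_EXTERNALSNAT=true
```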
I'm experiencing a routing issue from outside of my VPC where my EKS cluster is located. My setup is as follows:
VPC A with 3 private subnets; a fourth subnet is public with a NAT gateway.
VPC B with VPN access.
Peering connection between the two.
VPC A houses my EKS cluster with 3 worker nodes each in a different subnet. VPC B is our existing infrastructure (different region) with VPN access.
Sometimes (not always), I'll have trouble getting a route into a pod from VPC B. The connection will time out, and ping doesn't work either. If I SSH into one of the worker nodes in VPC A, I can route just fine into the pod.
Let me know if you need more information as I can reproduce pretty easily. I posted this question in the aws eks slack channel and they directed me to create an issue here.
Thank you!