
Error Code: NatGatewayLimitExceeded - Even with a higher limit than the error reports. #560

Closed
chs-bnet opened this issue Feb 19, 2019 · 10 comments



chs-bnet commented Feb 19, 2019

What happened?
Occasionally getting "limit exceeded" errors when the limit isn't actually exceeded.

Error:

AWS::EC2::NatGateway/NATGateway: CREATE_FAILED – "Performing this operation would exceed the limit of 5 NAT gateways (Service: AmazonEC2; Status Code: 400; Error Code: NatGatewayLimitExceeded; Request ID:  2eda19f-18ef-49c0-869e-52b14f558997)"

What you expected to happen?
The NAT gateway limit for this account has been increased to 20, and at the time of EKS cluster creation there are only 7 active NAT Gateways, so there shouldn't be any error.
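For reference, the active count can be checked with the AWS CLI (a sketch; the region here is a placeholder):

$ aws ec2 describe-nat-gateways \
    --filter Name=state,Values=available \
    --query 'length(NatGateways)' \
    --region us-east-1
7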

How to reproduce it?
It doesn't happen every time; it's maybe 1/3 of the time and seems random. To reproduce, run eksctl create cluster in an account whose NAT gateway limit has been raised above 5.
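A minimal sketch of the reproduction (the cluster name and region are placeholders, not from an actual run):

$ eksctl create cluster --name natgw-repro --region us-west-2
# roughly 1 in 3 runs fails during CloudFormation creation with NatGatewayLimitExceeded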

Versions

$ eksctl version
version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.1.20-rc.3"}

$ uname -a
Linux chs-vm 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Distribution: Ubuntu 16.04.5 LTS

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Using aws-iam-authenticator. It doesn't output a version number, but it is the latest build per Amazon's installation instructions as of about two weeks ago.

Logs
Since it's random, it's hard to collect the logs. But I will update the issue with them when I can.


errordeveloper commented Feb 20, 2019

Thanks for reporting this! It's very unfortunate... We've been discussing adding a NAT gateway per AZ, so that an AZ failure doesn't cut private subnets in other AZs off from the world (#392), but it looks like even a single NAT gateway is already a bit problematic. Are you using private subnets at all? I wonder if we can eventually find a solution that provisions NAT gateways only when a nodegroup gets deployed into the private subnets.

Has this happened continuously for a little while and then eventually stopped being an issue? If so, perhaps the limit has eventually-consistent semantics?

@chs-bnet (Author) commented:

I spun up another cluster yesterday without issues, so I couldn't collect logs. I might try to bring up another one today and see if I can get it to error again. Like I said, it's only about 30% of the time.

Up until yesterday I was only creating clusters using private networking. But the first time I tried to spin one up without --node-private-networking, it still gave the NAT gateway error. So I get the error regardless of whether private or public networking is enabled.

@chs-bnet (Author) commented:

OK, it happened on my first attempt this time:

https://gist.github.com/chs-bnet/fd9a3c66b6965a9faa05d089cd925b61

@whereisaaron commented:

Seems like this spurious CF limit error would have to be a CF issue rather than an eksctl one?

I would actually suggest splitting create cluster into create network and create cluster, even if create cluster keeps the default behavior of running create network first. create network would create the VPC, subnets, routing tables, and gateways, and export the resource names that a cluster needs.

That way you could deploy three clusters into one VPC with shared gateways:

create network --name foo
create cluster --name prod --network-stack=foo
create cluster --name preprod --network-stack=foo
create cluster --name uat --network-stack=foo

And you make sensible/efficient use of NAT Gateways.

And you still have the simple create cluster (just don't specify --network-stack). The only difference is that it would create two CF stacks, one for the network and one for the cluster. And using the simple case wouldn't preclude adding a second cluster later.
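In the meantime, something close to the shared-VPC case should already be possible by creating the network yourself and pointing eksctl at existing subnets — a sketch, assuming a build with the existing-VPC flags; the subnet IDs are placeholders:

$ eksctl create cluster --name prod \
    --vpc-private-subnets=subnet-aaa1,subnet-aaa2 \
    --vpc-public-subnets=subnet-bbb1,subnet-bbb2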


errordeveloper commented Feb 21, 2019 via email

@whereisaaron commented:

I'm greatly in favor of separate, non-nested stacks with suitable exports/imports. Export names are global, but that's nothing a configurable prefix in eksctl can't address.
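For illustration, the export/import mechanics would look roughly like this (a generic CloudFormation sketch, not eksctl's actual template; the export name and prefix are made up). The network stack exports the subnet IDs:

  Outputs:
    PrivateSubnets:
      Value: !Join [",", [!Ref PrivateSubnetA, !Ref PrivateSubnetB]]
      Export:
        Name: eksctl-foo-private-subnets  # exports are global per region, hence a prefix

and a cluster stack imports them by name:

      Subnets: !Split [",", !ImportValue eksctl-foo-private-subnets]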

What is the trade-off around IPs with shared network stacks?


errordeveloper commented Feb 22, 2019 via email

@chs-bnet (Author) commented:

Just noticed this was tagged as awaiting information. This issue is still an ongoing problem for us. Is there any more information that you actually need, or was my previous log sufficient?

@errordeveloper (Contributor) commented:

@chs-bnet it's clear to me that this is ongoing, thanks for clarifying that. Did you try increasing the limit to something much higher? To be honest, all we can do here is provide an option to disable the NAT Gateway, and/or offer an Egress-only NAT Gateway... The real issue is completely outside of our control, unless I'm missing something. As far as I'm aware, there is no API for limits. Overall, this sounds like flaky behaviour on the AWS side, but I'm surprised that only you are seeing it.

To be clear, private subnets and the NAT Gateway are created regardless of whether --node-private-networking was used, because we don't want to limit your choice of network topology early on. We create everything that's needed up front, so that you can create either public or private nodegroups later; you're allowed to have as many different nodegroups as you like.

So would an option to use no NAT Gateway or an Egress-only NAT Gateway be sufficient for you?

@errordeveloper (Contributor) commented:

I'm gonna close it, as this will be addressed as part of #694 and #872.
