Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data/aws: create fewer NAT gateways #1450

Closed
wants to merge 1 commit into from
Closed

Conversation

eparis
Copy link
Member

@eparis eparis commented Mar 22, 2019

Instead of making 1 nat gateway for every subnet, make a nat gateway in 3
subnets and use those in all subnets

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 22, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: eparis
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: aaronlevy

If they are not already assigned, you can assign the PR to them by writing /assign @aaronlevy in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

resource "aws_nat_gateway" "nat_gw" {
count = "min(${local.new_az_count}, 3)"
allocation_id = "${aws_eip.nat_eip.*.id[count.index]}"
subnet_id = "${aws_subnet.public_subnet.*.id[count.index]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move these public-subnet resources into vpc-private.tf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because they are private subnet resources, not public. the move should happen irrespective of this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because they are private subnet resources, not public...

Ah, because while this resource lives in the public subnet, its purpose is to service instances in the private subnets.

the move should happen irrespective of this PR

Personally I'd rather collapse vpc-private.tf and vpc-public.tf together. The unified file would only be ~100 lines. I'd work up a PR, but there are other PRs in flight touching this code and I don't think we could get this file-rename landed before the coming freeze.

@wking
Copy link
Member

wking commented Mar 22, 2019

Is this a trade-off between cost/limits (favoring NAT/EIP sharing) and external-net latency (favoring per-zone-with-instances NATs)?

Instead of making 1 nat gateway for every subnet, make a nat gateway in 3
subnets and use those in all subnets
@openshift-ci-robot
Copy link
Contributor

@eparis: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws 9a738f7 link /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@eparis
Copy link
Member Author

eparis commented Mar 22, 2019

it gets the eip limit to 3, which co-incide with the 3 subnets we use out of the box. It would create some cross az faults for potential workers. If us-east-1a goes out you'll lose internet access from nodes in 1a and 1d

@wking
Copy link
Member

wking commented Mar 27, 2019

I've pushed #1481 with an alternative approach that only creates subnet infrastructure in a given zone when we have instances configured for that zone. So if for some reason you wanted a six-zone cluster in us-east-1, you'd still need six EIPs (vs. this PR's three). But all nodes are getting out through a NAT in their own subnet (vs. this PR's cross-zone traffic as mentioned in your last comment).

@wking
Copy link
Member

wking commented Mar 28, 2019

#1481 allows users to opt-in to fewer NAT gateways, and #1487 is floating to make fewer NAT gateways the default. I think that means we can close this NAT-gateway-pooling approach, but feel free to re-open or comment if you disagree :).

@wking wking closed this Mar 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants