
Add a bastion host and place instances + kube apiserver behind it #1

Merged 25 commits from bastion-host into master on Mar 23, 2018

Conversation

@bendrucker commented Mar 6, 2018

This adds private networking to our typhoon fork. It makes the following changes relative to upstream typhoon:

  • Private subnet in each availability zone (including egress gateways for IPv4 + IPv6)
  • Moves workers into the private subnet
  • Moves controllers and the apiserver load balancer into the private subnet
  • Launches a bastion instance in the public subnet to allow administrative access
  • Configures terraform to run ssh provisioning steps via the bastion (see the sketch at the end of this description)

This change set is the minimum diff from upstream to achieve private networking. Next steps:

  • Lock down security groups. Because everything was public they allow access to certain ports from 0.0.0.0/0. We can restrict them to our VPC CIDR block.
  • Rebase against upstream. Struggling with the nuances of NLBs ate up the time I had planned to use to sync up with typhoon upstream. We now have fairly significant diffs and I'd like to try a rebase to get a sense of how it'll go.
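
For reference, a minimal sketch of the provisioning piece (resource and record names here are illustrative, not the exact code in this diff): Terraform ssh connections accept bastion_* settings, so the existing provisioning steps only need to be pointed at the bastion.

# Sketch only: run the existing remote-exec provisioning through the bastion.
resource "null_resource" "copy-secrets" {
  count = "${var.controller_count}"

  connection {
    type         = "ssh"
    user         = "core"
    host         = "${element(aws_instance.controllers.*.private_ip, count.index)}"
    bastion_host = "${aws_route53_record.bastion.fqdn}"
    bastion_user = "core"
  }

  provisioner "remote-exec" {
    inline = ["echo connected through the bastion"]
  }
}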

@bendrucker (Author):

Here is the subnet layout:

name az cidr
bastion-test-private-0 us-east-1a 10.0.128.0/20
bastion-test-private-1 us-east-1b 10.0.144.0/20
bastion-test-private-2 us-east-1c 10.0.160.0/20
bastion-test-private-3 us-east-1d 10.0.176.0/20
bastion-test-private-4 us-east-1e 10.0.192.0/20
bastion-test-private-5 us-east-1f 10.0.208.0/20
bastion-test-public-0 us-east-1a 10.0.0.0/20
bastion-test-public-1 us-east-1b 10.0.16.0/20
bastion-test-public-2 us-east-1c 10.0.32.0/20
bastion-test-public-3 us-east-1d 10.0.48.0/20
bastion-test-public-4 us-east-1e 10.0.64.0/20
bastion-test-public-5 us-east-1f 10.0.80.0/20
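
For anyone wondering how that layout falls out of the VPC CIDR: both sets are /20 slices of the /16, with the private set offset by 8 netnums. A sketch, assuming the default 10.0.0.0/16 host_cidr (the exact expressions in the diff may differ):

# Sketch: carve /20s out of a /16. Public subnets use netnums 0-5,
# private subnets are offset by 8, which starts them at 10.0.128.0/20.
resource "aws_subnet" "public" {
  count             = "${length(data.aws_availability_zones.all.names)}"
  vpc_id            = "${aws_vpc.network.id}"
  availability_zone = "${element(data.aws_availability_zones.all.names, count.index)}"
  cidr_block        = "${cidrsubnet(var.host_cidr, 4, count.index)}"
}

resource "aws_subnet" "private" {
  count             = "${length(data.aws_availability_zones.all.names)}"
  vpc_id            = "${aws_vpc.network.id}"
  availability_zone = "${element(data.aws_availability_zones.all.names, count.index)}"
  cidr_block        = "${cidrsubnet(var.host_cidr, 4, count.index + 8)}"
}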

Changes still coming:

  • Launch bastion in ASG behind load balancer

@bendrucker changed the title from "WIP: Add a bastion host and place instances + kube apiserver behind it" to "Add a bastion host and place instances + kube apiserver behind it" on Mar 15, 2018
@bendrucker (Author):

@eladidan Complete with a load balancer/ASG now, with desired/min/max = 1 (configurable). I can walk you through how I test this as well as leave a stack up for you to look at. The name is bastion-test and it's in the k8s-playground sub-account in us-east-1.
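
Roughly what the sizing looks like; the variable and target group names below are illustrative rather than the exact ones in the diff:

variable "bastion_count" {
  description = "Desired/min/max number of bastion instances"
  default     = 1
}

resource "aws_autoscaling_group" "bastion" {
  name                 = "${var.cluster_name}-bastion"
  desired_capacity     = "${var.bastion_count}"
  min_size             = "${var.bastion_count}"
  max_size             = "${var.bastion_count}"
  vpc_zone_identifier  = ["${aws_subnet.public.*.id}"]
  launch_configuration = "${aws_launch_configuration.bastion.name}"
  target_group_arns    = ["${aws_lb_target_group.bastion.arn}"]
}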

(Commit message: copy-secrets hangs for ~1 min waiting for an instance; it succeeds, but it's clearer to wait for the ASG before trying.)
@eladidan left a comment:

@bendrucker very superficial review. Let's meet and discuss this.
I'm concerned about how we're going to manage keys and users, and I want to make sure we don't regress in functionality relative to what we have with tunnelbox, though I don't want to add tons of scope here. I'd like to align on the long-term vision.

lifecycle {
  # override the default destroy and replace update behavior
  create_before_destroy = true
  ignore_changes        = ["image_id"]


why?

@bendrucker (Author):

To avoid applying updates via AMI changes. Otherwise a new AMI means terraform will want to destroy everything and re-create it. We can (and should) automatically apply patches to existing instances instead of updating to new AMIs as a general practice.

Typhoon recommends https://github.com/coreos/container-linux-update-operator
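
For context, the AMI comes from a data source along these lines (owner ID and filters reproduced from memory, so treat them as approximate). With most_recent = true, every new stable Container Linux release changes image_id, which is exactly what ignore_changes is guarding against:

data "aws_ami" "coreos" {
  most_recent = true
  owners      = ["595879546273"] # CoreOS owner ID, from memory; verify before relying on it

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "name"
    values = ["CoreOS-stable-*"]
  }
}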


We can (and should) automatically apply patches to existing instances instead of updating to new AMIs as a general practice.

This is a decision that goes way beyond the scope of this PR, and it has not been our policy up until today. Let's not get in the habit of sneaking policy changes in as implementation details in a PR.

The bastion instance is not part of the kubernetes cluster, so I don't see how the operator reference is relevant?

AMI updates are infrequent so I don't think doing a rolling update of the ASG (bringing instances down 1 by 1 and creating new ones) is bad practice, and makes infrastructure more immutable.

@bendrucker (Author):

More info, per our discussion just now. This is consistent with how the controllers and workers are managed in typhoon. The data source will return the latest stable AMI.

Typhoon tells terraform to ignore these changes in order to not require a destroy/recreate any time there's an update. There's no way to defer these changes: any terraform apply will want to make them, and it's easy to be forced into recreating resources when you just want to apply another change.

We don't have to do in-place updates either. We can:

  1. Deploy a new cluster when we want to use new AMIs. Changes are only ignored for the given instance of a module.
  2. Create a variable (map) containing AMI IDs per region and then update it manually. That's the most consistent with aether/CloudFormation (sketch below).

Like some of the other things you've mentioned, totally valid, but I think we should land private networking and come back to it.
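
A sketch of what option 2 could look like; variable names, regions, and AMI IDs below are placeholders:

variable "coreos_ami_ids" {
  description = "Pinned Container Linux AMI per region, updated manually"
  type        = "map"

  default = {
    "us-east-1" = "ami-00000000" # placeholder, not a real AMI ID
    "us-west-2" = "ami-11111111" # placeholder, not a real AMI ID
  }
}

resource "aws_launch_configuration" "bastion" {
  image_id      = "${lookup(var.coreos_ami_ids, var.region)}" # var.region is illustrative
  instance_type = "${var.bastion_type}"                       # illustrative

  lifecycle {
    create_before_destroy = true
  }
}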

@bendrucker (Author):

I don't think doing a rolling update of the ASG (bringing instances down 1 by 1 and creating new ones) is bad practice

CloudFormation provides additional functionality for updating autoscaling groups (including rolling updates). Terraform chose not to try to replicate this, more here: hashicorp/terraform#1552

One option (probably mentioned in that thread somewhere) is to use terraform to create a CloudFormation stack solely to take advantage of UpdatePolicy for ASGs.
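
Roughly what that could look like (untested sketch; the rolling update settings are just examples):

# Sketch: let CloudFormation own the ASG so its UpdatePolicy handles rolling
# replacement, while terraform still owns the launch configuration.
resource "aws_cloudformation_stack" "bastion_asg" {
  name = "${var.cluster_name}-bastion-asg"

  template_body = <<EOF
Resources:
  BastionASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchConfigurationName: "${aws_launch_configuration.bastion.name}"
      MinSize: "1"
      MaxSize: "1"
      VPCZoneIdentifier: !Split [",", "${join(",", aws_subnet.public.*.id)}"]
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: "0"
        MaxBatchSize: "1"
EOF
}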


lifecycle {
  # override the default destroy and replace update behavior
  create_before_destroy = true


what does this mean?

@bendrucker (Author):

create_before_destroy overrides the default terraform lifecycle for resource replacement. Normally terraform will destroy the old resource and then create a new one. Setting that attribute means it will instead launch the replacement resource before destroying the previous one.

In this case, a change requiring replacement of the ASG will be rolled out by creating a new ASG and then destroying the old one once the new one is created.

}

resource "aws_launch_configuration" "bastion" {
  image_id = "${data.aws_ami.coreos.image_id}"


since this is not running kubernetes or docker, maybe use AWS linux AMI?

@bendrucker (Author) commented Mar 20, 2018:

That would mean using an entirely different configuration strategy than the other machines. Most importantly, I copied the Container Linux configs from controllers/workers. Configuring AWS Linux is different enough that I really don't think it's worth it.


I do.
We would want different auditing tools and different dependencies installed on the bastion host.
I understand the extra scope here, so I'm willing to push this back to a different PR, but since the purpose of this instance is vastly different, and it is publicly accessible over ssh (via the NLB), we should specialize the instance to that purpose.

cidr_blocks = ["0.0.0.0/0"]
}

egress {


needed? isn't this default?

@bendrucker (Author):

In AWS yes, but terraform overrides that (which I prefer)

hashicorp/terraform#1765
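
In other words, because terraform strips the AWS default rule, outbound access has to be declared explicitly when it's wanted. The generic allow-all form looks like this (standalone illustration, not the literal rule from this diff):

resource "aws_security_group" "example" {
  # Hypothetical group for illustration; the real rules live in typhoon's
  # security group files and should be scoped to the VPC CIDR over time.
  vpc_id = "${aws_vpc.network.id}"

  egress {
    protocol         = "-1"
    from_port        = 0
    to_port          = 0
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}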


good to know, thanks

name = "${format("bastion.%s.%s.", var.cluster_name, var.dns_zone)}"
type = "A"

# AWS recommends their special "alias" records for ELBs


modify comment, LB not ELB

subnet_id = "${element(aws_subnet.public.*.id, count.index)}"
}

resource "aws_subnet" "private" {


@hulbert please review as well

@@ -52,6 +52,60 @@ resource "aws_subnet" "public" {
resource "aws_route_table_association" "public" {
count = "${length(data.aws_availability_zones.all.names)}"


define ${length(data.aws_availability_zones.all.names)} as local

@bendrucker (Author):

This is repeated a few times and I'm looking to minimize the diff here. Similar to another comment I'd like to keep private networking and general refactoring/modifications to typhoon separate.
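
(For reference, the suggested refactor would look roughly like the following; the route table reference is illustrative.)

locals {
  az_count = "${length(data.aws_availability_zones.all.names)}"
}

resource "aws_route_table_association" "public" {
  count          = "${local.az_count}"
  subnet_id      = "${element(aws_subnet.public.*.id, count.index)}"
  route_table_id = "${aws_route_table.default.id}" # route table name illustrative
}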

instance_port = 443
instance_protocol = "tcp"
# Network Load Balancer for apiservers
resource "aws_lb" "apiserver" {


"apiserver" could create a lot of confusion with our actual API. Let's maybe rename to "k8s_apiserver" ?

@bendrucker (Author):

Definitely possible but I think we need to save this discussion. I'd rather wait on changing other bits of typhoon that don't have to do with private networking if we have concerns about naming confusion.

}


resource "aws_eip" "nat" {


nat?


I think you may need the EIP to explicitly depend on the IGW

@bendrucker (Author):

nat?

Can you expand?

I think you may need the EIP to explicitly depend on the IGW

Backwards (gateways use an EIP and need it allocated first). The egress gateway is actually for ipv6 and the NAT gateway is for ipv4.


Can you expand?

why name the elastic ip nat?


Backwards (gateways use an EIP and need it allocated first). The egress gateway is actually for ipv6 and the NAT gateway is for ipv4.

From my understanding an EIP address is accessible via the IGW and not the other way around.

@bendrucker (Author) commented Mar 20, 2018:

Oh. Because it's going to be associated with the NAT gateway. I don't need to reference it anywhere else. The NAT gateway references it as allocation_id and that "claims" it.

@bendrucker (Author):

Sorry, missing a word up there. The NAT gateway consumes the EIP. The Internet gateway allows instances in the public subnet (including the NAT) to access the internet.

I believe the confusing bit is that the internet gateway provides NAT for instances in the public subnet. And the NAT gateway and egress-only internet gateway provide routing for instances within the private subnet.
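
In code form the pattern looks roughly like this (simplified to a single NAT gateway for illustration; the diff may create one per AZ):

# The NAT gateway lives in a public subnet and claims the EIP via allocation_id;
# the internet gateway handles the public subnets themselves.
resource "aws_eip" "nat" {
  vpc = true
}

resource "aws_nat_gateway" "nat" {
  allocation_id = "${aws_eip.nat.id}"
  subnet_id     = "${element(aws_subnet.public.*.id, 0)}"
}

# Private subnets send IPv4 out via the NAT gateway and IPv6 out via the
# egress-only internet gateway.
resource "aws_route_table" "private" {
  vpc_id = "${aws_vpc.network.id}"

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = "${aws_nat_gateway.nat.id}"
  }

  route {
    ipv6_cidr_block        = "::/0"
    egress_only_gateway_id = "${aws_egress_only_internet_gateway.egress_igw.id}"
  }
}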


resource "aws_egress_only_internet_gateway" "egress_igw" {
vpc_id = "${aws_vpc.network.id}"
}


newline

@bendrucker (Author):

I want to make sure we don't regress in terms of functionality with what we have with tunnelbox

I'd love to have tunnelbox provide the primary day-to-day egress for engineers connecting to the cluster. Definitely a discussion for tomorrow. Tunnelbox works great for port forwarding, but it doesn't seem straightforward to use it as a jump host for SSH. I think that would mean handling SSH certificate auth on the instances and replicating the complicated interactions with vault that tunnelbox relies on.

@eladidan left a comment:

@bendrucker this looks awesome.
Assuming that the network diagram will go in a separate PR, this looks gtg!

(Commit message: using indented arrays means the template call needs matching indentation.)
@bendrucker (Author):

@eladidan Tested multiple keys and caught a bug there. The yaml file is indented so joining over \n- isn't enough. Luckily a later step reads the yaml (and I believe converts it to JSON for actual use) and so terraform plan failed because of the indentation issue. Swapped over to [] and tested that the changes are considered valid and would trigger a rebuild of all the instances. We can double check by having you connect as core@ after I rebuild the playground cluster.
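
For reference, a sketch of the flow-style approach (template and variable names are illustrative): rendering the key list with jsonencode() produces a ["key1","key2"] array, which is valid YAML at any indentation, unlike a string joined with \n-.

# Sketch: assumes the Container Linux Config template contains a line like
#   ssh_authorized_keys: ${ssh_authorized_keys}
data "template_file" "bastion-config" {
  template = "${file("${path.module}/cl/bastion.yaml.tmpl")}"

  vars {
    # jsonencode renders ["key1","key2"], valid YAML flow syntax at any depth
    ssh_authorized_keys = "${jsonencode(var.ssh_keys)}"
  }
}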

@bendrucker merged commit be283d4 into master on Mar 23, 2018
@bendrucker deleted the bastion-host branch on March 23, 2018 22:16
@bendrucker restored the bastion-host branch on March 28, 2018 21:35
@erikbryant deleted the bastion-host branch on October 8, 2020 16:37