Fix additionalSecurityGroups support for NLB #10162
Conversation
We were correctly adding the security groups to the master ASGs but identified them incorrectly.
ref: #10158 (comment)
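As a quick sanity check after applying this fix, one can list the security groups actually attached to the running master instances. This is only a troubleshooting sketch, not something added by this PR, and the `k8s.io/role/master=1` tag below is the usual kops tagging convention (adjust if your cluster tags differ):

```bash
# List each running master instance and the security groups attached to it.
# Assumes masters carry the conventional kops tag k8s.io/role/master=1.
aws ec2 describe-instances \
  --filters "Name=tag:k8s.io/role/master,Values=1" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].{Id:InstanceId,SGs:SecurityGroups[].GroupName}' \
  --output json
```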
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: rifelpet. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/cc @seh this should fix your most recent report
@rifelpet: GitHub didn't allow me to request PR reviews from the following users: most, recent, report, seh, this, should, fix, your. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thank you. Does this also require #10161 to work correctly?
If you're using the Terraform output, then yes, you'll need both commits for a test build.
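For anyone following along, a minimal sketch of what "using the Terraform output" looks like with both commits in place (the cluster name and state store below are placeholders):

```bash
# Regenerate the Terraform configuration from the cluster spec, then review the diff.
export KOPS_STATE_STORE=s3://example-kops-state   # placeholder state store
kops update cluster example.k8s.local --target=terraform --out=.
terraform plan
```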
/lgtm
Thank you. This solved the name problem, but then I ran into this: When the master ASGs try to create new instances for the master machines, they fail with an error like this:
I don't understand the complaint, because there is an NLB with that exact same name.
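One way to see the mismatch the ASG is complaining about is to compare what the ASG thinks it is attached to with the load balancers that actually exist. This is only a troubleshooting sketch; `$ASG_NAME` and `$LB_NAME` are placeholders for the names in your cluster:

```bash
# What the ASG believes it is attached to (Classic ELB names vs. target group ARNs):
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG_NAME" \
  --query 'AutoScalingGroups[].{ClassicLBs:LoadBalancerNames,TargetGroups:TargetGroupARNs}'

# The load balancer that exists under the name the error message mentions:
aws elbv2 describe-load-balancers --names "$LB_NAME" \
  --query 'LoadBalancers[].{Name:LoadBalancerName,Type:Type}'
```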
Ah, I think I see at least part of the problem: In the AWS Web console, inspecting one of these ASGs, it shows that it's still associated both with the previous Classic Load Balancer and a load balancer target group. Both the load balancer and the target group have the same name. Looking at the generated Terraform configuration, the …
I think this may have to do with our manual removal of the Terraform `aws_autoscaling_attachment` resources from state:

```bash
# Accommodate the migration to using inline attachments of ASGs to
# load balancers, per the required action documented here:
# https://github.com/rdrgmnzs/kops/blob/41adf07e15b1d4204647684e78ebde8bfcd41782/docs/releases/1.19-NOTES.md#required-actions
for attachment in $(terraform state list | grep '^aws_autoscaling_attachment\.'); do
  terraform state rm "${attachment}"
done
```

I'll try removing that workaround (leaving those resources in place) in another test to see if we can get our ASGs working again.
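If it helps anyone else hitting this, once the `aws_autoscaling_attachment` resources are left in state, the ASG-to-target-group attachment can be verified from the CLI (sketch only; `$ASG_NAME` is a placeholder):

```bash
# Confirm the master ASG is attached to the NLB's target group.
aws autoscaling describe-load-balancer-target-groups \
  --auto-scaling-group-name "$ASG_NAME"
```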
That fixed that problem. The next issue is that the "master" security group accepts inbound HTTPS traffic only from a CIDR block that looks to be related to my VPC, but the traffic coming in from the (new) NLB is apparently not arriving from any addresses in that block. Opening up HTTPS ingress from any IPv4 address (as a test) fixed that problem. I'm not sure whether we can be any more particular about which addresses we'll see from the downstream NLB listeners opening connections to our master machines. @hakman, this sort of relates to #10142.
@seh the security group on the masters should be configured to allow 443/TCP traffic from the VPC's CIDR block(s) as well as any CIDRs defined in the cluster's `kubernetesApiAccess` field. What was the exact error you were experiencing that was fixed by opening up the security group to 0.0.0.0/0? I'm troubleshooting a connection-refused issue in our end-to-end Prow jobs for a related PR and I'm wondering if it's related.
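A sketch of how to inspect the 443 rules kops actually rendered on the master security group; the `masters.<cluster-name>` group name is the usual kops convention, so treat it as an assumption and adjust as needed:

```bash
# Show which CIDR blocks may reach the masters on 443/TCP.
CLUSTER_NAME=example.k8s.local   # placeholder
aws ec2 describe-security-groups \
  --filters "Name=group-name,Values=masters.${CLUSTER_NAME}" \
  --query 'SecurityGroups[].IpPermissions[?ToPort==`443`].IpRanges[].CidrIp'
```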
Oh, I had never noticed that field before. We're not using it today, and had been granting access from a CIDR block for our VPN—one we were able to attach to the Classic Load Balancer, but not the NLB. Only after I posted my message earlier and went out for a long walk did it occur to me that NLB is preserving client IP addresses (since we register our targets by instance ID, rather than by IP address), so the IP addresses I needed to authorize are from our VPN-related block.
There was no error message, per se; all of my API server clients just hung trying to connect. It turned out that the NLB was passing the traffic through to any of the three master machines, but then their firewalls were blocking the connection. I'll try specifying a CIDR block for the "kubernetesApiAccess" field and report back on how I fare.
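For reference, a minimal sketch of what that change looks like; the 203.0.113.0/24 CIDR below is just a placeholder for the VPN block:

```bash
# Add the client CIDR to the cluster spec, then re-render and apply the Terraform output.
kops edit cluster example.k8s.local
#   spec:
#     kubernetesApiAccess:
#     - 203.0.113.0/24    # placeholder VPN/client CIDR
kops update cluster example.k8s.local --target=terraform --out=.
terraform apply
```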
So far, using the "kubernetesApiAccess" field has fixed that problem. It turns out that we have another longstanding security group granting ingress to our master machines with this same CIDR block, but it's allowing port 80 instead of port 443! I presume that's a vestige of having terminated TLS previously at the Classic ELB, and not configuring use of HTTPS out the back.
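In case it's useful, a hedged one-liner for spotting any remaining groups that still allow port 80 (purely a sketch; it scans every security group in the region):

```bash
# List security groups that still have an ingress rule for port 80.
aws ec2 describe-security-groups \
  --query 'SecurityGroups[?IpPermissions[?FromPort==`80`]].[GroupId,GroupName]' \
  --output table
```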