Fix additionalSecurityGroups support for NLB #10162
Conversation
We were correctly adding the security groups to the master ASGs but identified them incorrectly.
ref: #10158 (comment)
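As a quick sanity check after applying this fix, one can list the security groups actually attached to the running master instances. This is only a troubleshooting sketch, not something added by this PR, and the `k8s.io/role/master=1` tag below is the usual kops tagging convention (adjust if your cluster tags differ):

```bash
# List each running master instance and the security groups attached to it.
# Assumes masters carry the conventional kops tag k8s.io/role/master=1.
aws ec2 describe-instances \
  --filters "Name=tag:k8s.io/role/master,Values=1" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].{Id:InstanceId,SGs:SecurityGroups[].GroupName}' \
  --output json
```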
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: rifelpet. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/cc @seh this should fix your most recent report
@rifelpet: GitHub didn't allow me to request PR reviews from the following users: most, recent, report, seh, this, should, fix, your. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thank you. Does this also require #10161 to work correctly?
If you're using the Terraform output, then yes, you'll need both commits for a test build.
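For anyone following along, a minimal sketch of what "using the Terraform output" looks like with both commits in place (the cluster name and state store below are placeholders):

```bash
# Regenerate the Terraform configuration from the cluster spec, then review the diff.
export KOPS_STATE_STORE=s3://example-kops-state   # placeholder state store
kops update cluster example.k8s.local --target=terraform --out=.
terraform plan
```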
/lgtm
Thank you. This solved the name problem, but then I ran into this: When the master ASGs try to create new instances for the master machines, they fail with an error like this:
I don't understand the complaint, because there is an NLB with that exact same name.
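One way to see the mismatch the ASG is complaining about is to compare what the ASG thinks it is attached to with the load balancers that actually exist. This is only a troubleshooting sketch; `$ASG_NAME` and `$LB_NAME` are placeholders for the names in your cluster:

```bash
# What the ASG believes it is attached to (Classic ELB names vs. target group ARNs):
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG_NAME" \
  --query 'AutoScalingGroups[].{ClassicLBs:LoadBalancerNames,TargetGroups:TargetGroupARNs}'

# The load balancer that exists under the name the error message mentions:
aws elbv2 describe-load-balancers --names "$LB_NAME" \
  --query 'LoadBalancers[].{Name:LoadBalancerName,Type:Type}'
```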
Ah, I think I see at least part of the problem: In the AWS Web console, inspecting one of these ASGs, it shows that it's still associated both with the previous Classic Load Balancer and a load balancer target group. Both the load balancer and the target group have the same name. Looking at the generated Terraform configuration, the …
I think this may have to do with our manual removal of the Terraform `aws_autoscaling_attachment` resources from state:

```bash
# Accommodate the migration to using inline attachments of ASGs to
# load balancers, per the required action documented here:
# https://github.com/rdrgmnzs/kops/blob/41adf07e15b1d4204647684e78ebde8bfcd41782/docs/releases/1.19-NOTES.md#required-actions
for attachment in $(terraform state list | grep '^aws_autoscaling_attachment\.'); do
  terraform state rm "${attachment}"
done
```

I'll try removing that workaround (leaving those resources in place) in another test to see if we can get our ASGs working again.
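If it helps anyone else hitting this, once the `aws_autoscaling_attachment` resources are left in state, the ASG-to-target-group attachment can be verified from the CLI (sketch only; `$ASG_NAME` is a placeholder):

```bash
# Confirm the master ASG is attached to the NLB's target group.
aws autoscaling describe-load-balancer-target-groups \
  --auto-scaling-group-name "$ASG_NAME"
```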
That fixed that problem. The next issue is that the "master" security group accepts inbound HTTPS traffic only from a CIDR block that looks to be related to my VPC, but the traffic coming in from the (new) NLB is apparently not arriving from any addresses in that block. Opening up HTTPS ingress from any IPv4 address (as a test) fixed that problem. I'm not sure whether we can be any more particular about which addresses we'll see from the downstream NLB listeners opening connections to our master machines. @hakman, this sort of relates to #10142.
@seh the security group on the masters should be configured to allow 443/TCP traffic from the VPC's CIDR block(s) as well as any CIDRs defined in the cluster's `kubernetesApiAccess` field. What was the exact error you were experiencing that was fixed by opening up the security group to 0.0.0.0/0? I'm troubleshooting a connection-refused issue in our end-to-end Prow jobs for a related PR and I'm wondering if it's related.
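A sketch of how to inspect the 443 rules kops actually rendered on the master security group; the `masters.<cluster-name>` group name is the usual kops convention, so treat it as an assumption and adjust as needed:

```bash
# Show which CIDR blocks may reach the masters on 443/TCP.
CLUSTER_NAME=example.k8s.local   # placeholder
aws ec2 describe-security-groups \
  --filters "Name=group-name,Values=masters.${CLUSTER_NAME}" \
  --query 'SecurityGroups[].IpPermissions[?ToPort==`443`].IpRanges[].CidrIp'
```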
Oh, I had never noticed that field before. We're not using it today, and had been granting access from a CIDR block for our VPN—one we were able to attach to the Classic Load Balancer, but not the NLB. Only after I posted my message earlier and went out for a long walk did it occur to me that NLB is preserving client IP addresses (since we register our targets by instance ID, rather than by IP address), so the IP addresses I needed to authorize are from our VPN-related block.
There was no error message, per se; all of my API server clients just hung trying to connect. It turned out that the NLB was passing the traffic through to any of the three master machines, but then their firewalls were blocking the connection. I'll try specifying a CIDR block for the "kubernetesApiAccess" field and report back on how I fare.
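For reference, a minimal sketch of what that change looks like; the 203.0.113.0/24 CIDR below is just a placeholder for the VPN block:

```bash
# Add the client CIDR to the cluster spec, then re-render and apply the Terraform output.
kops edit cluster example.k8s.local
#   spec:
#     kubernetesApiAccess:
#     - 203.0.113.0/24    # placeholder VPN/client CIDR
kops update cluster example.k8s.local --target=terraform --out=.
terraform apply
```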
So far, using the "kubernetesApiAccess" field has fixed that problem. It turns out that we have another longstanding security group granting ingress to our master machines with this same CIDR block, but it's allowing port 80 instead of port 443! I presume that's a vestige of having terminated TLS previously at the Classic ELB, and not configuring use of HTTPS out the back.
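In case it's useful, a hedged one-liner for spotting any remaining groups that still allow port 80 (purely a sketch; it scans every security group in the region):

```bash
# List security groups that still have an ingress rule for port 80.
aws ec2 describe-security-groups \
  --query 'SecurityGroups[?IpPermissions[?FromPort==`80`]].[GroupId,GroupName]' \
  --output table
```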