Don't create placeholder A record on IPv6 clusters without API LB #12665

Closed
rifelpet wants to merge 1 commit

Conversation

rifelpet (Member) commented Nov 2, 2021

dns-controller updates this record to the node IPs, but in this case the nodes won't have any IPv4 addresses, so dns-controller never updates the record from the placeholder value, causing validate cluster to fail (even if the AAAA placeholder is updated).

I noticed this in this job (ref: #12657 (comment))

kops validate cluster fails with an error message containing the IPv4 placeholder IP, which shouldn't be relevant to an IPv6 cluster:

The dns-controller Kubernetes deployment has not updated the Kubernetes cluster's API DNS entry to the correct IP address. The API DNS IP address is the placeholder address that kops creates: 203.0.113.123. Please wait about 5-10 minutes for a master to start, dns-controller to launch, and DNS to propagate. The protokube container and dns-controller deployment logs may contain more diagnostic information. Etcd and the API DNS entries must be updated for a kops Kubernetes cluster to start.

kops delete cluster reports cleaning up these records; note the A record for the API name:

route53-record		api.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io.								ZEMLNXIIWQ0RV/A/api.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io.
route53-record		api.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io.								ZEMLNXIIWQ0RV/AAAA/api.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io.
route53-record		api.internal.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io.							ZEMLNXIIWQ0RV/AAAA/api.internal.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io.
route53-record		kops-controller.internal.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io.					ZEMLNXIIWQ0RV/AAAA/kops-controller.internal.e2e-09c66f4ca2-5e791.test-cncf-aws.k8s.io. 

The dns-controller logs report adding the AAAA records but not any A record.

Neither the apiserver pod nor the nodes have any IPv4 addresses.
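
For illustration, here is a minimal Go sketch of the decision this PR changes: which placeholder record types to create when the cluster is first brought up. All type and function names here are hypothetical, not the actual kops code, but the logic mirrors the fix: never create an A placeholder that dns-controller will have no IPv4 addresses to overwrite.

```go
package main

import "fmt"

// Cluster captures just the two properties the decision depends on
// (hypothetical type; the real kops cluster model is much larger).
type Cluster struct {
	UsesAPILoadBalancer bool // API DNS aliases the LB, so no placeholder is needed
	IPv6Only            bool // nodes will only ever have IPv6 addresses
}

// placeholderRecordTypes returns the DNS record types that should receive
// the placeholder value (e.g. 203.0.113.123 for A) at cluster creation.
func placeholderRecordTypes(c Cluster) []string {
	if c.UsesAPILoadBalancer {
		return nil // DNS points at the load balancer instead
	}
	if c.IPv6Only {
		// dns-controller will only ever publish IPv6 node addresses, so an
		// A placeholder would never be replaced and validation would hang.
		return []string{"AAAA"}
	}
	return []string{"A", "AAAA"}
}

func main() {
	fmt.Println(placeholderRecordTypes(Cluster{IPv6Only: true})) // [AAAA]
}
```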

@k8s-ci-robot added the cncf-cla: yes and size/XS labels on Nov 2, 2021
k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from rifelpet after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the size/S label and removed the size/XS label on Nov 2, 2021
johngmyers (Member) commented:

In this case the nodes are dual-stack, so they have an IPv4 address. This will cause delays until the SOA TTL expires when the kops client only has IPv4 connectivity to the control plane: with no placeholder A record, resolvers negatively cache the record's absence for the SOA-derived TTL, so IPv4-only clients keep hitting the cached miss even after dns-controller publishes the real A record.

rifelpet (Member, Author) commented Nov 2, 2021

Ah, so the problem is that the IPv4 addresses aren't being added to the Node objects? I see the AWS CCM isn't running, and the job's cluster spec doesn't have external CCM enabled. Perhaps we should default to enabling external CCM on IPv6 AWS clusters, given that it is required for node IPAM.
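
A rough sketch of what that defaulting could look like. The types and field names below are illustrative only, loosely modeled on the kops cluster spec rather than copied from it, and the IPv6 check (nonMasqueradeCIDR of ::/0) is an assumption about how an IPv6 cluster would be detected:

```go
package main

import "fmt"

// CCMConfig is a stand-in for the external cloud-controller-manager config.
type CCMConfig struct{}

// ClusterSpec is a stripped-down, hypothetical cluster spec.
type ClusterSpec struct {
	CloudProvider                  string
	NonMasqueradeCIDR              string // "::/0" assumed to mark an IPv6 cluster
	ExternalCloudControllerManager *CCMConfig
}

// defaultExternalCCM enables the external CCM on IPv6 AWS clusters when the
// user hasn't set it, since in-tree IPAM can't allocate IPv6 node addresses.
func defaultExternalCCM(spec *ClusterSpec) {
	if spec.CloudProvider == "aws" &&
		spec.NonMasqueradeCIDR == "::/0" &&
		spec.ExternalCloudControllerManager == nil {
		spec.ExternalCloudControllerManager = &CCMConfig{}
	}
}

func main() {
	spec := &ClusterSpec{CloudProvider: "aws", NonMasqueradeCIDR: "::/0"}
	defaultExternalCCM(spec)
	fmt.Println(spec.ExternalCloudControllerManager != nil) // true
}
```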

johngmyers (Member) commented:

The problem, as I noted in the PR, is that the CNI daemonset pod refused to start.

A problem with enabling the external CCM was recently fixed in #12658.

johngmyers (Member) commented:

If we ever support a cloud provider with IPv6-only instances, the placeholder creation code will need to know whether the control plane nodes are going to be dual-stack.

rifelpet (Member, Author) commented Nov 2, 2021

That makes sense to me, thanks for the clarification

/close

k8s-ci-robot (Contributor) commented:

@rifelpet: Closed this PR.

In response to this:

That makes sense to me, thanks for the clarification

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
