Control plane node join #463
Conversation
Hi @ashish-amarnath. Thanks for your PR. I'm waiting for a kubernetes-sigs or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 46ae5c7 to 20493a6.
Thanks for taking the lead on this! Left some early comments.
pkg/apis/awsprovider/v1alpha1/awsclusterproviderstatus_types.go
I think a little more discussion around why a counter is needed would help me understand this code more.
Current status: Two scenarios of HA control plane support have been tested.
ToDo:
Developer notes:
Testing:
Test 1
Test 2
This is definitely some good progress. Once the style errors are fixed I'll make another pass. Let me know when it's ready for testing and I'll run it through its paces.
@@ -55,6 +55,49 @@ nodeRegistration:
EOF

kubeadm init --config /tmp/kubeadm.yaml

tar -cvzf /etc/kubernetes/pki.tar.gz /etc/kubernetes/pki/*
kubectl -n kube-system --kubeconfig /etc/kubernetes/admin.conf create secret generic kubeadm-certs --from-file=/etc/kubernetes/pki.tar.gz
This is an interesting approach for managing the secrets, but it would probably be better if we took a different approach. I can see a couple of different ways of solving this without adding a secret post-init:
- pre-generate the etcd CA keypair, front-proxy CA keypair, and SA (service account) keypair in the cluster controller and store them in the Cluster ProviderSpec
- inject them into the init config like we are currently doing with the CA cert/key
- pre-write the keypairs to the filesystem before calling kubeadm join (see the sketch after this list)
- if other certificates are needed prior to calling kubeadm join, pre-generate them using kubeadm phases (kubeadm join should only really require the CAs; it would be a bug otherwise)
- Instead of modifying the cluster actuator, it would be possible to fetch the required certificates using the Kubernetes client from within the machine actuator, then inject the certificates using the init script on the host prior to running kubeadm join.
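For illustration, a minimal sketch of the "pre-write the keypairs to the filesystem" option, assuming the actuator has templated the CA and SA material into the instance userdata. The ${...} variable names and the join arguments are placeholders, not anything this PR defines; the essential part is that the files land in kubeadm's default locations under /etc/kubernetes/pki before kubeadm join runs.

# Pre-write the CA and SA key material that kubeadm expects to find on disk,
# so `kubeadm join` reuses it instead of generating new keypairs.
mkdir -p /etc/kubernetes/pki/etcd

# ${CA_CERT} etc. are placeholders for material the controller would template in.
echo "${CA_CERT}"             > /etc/kubernetes/pki/ca.crt
echo "${CA_KEY}"              > /etc/kubernetes/pki/ca.key
echo "${ETCD_CA_CERT}"        > /etc/kubernetes/pki/etcd/ca.crt
echo "${ETCD_CA_KEY}"         > /etc/kubernetes/pki/etcd/ca.key
echo "${FRONT_PROXY_CA_CERT}" > /etc/kubernetes/pki/front-proxy-ca.crt
echo "${FRONT_PROXY_CA_KEY}"  > /etc/kubernetes/pki/front-proxy-ca.key
echo "${SA_PUB}"              > /etc/kubernetes/pki/sa.pub
echo "${SA_KEY}"              > /etc/kubernetes/pki/sa.key
chmod 0600 /etc/kubernetes/pki/*.key /etc/kubernetes/pki/etcd/*.key

# Join as an additional control plane member; the control-plane flag name
# depends on the kubeadm version (--experimental-control-plane in v1.13).
kubeadm join "${APISERVER_ENDPOINT}" --token "${BOOTSTRAP_TOKEN}" \
  --discovery-token-ca-cert-hash "${CA_CERT_HASH}" \
  --experimental-control-plane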
I like the simplicity of this approach. Can you elaborate on why you think it would be better if we took a different approach?
I'm guessing security issues? Secrets aren't actually secret, and it would be far too easy to get these secrets and do bad things. But it's not that different if we store the values in the cluster provider spec; that's still plaintext secret storage in etcd.
I would expect operators to use envelope encryption for secrets in their bootstrap / manager clusters, so it'll be justifiable to store CA certificates as secrets, less so as values in the provider spec.
We can't pre-generate the non-authoritative certs yet because we don't know the IP addresses of the nodes beforehand, and this is particularly important for etcd, but it is reasonable to do so for the certificate authorities and the service account keypair.
For a solution that will also support autoscaling groups, the ideal way would be to use the instance identity document as a host attestation mechanism, where a verifiable CSR is made that the controller can sign.
So, there are a couple of different reasons why I'm suggesting adding it to the provider spec:
- It automatically provides an avenue for users to override the CA and SA keypairs to use custom values
- While we are embedding the certificates in the spec today, we can later update the spec to allow either embedding or referencing an external source
I want to make sure that we are considering UX when we start to look at more secure private key storage. By allowing direct embedding of the cert material, we avoid having to pre-generate any additional secrets as part of the workflow, and once the spec is updated to support remote references for secure material (k8s secrets, or leveraging something AWS-native), we can have the actuator move the cert material to the proper place as part of the workflow, or even have an admission webhook do the job to avoid ever storing the cert material in etcd.
Happy with this approach. So we need to change the spec a bit now, at least, to hold all the key pairs we need for the HA control plane?
I agree with the UX and the extensibility aspects. However, IMO, we have a dependency on kubeadm here for generating the certs when the first control plane node runs the init. From reading the kubeadm docs, there seems to be a way to skip cert generation and run the other "phases". By doing this we will be deviating from the standard kubeadm init flow, which, IMO, is OK as long as we call that out for UX.
@ashish-amarnath If the keypair files already exist on disk in the default locations that kubeadm would use for creating them, and are not close to expiration, they will be picked up and used by kubeadm without having to run phases.
Also, kubeadm will start handling the cert distribution for control plane join in v1.14. More info: kubernetes/kubeadm#1200
With this in mind, and since we aren't GA yet, I think the cert distribution fix I have should be a stop-gap till we go to 1.14. WDYT?
Even if kubeadm handles the certificate copying for v1.14+ we would still benefit from having the ability for users to specify custom keypairs here.
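For reference, the kubeadm-native flow referred to above looks roughly like the following; the flag names shown are the v1.14 experimental ones and may change in later releases, and the endpoint/token/hash/key values are placeholders.

# On the first control plane node: upload the control plane certs to a
# short-lived secret in kube-system, encrypted with a generated key (v1.14+).
kubeadm init --config /tmp/kubeadm.yaml --experimental-upload-certs

# On a joining control plane node: download and decrypt those certs using the
# key printed by the init command above.
kubeadm join "${APISERVER_ENDPOINT}" --token "${BOOTSTRAP_TOKEN}" \
  --discovery-token-ca-cert-hash "${CA_CERT_HASH}" \
  --experimental-control-plane --certificate-key "${CERTIFICATE_KEY}"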
I've decoupled the certs into individual secrets so that each can be updated independently of the others.
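As a hypothetical illustration of what one-secret-per-keypair could look like on the init node (the secret names below are made up for illustration; the actual names are whatever this PR settles on):

# One secret per keypair, so each can be rotated or updated independently,
# instead of a single pki.tar.gz blob.
kubectl -n kube-system --kubeconfig /etc/kubernetes/admin.conf create secret tls cluster-ca \
  --cert=/etc/kubernetes/pki/ca.crt --key=/etc/kubernetes/pki/ca.key
kubectl -n kube-system --kubeconfig /etc/kubernetes/admin.conf create secret tls etcd-ca \
  --cert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/ca.key
kubectl -n kube-system --kubeconfig /etc/kubernetes/admin.conf create secret tls front-proxy-ca \
  --cert=/etc/kubernetes/pki/front-proxy-ca.crt --key=/etc/kubernetes/pki/front-proxy-ca.key
kubectl -n kube-system --kubeconfig /etc/kubernetes/admin.conf create secret generic service-account-key \
  --from-file=/etc/kubernetes/pki/sa.pub --from-file=/etc/kubernetes/pki/sa.key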
+1, thanks for the reasoning @detiber. Totally agree that changing the spec is the right way to go.
So, I propose that we keep the spec change out of this PR and get the control plane machine join scenario worked out. I'll follow this PR with the spec change.
Created #492 to address this.
Force-pushed from 99fed99 to 0f18c7f.
/ok-to-test
@ashish-amarnath: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message. In response to this: /ok-to-test
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
* pass kubeconfig into joining controlplane node userdata
* use kubeconfig to fetch kubeadm-certs secret
* create kubeadm generated certs as a secret in kube-system as kubeadm-certs
* fetch kubeadm-certs secret on controlplane join machine and unpack it
* run kubeadm join after the kubeadm-certs secret is unpacked
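Put together, the join-side userdata described by this commit would look something like the sketch below. The secret name and tarball path follow the earlier diff; the bootstrap kubeconfig path and the join arguments are placeholders, not values taken from the PR.

# Fetch the kubeadm-certs secret created by the first control plane node and
# unpack the PKI tarball before running kubeadm join.
kubectl -n kube-system --kubeconfig /etc/kubernetes/bootstrap.conf \
  get secret kubeadm-certs -o jsonpath='{.data.pki\.tar\.gz}' | base64 -d > /etc/kubernetes/pki.tar.gz
# The archive was created from absolute paths, so extracting at / restores
# /etc/kubernetes/pki/.
tar -xzf /etc/kubernetes/pki.tar.gz -C /

# With the certs in place, join as an additional control plane node
# (flag name depends on the kubeadm version).
kubeadm join "${APISERVER_ENDPOINT}" --token "${BOOTSTRAP_TOKEN}" \
  --discovery-token-ca-cert-hash "${CA_CERT_HASH}" \
  --experimental-control-plane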
Force-pushed from c32aab2 to 66ecf7f.
/ok-to-test
/lgtm
let's type better
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ashish-amarnath, chuckha. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What this PR does / why we need it:
Implements support for control plane node join
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #413
Special notes for your reviewer:
Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
Release note: