Selfhosting pivoting fails when using --store-certs-in-secrets #1281

Closed
fabriziopandini opened this issue Nov 27, 2018 · 18 comments · Fixed by kubernetes/kubernetes#72478 or kubernetes/kubernetes#72727
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor.

@fabriziopandini
Member

kubeadm alpha selfhosting pivot (kubeadm v1.13.0-beta.2) fails when invoked with --store-certs-in-secrets with the following error message:

[pivot] pivoting cluster to self-hosted
[self-hosted] Created TLS secret "ca" from ca.crt and ca.key
[self-hosted] Created TLS secret "apiserver" from apiserver.crt and apiserver.key
[self-hosted] Created TLS secret "apiserver-kubelet-client" from apiserver-kubelet-client.crt and apiserver-kubelet-client.key
[self-hosted] Created TLS secret "sa" from sa.pub and sa.key
[self-hosted] Created TLS secret "front-proxy-ca" from front-proxy-ca.crt and front-proxy-ca.key
[self-hosted] Created TLS secret "front-proxy-client" from front-proxy-client.crt and front-proxy-client.key
[self-hosted] Created secret for kubeconfig file "scheduler.conf"
[self-hosted] Created secret for kubeconfig file "controller-manager.conf"
[apiclient] Found 1 Pods for label selector k8s-app=self-hosted-kube-apiserver
timed out waiting for the condition
@fabriziopandini fabriziopandini added kind/bug Categorizes issue or PR as related to a bug. area/self-hosting labels Nov 27, 2018
@fabriziopandini fabriziopandini added this to the v1.14 milestone Nov 27, 2018
@timothysc timothysc added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 27, 2018
@timothysc
Member

I'm fine with waiting till 1.14 on this one.

@andrewrynhard

I thought this feature was being removed? The issue is likely due to kubernetes/kubernetes#61322.

@fabriziopandini
Member Author

@andrewrynhard thanks for pointing this out!

I thought this feature was being removed?

self-hosting was removed from kubeadm init and kubeadm upgrade workflows (both of them in some way not working properly), but it was agreed to leave an alpha command with the pivoting logic that you can call after init; however, be aware that once the cluster is turned to self-hosting you are on your own (e.g. for solving checkpointing / cold restart).

@bart0sh

bart0sh commented Dec 27, 2018

It looks like the API server can't start because the etcd client certificates are not created/copied:

F1227 16:01:52.237352       1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry [https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt true 0xc000884120 <nil> 5m0s 1m0s}), err (open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory)

What I don't understand is why those certificates are not needed when --store-certs-in-secrets is not used.

@neolit123, @fabriziopandini any ideas?

@fabriziopandini
Member Author

fabriziopandini commented Dec 29, 2018

@bart0sh
That's the bug to be fixed. As far as I know, when TLS was added to etcd, --store-certs-in-secrets was never updated accordingly. There was a PR for this, kubernetes/kubernetes#61323, but as you can see it never landed.

So:

  • without store certs in secrets, self-hosting works, because the DaemonSet pods use the existing certificates on disk
  • with store certs in secrets, self-hosting doesn't work, because the DaemonSet pods don't find all the necessary certificates in secrets
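
The difference between the two cases can be pictured as a simple set comparison. The Python snippet below is only an illustrative sketch, not kubeadm code: the secret names are taken from the pivot log at the top of this issue, the two etcd-related names come from the API server error reported in this thread, and the function name `missing_certs` is made up for the example. The list of required certificates is an assumption and may not be exhaustive.

```python
# Secrets created by the pivot, per the "[self-hosted] Created TLS secret" log lines.
CREATED_SECRETS = [
    "ca",
    "apiserver",
    "apiserver-kubelet-client",
    "sa",
    "front-proxy-ca",
    "front-proxy-client",
]

# Assumption: everything above plus the two etcd certificates that the
# API server failed to open in the error reported in this thread.
REQUIRED_CERTS = CREATED_SECRETS + ["apiserver-etcd-client", "etcd/ca"]


def missing_certs(required, available):
    """Return the required certificate names that have no matching secret, sorted."""
    return sorted(set(required) - set(available))


if __name__ == "__main__":
    for name in missing_certs(REQUIRED_CERTS, CREATED_SECRETS):
        print("no secret for certificate:", name)
```

With the certificates on disk, every required file is already present, so the same comparison would come back empty.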

@bart0sh

bart0sh commented Dec 29, 2018

@fabriziopandini Thank you for the explanations. I tried to add all etcd certificates, but generating secrets for them fails as their names contain '/', e.g. "etcd/ca". Changing names didn't work either as probably some other piece[s] of this puzzle require names with slashes. I'll dig deeper into this. Any ideas on how to better solve this are appreciated.
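
To make the naming problem concrete, here is a small Python sketch of the check that rejects "etcd/ca". It is an illustration only: the regular expression is the one quoted in the validation error that appears later in this thread, the 253-character limit is the usual DNS subdomain limit, and the helper `secret_name_for_cert` with its slash-to-dash rule is a hypothetical mapping, not necessarily what kubeadm ends up doing.

```python
import re

# Regex quoted in the API server's validation error for secret names.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
MAX_NAME_LENGTH = 253  # usual DNS subdomain length limit


def is_valid_secret_name(name):
    """Return True if the whole name matches the DNS-1123 subdomain pattern."""
    return (
        len(name) <= MAX_NAME_LENGTH
        and DNS1123_SUBDOMAIN.fullmatch(name) is not None
    )


def secret_name_for_cert(cert_base_name):
    """Hypothetical rule: derive a secret name by replacing '/' with '-'."""
    return cert_base_name.replace("/", "-")


if __name__ == "__main__":
    for cert in ["ca", "etcd/ca", "apiserver-etcd-client"]:
        name = secret_name_for_cert(cert)
        print(cert, is_valid_secret_name(cert), name, is_valid_secret_name(name))
```

Renaming the secret only solves half of the problem: the file still has to be mounted at the path with the slash, which is what the pods actually open.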

@fabriziopandini
Member Author

fabriziopandini commented Dec 29, 2018

@bart0sh I see your problem.
kubeadm creates one secret for each cert, and this requires changing the name for certs under etcd/.
If I'm right, this should be done here

but this is not enough; it is also necessary to add the corresponding volume projection that places the cert in the expected location. If I'm right, this should be done here
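
The two steps described above (a slash-free secret name, plus a projection that puts the file back at its original relative path) can be sketched as follows. This is a rough Python illustration that builds plain dictionaries shaped like a Kubernetes projected volume; it is not the kubeadm code. The function names, the slash-to-dash renaming rule, and the volume name `k8s-certs` are assumptions made for the example. The keys `tls.crt` and `tls.key` are the standard keys of a TLS secret.

```python
def tls_secret_projection(cert_base_name):
    """Build one projected-volume source for a TLS secret.

    The secret name has no slash, while the item paths keep the original
    relative location (for example etcd/ca.crt) inside the mounted volume.
    """
    secret_name = cert_base_name.replace("/", "-")  # hypothetical renaming rule
    return {
        "secret": {
            "name": secret_name,
            "items": [
                {"key": "tls.crt", "path": cert_base_name + ".crt"},
                {"key": "tls.key", "path": cert_base_name + ".key"},
            ],
        }
    }


def certs_volume(cert_base_names, volume_name="k8s-certs"):
    """Combine several TLS secrets into one projected volume definition."""
    return {
        "name": volume_name,  # hypothetical volume name
        "projected": {
            "sources": [tls_secret_projection(n) for n in cert_base_names]
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(certs_volume(["etcd/ca", "apiserver-etcd-client"]), indent=2))
```

If such a volume were mounted at /etc/kubernetes/pki, the files would appear at the paths named in the API server error, for example /etc/kubernetes/pki/etcd/ca.crt. With external etcd the set of certificates differs, so the list passed in would have to change accordingly.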

PS. pay attention to external etcd mode vs local etcd mode

@bart0sh

bart0sh commented Dec 29, 2018

I did change it in both places, but this was not enough. The changes you've proposed would trigger errors when generating secrets, as secret names must not contain slashes. Changing the name in the constants from etcd/ca to etcd-ca leaves the API server stuck on start.

@fabriziopandini fabriziopandini pinned this issue Dec 29, 2018
@fabriziopandini
Member Author

@bart0sh if you can share the generated YAML for the kube-apiserver DaemonSet, maybe I can help...

bart0sh added a commit to bart0sh/kubernetes that referenced this issue Jan 2, 2019
Selfhosting pivoting fails when using --store-certs-in-secrets
as api-server fails to start because of missing etcd/ca and
apiserver-etcd-client certificates:
   F1227 16:01:52.237352 1 storage_decorator.go:57] Unable to create storage backend:
   config (&{ /registry [https://127.0.0.1:2379]
              /etc/kubernetes/pki/apiserver-etcd-client.key
              /etc/kubernetes/pki/apiserver-etcd-client.crt
              /etc/kubernetes/pki/etcd/ca.crt true 0xc000884120 <nil> 5m0s 1m0s}),
   err (open /etc/kubernetes/pki/apiserver-etcd-client.crt: no such file or directory)

Added required certificates to fix this.

Secret name for the etcd/ca certificate has been converted to conform to RFC-1123 subdomain
naming conventions to prevent this TLS secret creation failure:
    unable to create secret: Secret "etcd/ca" is invalid: metadata.name:
    Invalid value: "etcd/ca": a DNS-1123 subdomain must consist of lower
    case alphanumeric characters, '-' or '.', and must start and end with an
    alphanumeric character (e.g. 'example.com', regex used for validation is
    '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

Related issue: kubernetes/kubeadm#1281
@bart0sh

bart0sh commented Jan 2, 2019

/assign

@yagonobre
Member

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 2, 2019
@neolit123 neolit123 unpinned this issue Jan 3, 2019
@bart0sh

bart0sh commented Jan 3, 2019

@fabriziopandini I've got api server running with the above fix. Thank you for the help!

Next thing: kube-controller-manager is crashlooping, but the kubeadm WaitForPodsWithLabel API manages to catch the short moment when it's in the Running state (not sure how that could be, though):

[self-hosted] >>> wait for kube-controller-manager to come up, label k8s-app=self-hosted-kube-controller-manager
[apiclient] Found 0 Pods for label selector k8s-app=self-hosted-kube-controller-manager
[apiclient] Found 1 Pods for label selector k8s-app=self-hosted-kube-controller-manager
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Pending
>>> found pod.ObjectMeta.Name self-hosted-kube-controller-manager-p9tpb, status: Running
[apiclient] The old Pod "kube-controller-manager-ed" is now removed (which is desired)
[apiclient] All control plane components are healthy after 0.000702 seconds
$ kubectl get pods -n kube-system |grep controller
self-hosted-kube-controller-manager-p9tpb   0/1     CrashLoopBackOff   6          8m11s

The reason for CrashLoop doesn't matter yet. The issue is that at some point pod status is 'Running'. Do you have any idea why and how to fix this?
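
One possible explanation, offered with some caution: as far as the Kubernetes pod lifecycle documentation describes it, a pod's `status.phase` stays `Running` while a container is starting or restarting, and the `CrashLoopBackOff` shown by kubectl is a container-level waiting reason rather than a pod phase. If that is what happens here, a check based only on the phase would pass even for a crashlooping pod. The Python sketch below illustrates a stricter check on plain dictionaries shaped like a pod status; it is not the kubeadm WaitForPodsWithLabel implementation, and the function name and the sample pod are made up for the example.

```python
def pod_is_healthy(pod):
    """Return True only if the pod phase is Running and every container is ready."""
    status = pod.get("status", {})
    if status.get("phase") != "Running":
        return False
    containers = status.get("containerStatuses", [])
    # An empty list means no container has reported yet, so treat it as not healthy.
    return bool(containers) and all(c.get("ready", False) for c in containers)


# Hypothetical status of a crashlooping pod: the phase is Running,
# but the only container is waiting to be restarted.
CRASHLOOPING_POD = {
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {
                "name": "kube-controller-manager",
                "ready": False,
                "restartCount": 6,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            }
        ],
    }
}

if __name__ == "__main__":
    print("phase:", CRASHLOOPING_POD["status"]["phase"])
    print("healthy:", pod_is_healthy(CRASHLOOPING_POD))
```

Whether a check like this belongs in the pivoting logic is a separate question; it only shows why a phase-only test can report success here.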

@bart0sh

bart0sh commented Jan 3, 2019

/reopen

@k8s-ci-robot
Contributor

@bart0sh: Reopened this issue.

In response to this:

/reopen


@bart0sh

bart0sh commented Jan 3, 2019

@neolit123 @kad @rosti do you have any idea why this could happen?

bart0sh added a commit to bart0sh/kubernetes that referenced this issue Jan 7, 2019
Modified command line options --authentication-kubeconfig and
--authorization-kubeconfig to point to the correct location
of the controller-manager.conf

This should fix this controller-manager crash:
    failed to get delegated authentication kubeconfig: failed to get
    delegated authentication kubeconfig: stat
    /etc/kubernetes/controller-manager.conf: no such file or directory

Related issue: kubernetes/kubeadm#1281
@fabriziopandini
Member Author

@bart0sh if I'm not wrong this is fixed now... can you confirm?

@bart0sh

bart0sh commented Jan 8, 2019

@fabriziopandini not yet. I'm working on it.

Btw, can you answer the question above, please?

@fabriziopandini
Member Author

@bart0sh
I'm not sure that we should implement logic that detects whether a self-hosting pod runs and then continues to run... this could potentially lead to a never-ending story.

Instead, I think we should investigate why kube-controller-manager is crashlooping, and make sure this condition is not generated by the self-hosting pivoting logic.

Once we are sure the pivoting logic doesn't introduce "regressions", then we can eventually discuss if/how to make the whole process more robust (e.g. by implementing preflight checks or rollback logic).

bart0sh added a commit to bart0sh/kubernetes that referenced this issue Jan 9, 2019
…ager

Selfhosting pivoting fails when using --store-certs-in-secrets
as controller-manager fails to start because of missing front-proxy CA
certificate:
    unable to load client CA file: unable to load client CA file: open
    /etc/kubernetes/pki/front-proxy-ca.crt: no such file or directory

Added required certificate to fix this.

This should fix kubernetes/kubeadm#1281
6 participants