etcd fails kubelet's health checks #720

Closed
Q-Lee opened this issue Mar 3, 2018 · 2 comments · Fixed by kubernetes/kubernetes#60728

Q-Lee commented Mar 3, 2018

What keywords did you search in kubeadm issues before filing this one?

etcd ssl health

Is this a BUG REPORT or FEATURE REQUEST?

Choose one: BUG REPORT

Versions

kubeadm version (use kubeadm version): HEAD

Environment: custom/docker-in-docker, docker-in-docker-in-docker

  • Kubernetes version (use kubectl version): HEAD
  • Cloud provider or hardware configuration: local
  • OS (e.g. from /etc/os-release): Debian
  • Kernel (e.g. uname -a): 4.9.0-5

What happened?

tl;dr - kubelet doesn't have the client key/crt for etcd, so etcd returns "nonsense" for its health check.

  1. kubeadm creates the etcd manifest file
  2. kubelet creates etcd container
  3. cluster comes up, etcd is taking writes, and everything's great
  4. etcd fails kubelet's health checks
  5. nothing's great

$ curl -k https://127.0.0.1:2379/health --key apiserver-etcd-client.key --cert apiserver-etcd-client.crt
{"health": "true"}
$ curl -k https://127.0.0.1:2379/health
curl: (35) error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate
$ journalctl | grep "Liveness probe" | grep -v succeeded | tail -n1
Mar 03 09:49:08 7e928d1f665e kubelet[151]: I0303 09:49:08.142334 151 server.go:422] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"etcd-7e928d1f665e", UID:"17c2da92666b4a9242f31873234f3101", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{etcd}"}): type: 'Warning' reason: 'Unhealthy' Liveness probe failed: Get http://127.0.0.1:2379/health: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
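
For anyone puzzled by the "malformed HTTP response" bytes in that log line: they look like the start of a TLS record, not HTTP. Here is a minimal Go sketch that decodes them. The `tlsRecord` type and the `parseTLSRecord` and `describe` helpers are made up for illustration and are not kubelet code. Only the six bytes that appear in the log are parsed, so the alert description byte (which the log does not show) is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// tlsRecord holds the 5-byte TLS record header plus whatever payload follows.
// Hypothetical helper type for this sketch only.
type tlsRecord struct {
	ContentType  byte
	Major, Minor byte
	Length       uint16
	Payload      []byte
}

// parseTLSRecord splits raw bytes into the record header fields and payload.
func parseTLSRecord(b []byte) (tlsRecord, error) {
	if len(b) < 5 {
		return tlsRecord{}, errors.New("need at least 5 bytes for a TLS record header")
	}
	return tlsRecord{
		ContentType: b[0],
		Major:       b[1],
		Minor:       b[2],
		Length:      uint16(b[3])<<8 | uint16(b[4]),
		Payload:     b[5:],
	}, nil
}

// describe renders the record in a human-readable form.
// Content type 21 is "alert"; the first alert byte is the level (1 warning, 2 fatal).
func describe(r tlsRecord) string {
	kind := "other"
	if r.ContentType == 21 {
		kind = "alert"
	}
	s := fmt.Sprintf("type=%s version=%d.%d length=%d", kind, r.Major, r.Minor, r.Length)
	if r.ContentType == 21 && len(r.Payload) >= 1 {
		level := "unknown"
		switch r.Payload[0] {
		case 1:
			level = "warning"
		case 2:
			level = "fatal"
		}
		s += " level=" + level
	}
	return s
}

func main() {
	// The bytes quoted in the kubelet log line above.
	raw := []byte("\x15\x03\x01\x00\x02\x02")
	rec, err := parseTLSRecord(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(describe(rec)) // type=alert version=3.1 length=2 level=fatal
}
```

In other words, the plain-HTTP probe is receiving a fatal TLS alert from etcd and trying to read it as an HTTP status line.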

What you expected to happen?

I expect etcd to pass health checks.

How to reproduce it (as minimally and precisely as possible)?

I suspect any kubeadm cluster built from HEAD will have this quirk.

Anything else we need to know?

Q-Lee commented Mar 3, 2018

Looking at the kubeadm code, it appears we statically set the probe scheme to http, but etcd is https. Even if we didn't set client auth to true, I don't know how this could ever have worked. Did etcd's behavior change under us?

https://github.com/kubernetes/kubernetes/blame/master/cmd/kubeadm/app/phases/etcd/local.go#L68

@stealthybox

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Mar 5, 2018
…tcd_tls

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add mTLS to kubeadm etcd liveness probe.

**What this PR does / why we need it**:
We switched etcd over to using mTLS, but the liveness probe is still using http.
Disabling the liveness probe allows etcd to continue operating.

The real fix isn't simple, because we need to generate a client certificate for healthchecking and update the probe to exec `etcdctl` like so: 
https://sourcegraph.com/github.com/coreos/etcd-operator/-/blob/pkg/util/k8sutil/pod_util.go#L71-89

~Working on patching this now.~
This PR now generates the healthcheck identity and updates the liveness probe to use it.

**Which issue(s) this PR fixes**
Fixes #59766
Fixes kubernetes/kubeadm#720

**Special notes for your reviewer**:
We should generate a client cert specifically for etcd health checks so that the apiserver certs can be revoked independently.
This will be stored in `/etc/kubernetes/pki/etcd/` so that we don't have to change the pod's hostMount.

**Release note**:
```release-note
NONE
```
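
To make the exec-based probe described in the PR text more concrete, here is a small Go sketch that builds an `etcdctl` command line using a dedicated health check identity. Treat it as an illustration, not the merged code: `healthcheckCommand` is a hypothetical helper, and the file names `ca.crt`, `healthcheck-client.crt` and `healthcheck-client.key` are assumptions layered on the `/etc/kubernetes/pki/etcd/` directory mentioned above. The `get foo` at the end is just an arbitrary read to exercise the connection.

```go
package main

import (
	"fmt"
	"path"
)

// healthcheckCommand builds a shell command suitable for an exec-style
// liveness probe. It calls etcdctl (v3 API) with a CA bundle and a client
// key pair. Hypothetical helper; the file names below are assumptions.
func healthcheckCommand(endpoint, certDir string) []string {
	cmd := fmt.Sprintf(
		"ETCDCTL_API=3 etcdctl --endpoints=%s --cacert=%s --cert=%s --key=%s get foo",
		endpoint,
		path.Join(certDir, "ca.crt"),                 // assumed CA file name
		path.Join(certDir, "healthcheck-client.crt"), // assumed client cert name
		path.Join(certDir, "healthcheck-client.key"), // assumed client key name
	)
	// -e makes the shell exit non-zero as soon as etcdctl fails,
	// which is what marks the probe as failed.
	return []string{"/bin/sh", "-ec", cmd}
}

func main() {
	for _, arg := range healthcheckCommand("127.0.0.1:2379", "/etc/kubernetes/pki/etcd/") {
		fmt.Println(arg)
	}
}
```

Keeping the key pair under the etcd PKI directory means the command can read it through the mount the pod already has, which matches the reasoning in the special notes.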