This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

External etcd with TLS not working #2841

Open
lblackstone opened this issue Jan 26, 2018 · 4 comments

@lblackstone
Contributor

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

  • Tectonic version (release or commit hash): fd42c7f (track-01 branch)
  • Terraform version (terraform version): Terraform v0.11.2
  • Platform (aws|azure|openstack|metal): openstack

What happened?

Tried to create a cluster using external (not self-hosted) etcd. The bootkube service fails to bootstrap the k8s control plane and eventually times out.

What you expected to happen?

Successfully boot a k8s cluster with external etcd.

How to reproduce it (as minimally and precisely as possible)?

Make sure the following etcd options are set:

tectonic_etcd_count = "3"
tectonic_self_hosted_etcd = ""

Log into the master-0 node during bootstrapping and watch the bootkube service logs with journalctl -fu bootkube.

Anything else we need to know?

As far as I can tell, the kube-apiserver is not able to communicate with the etcd cluster because the TLS certs are not being installed in the expected location on the master nodes. The kube-apiserver manifest indicates that secrets should be found at /etc/kubernetes/bootstrap-secrets and /etc/kubernetes/secrets.
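
A quick way to check this on a master node might look like the sketch below. It only reports whether each directory exists and contains anything; the function name `check_secret_dirs` is made up for illustration, and the two paths are the ones named in the kube-apiserver manifest above.

```shell
#!/bin/sh
# Hypothetical helper: for each directory given, report whether it is
# missing, empty, or populated. Run as root (or via sudo), since an
# unreadable directory is reported as empty.
check_secret_dirs() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
        elif [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
            echo "empty: $dir"
        else
            echo "ok: $dir"
        fi
    done
}

# Paths taken from the kube-apiserver manifest mentioned above.
check_secret_dirs /etc/kubernetes/bootstrap-secrets /etc/kubernetes/secrets
```

A "missing" or "empty" result for either path would be consistent with the TLS assets not being installed where the kube-apiserver expects them.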

From https://github.com/coreos/tectonic-installer/blob/master/Documentation/dev/tls.md it looks like modules/tls/etcd should be responsible for the relevant certs.

@lblackstone
Contributor Author

Getting the same result on the master branch.

@carlosaya

Same behavior for me on VMware.

@lblackstone
Contributor Author

lblackstone commented Feb 13, 2018

After some further digging, it seems that external etcd with TLS support is broken in a couple of ways:

  1. TLS assets are not installed in the expected location on the master nodes
  2. The generated etcd cert is only valid for the self-hosted IP addresses. See the details below, and the edit that follows them.

After logging in to the master-0 node, I tried to use etcdctl to access the etcd cluster directly. Notice that the certificate is only valid for 127.0.0.1, 10.3.0.15, and 10.3.0.20.

$ etcdctl --ca-file=/opt/tectonic/tls/etcd-client-ca.crt --cert-file=/opt/tectonic/tls/etcd-client.crt --key-file=/opt/tectonic/tls/etcd-client.key --endpoints=https://192.168.1.8:2379,https://192.168.1.6:2379,https://192.168.1.11:2379 cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.1.6:2379: getsockopt: connection refused
; error #1: x509: certificate is valid for 127.0.0.1, 10.3.0.15, 10.3.0.20, not 192.168.1.11
; error #2: x509: certificate is valid for 127.0.0.1, 10.3.0.15, 10.3.0.20, not 192.168.1.8

Edit: Item 2 turned out to be a non-issue, because etcd is configured to use the DNS records rather than the bare IP addresses of the etcd members.

$ etcdctl --ca-file=/opt/tectonic/tls/etcd-client-ca.crt --cert-file=/opt/tectonic/tls/etcd-client.crt --key-file=/opt/tectonic/tls/etcd-client.key --endpoints=https://mycluster-etcd-0.<redacted>:2379,https://mycluster-etcd-1.<redacted>:2379,https://mycluster-etcd-2.<redacted>:2379 cluster-health
member 3907a31e6e0ffe1e is healthy: got healthy result from https://mycluster-etcd-0.<redacted>:2379
member 594b64050f39f3b3 is healthy: got healthy result from https://mycluster-etcd-1.<redacted>:2379
member a357b4b758ef90e8 is healthy: got healthy result from https://mycluster-etcd-2.<redacted>:2379
cluster is healthy
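
To see for yourself which names and IP addresses a certificate covers (the list that the x509 errors above print), something like the following sketch could be used. The helper name `san_list` is made up for illustration; it only filters the text that `openssl x509 -noout -text` prints, and assumes the usual layout where the SAN entries sit on the line right after the "X509v3 Subject Alternative Name" header.

```shell
#!/bin/sh
# Hypothetical helper: read `openssl x509 -noout -text` output on stdin
# and print only the Subject Alternative Name entries.
san_list() {
    grep -A1 'X509v3 Subject Alternative Name' | tail -n 1 | sed 's/^ *//'
}

# Example usage (commented out; <cert-file> is a placeholder for a PEM
# certificate you want to inspect):
#   openssl x509 -in <cert-file> -noout -text | san_list
```

If the output lists only IP addresses such as 127.0.0.1, 10.3.0.15, and 10.3.0.20, connecting by any other bare IP will fail verification, while connecting by a DNS name only works if that name is listed too.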

@lblackstone changed the title from "[openstack] External etcd not working" to "External etcd with TLS not working" on Feb 13, 2018

@lblackstone
Contributor Author

I was able to get external etcd working with TLS disabled, confirming that the problems are with TLS provisioning. The bugs exist on both the master and track-01 branches, and look like they would affect multiple (all?) platforms.
