Support deploying self-hosted etcd #31

philips · 2016-05-02T03:31:05Z

From the README:

When you start bootkube, you must also give it the addresses of your etcd servers, and enough information for bootkube to create an ssh tunnel to the node that will become a member of the master control plane. Upon startup, bootkube will create a reverse proxy using an ssh connection, which will allow a bootstrap kubelet to contact the apiserver running as part of bootkube.

In the original prototype we had a built in etcd. Why is that no longer part of this?

aaronlevy · 2016-05-02T20:20:55Z

We need the data that ends up in etcd to persist with the cluster that is launched. If that data lives in the bootkube, then bootkube must continue to run for the lifecycle of the cluster.

Alternatively, we need a way to pivot the etcd data injected during the bootstrap process to the "long-lived" etcd cluster. The long-lived cluster would essentially be a self-hosted etcd cluster launched as k8s components just like the rest of the control plane.

What I'd probably like to see is something along the lines of:

Bootkube runs etcd in-process
k8s objects injected to the api-server end up in the local/in-process etcd
One of those objects is an etcd pod definition, which is started as a self-hosted pod on a node.
The self hosted etcd "joins" the existing bootkube etcd, making a cluster of 2 nodes.
etcd replication copies all state to new joined etcd node
bootkube dies after a self-hosted control-plane is started, removing itself from etcd cluster membership
self-hosted etcd cluster is managed from that point forward as a k8s component.

Another option might be trying to copy the etcd keys from the in-process/local node to the self-hosted node, but this can get a little messy because we would be trying to manually copy (and mirror) data of a live cluster.

Some concerns with this approach:

Managing etcd membership in K8s is not currently a very good story. It's either waiting on petsets or trying to handle this with lifecycle hooks, or relying on external mechanics for membership management.
Pretty unproven and a bit risky from a production perspective to try and run etcd for the cluster, also "in" the cluster. But I can see the value in this from a "get started easily" while we exercise this as a viable option.

aaronlevy · 2016-05-02T20:21:54Z

@philips what do you think about changing this issue to be "support self-host etcd", and dropping from 0.1.0 milestone ?

aaronlevy · 2016-05-03T17:58:15Z

Adding notes from a side-discussion:

Another option that was mentioned is just copying keys from bootkube-etcd to cluster-etcd. This would require some coordination points in the bootkube process:

bootkube-apiserver configured to use bootkube-etcd
bootkube only injects objects for self-hosted etcd pods and waits for them to be started
bootkube stops internal api-server (no more changes to local state)
Copy all etcd keys form local to remote (self-hosted) cluster
start bootkube-apiserver again but have it point to the self-hosted etcd
create the rest of the self-hosted objects & finish bootkube run as normal

stuart-warren · 2016-08-16T19:03:47Z

How do you want the self-hosted apiserver to discover the location of self-hosted etcd?
I tried using an external loadbalancer listening on 2379 with a known address, but the apiserver throws a bunch of:

reflector.go:334] pkg/storage/cacher.go:163: watch of *api.LimitRange ended with: client: etcd cluster is unavailable or misconfigured

v1.3.5 talking to etcd v3.0.3

edit:
These issues were just harmless error messages in the log file from having a 10sec client timeout in the haproxy config constantly breaking watches.

kalbasit · 2016-08-19T02:52:32Z

I've managed to get this done. Using a separate ETCD cluster where each k8s node (master/minion) is running an ETCD in proxy mode. I'm using Terraform to configure both. The etcd module is available here and the k8s module is available here.

P.S: The master is not volatile and cannot be scaled. If the master node reboots it will not start any of the components again, not sure why but bootkube thinks they are running and quits. Possibly due to having /registry in etcd.

P.P.S: I had few issues doing that but mostly related to me adding --cloud-provider=aws to the kubelet, the controller and the api-server. Issues related bootkube started in a container without /etc/resolv.conf and /etc/ssl/certs/ca-certificates.crt. I'll file separate issues/PR for those.

philips · 2016-09-22T05:24:12Z

@xiang90 and @hongchaodeng can you put some thoughts together on this in relation to having an etcd controller.

I think there are essentially two paths:

Copy the data from the bootkube etcd to the cluster etcd
Add the bootkube etcd to the cluster etcd, then remove the bootkube etcd once everything is replicated

I think option 2 is better because it means we don't have to worry about cutting over and having split brain. But! How do we do 2 if the cluster only intend to have one etcd member (say in AWS because you will have a single machine cluster backed by EBS).

I think we should try and prototype this out ASAP as this is the last remaining component that hasn't been proven to be self-hostable.

xiang90 · 2016-09-22T10:20:30Z

@philips I have thought about this a little bit. And here is the workflow in my mind:

create a one member cluster in bootkube

... k8s is ready...

start etcd controller
etcd controller adds a member into the one member cluster created by bootkube
wait for the new member to sync with the seed one
remove the bootkube etcd member

Now etcd controller fully control the etcd cluster and can grow the size to desired size.

ethernetdan · 2016-09-24T00:43:16Z

Started some work on this - got a bootkube-hosted etcd cluster up, now working on migrating from the bootkube instance to the etcd-controller managed instance

pires · 2016-10-13T10:37:49Z

Add the bootkube etcd to the cluster etcd, then remove the bootkube etcd once everything is replicated

@philips what happens if the self-hosted etcd cluster (or the control plane behind it) dies? I believe this is why @aaronlevy mentioned it is:

(...) a bit risky from a production perspective to try and run etcd for the cluster, also "in" the cluster. But I can see the value in this from a "get started easily" while we exercise this as a viable option.

This is exactly the concern I shared in the design proposal.

Can this issue clarify if this concept is simply meant for non-production use-cases?

philips · 2016-10-13T19:31:03Z

@pires if the self-hosted etcd cluster dies you need to recover using bootkube from a backup. This is really no different than if it died normally and you would have to redeploy the cluster from a backup and restart the API servers again.

pires · 2016-10-13T21:27:14Z

@philips can you point me to the backup strategy you guys are designing or already implementing?

xiang90 · 2016-10-13T21:29:05Z

@pires

I believe the backup @philips mentioned is actually the etcd backup. For the etcd-controller, we do a backup:

every X minutes. X is defined by the user
once we upgrade the cluster
user can hit backup/now endpoint to force a backup when there is an expected important event like upgrading k8s master components.

pires · 2016-10-13T21:36:37Z

I understand the concept and it should work as you say, I'm just looking for more details on:

Where is each etcd member data stored?
Where is the backup data stored?
How is bootkube leveraging the stored data?

Don't take me wrong, I find this really cool and I'm trying to grasp it as much as possible as sig-cluster-lifecycle looks into HA.

xiang90 · 2016-10-13T21:41:12Z

Where is each etcd member data stored?

The data is stored on local storage. etcd has builtin recovery mechinism. When you have a 3 member etcd cluster, you already have 3 local copies

Where is the backup data stored

Backup is a for extra safety. It helps with rollback + disaster recovery.
It stores on PV, like EBS, Ceph, GlusterFS, etc..

How is bootkube leveraging the stored data

If there is a disaster case or bad upgrade, we recover the cluster from the backup.

philips · 2016-11-22T20:03:24Z

As an update on the etcd and self-hosted plan we have merged support behind an experimental flag in bootkube: https://github.com/kubernetes-incubator/bootkube/blob/master/cmd/bootkube/start.go#L37

This is self-hosted and self-healing etcd on top of Kubernetes.

orbatschow · 2016-12-05T23:33:38Z

@philips
What about using the new etcd operator, to run etcd fully managed on top of kubernetes, i think this will simplify maintenance, updates ... alot.

xiang90 · 2016-12-06T17:31:53Z

@gitoverflow That is the plan.

aaronlevy · 2017-02-28T22:34:29Z

I am going to close this as initial self-hosted etcd support has been merged. There are follow up issues open for specific tasks:

(documentation): #240
(adding support to all hack/* examples): #337
(iptables checkpointing): #284

jamiehannaford · 2017-03-10T10:47:12Z

Although bootkube now supports a self-hosted etcd pod for bootstrapping, I can't find any documentation which explains:

How a follow-up etcd controller syncs with the bootkube etcd pod
Whether it's possible for an etcd-operator to manage the lifecycle of the cluster etcd controller itself (as opposed to a user-defined etcd cluster)

aaronlevy · 2017-03-10T17:51:04Z

@jamiehannaford You're right - and we do need to catch up on Documentation. Some tracking issues:

#240
#311
#302

Regarding your questions:

We need to add a "how it works" section to this repo - but the closest so far might be the youtube link in my comment here: Add "How it works" section to README #302 - it briefly goes into how the seed etcd pod is pivoted into the self-hosted etcd cluster (by the etcd-operator).
Yes, the plan is for the etcd-operator to manage the cluster-etcd - so things like re-sizing, backups, updates, etc. (of your cluster etcd) could be managed by the etcd-operator.

jamiehannaford · 2017-03-13T14:13:19Z

@aaronlevy Thanks for the links. I'm still wrapping my head around the boot-up procedure. It seems the chronology for a self-hosted etcd cluster is:

A static pod for etcd is created
The temp control panel is created
The self-hosted control plane components are created against the temp one
When all the components in 3 are ready, the etcd-operator creates the new self-hosted etcd cluster, migrating all the data from 1

My question is, why does the self-hosted etcd need to wait for certain pods to exist before the data migration happens? I thought the data migration would happen first, then all the final control plane elements would be created.

I looked at the init args for kube-apiserver, and it has the eventual IPv4 of the real etcd (10.3.0.15). This means there's a gap of time between the api-server being created and the real etcd existing. Doesn't this create some kind of crash loop since the API server has nothing to connect to? Or is this gap negligible?

aaronlevy · 2017-03-13T16:23:11Z

@jamiehannaford

why does the self-hosted etcd need to wait for certain pods to exist before the data migration happens?

It could likely work in this order as well - but there could be more coordination points (vs just "everything is running - so let the etcd-operator take over"). For example, we would need to make sure to deploy kube-proxy & etcd-operator, then do the etcd pivot, then create the rest of the cluster components. Where right now it's just "create all components that exist in the /manifest dir, wait for some of them, do etcd-pivot" - which initially is easier.

Are there any issues particular to the current order that you've found?

Doesn't this create some kind of crash loop since the API server has nothing to connect to?

Sort of. Really everything pivots around etcd / apiserver addressability. The "real" api-server doesn't immediately take over, because it is unable to bind on 8080/443 (bootkube apiserver is still listening on those ports). The rest of the components don't know if they're talking to bootkube-apiserver or "real" apiserver. It's just an address they expect to reach. So when we're ready to pivot to the self-hosted control-plane, it's just simply exiting the bootkube-apiserver so the ports free up.

You're right that there will be a moment where no api-server is bound to the ports - but it's actually fine in most cases for components to fail/retry - much of Kubernetes is designed this way (including its core components).

However, there currently is an issue where the bootkube-apiserver is still "active", but it expects to only be talking to the static/boot etcd node - however - that node may have already been removed as a member if the etcd cluster. This puts us in a state where the "active" bootkube apiserver can no longer reach the data-store and essentially becomes inactive.

See #372 for more info.

The above issue might be as simple as adding both boot-etcd address and the service IP for self-hosted etcd to the bootkube api-server, I just haven't had a chance to test that assumption.

philips added kind/documentation Categorizes issue or PR as related to documentation. kind/question priority/P1 labels May 2, 2016

philips added this to the v0.1.0 milestone May 2, 2016

aaronlevy changed the title ~~No built in etcd?~~ Support deploying self-hosted etcd May 3, 2016

aaronlevy added kind/feature Categorizes issue or PR as related to a new feature. priority/P2 and removed kind/documentation Categorizes issue or PR as related to documentation. kind/question priority/P1 labels May 3, 2016

aaronlevy removed this from the v0.1.0 milestone May 3, 2016

philips mentioned this issue Jun 29, 2016

Make etcdmain/serve.go public and more modular etcd-io/etcd#5430

Closed

philips mentioned this issue Sep 22, 2016

prototype with bootkube coreos/etcd-operator#128

Closed

pires mentioned this issue Nov 4, 2016

integrating with kubeadm coreos/etcd-operator#335

Closed

philips mentioned this issue Nov 22, 2016

HA Kubernetes etcd configuration kubernetes/kubernetes#19443

Closed

aaronlevy mentioned this issue Dec 21, 2016

WIP: Kubeadm self-hosted deployment type kubernetes/kubernetes#38407

Closed

aaronlevy mentioned this issue Dec 30, 2016

Allow talking to secure etcd (authentication through client certs) #245

Closed

aaronlevy added priority/P1 and removed priority/P2 labels Jan 18, 2017

aaronlevy assigned xiang90 Feb 14, 2017

aaronlevy closed this as completed Feb 28, 2017

irfanurrehman mentioned this issue Oct 30, 2017

HA Kubernetes etcd configuration kubernetes-retired/federation#38

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support deploying self-hosted etcd #31

Support deploying self-hosted etcd #31

philips commented May 2, 2016

aaronlevy commented May 2, 2016

aaronlevy commented May 2, 2016

aaronlevy commented May 3, 2016

stuart-warren commented Aug 16, 2016 •

edited

Loading

kalbasit commented Aug 19, 2016 •

edited

Loading

philips commented Sep 22, 2016

xiang90 commented Sep 22, 2016 •

edited

Loading

ethernetdan commented Sep 24, 2016

pires commented Oct 13, 2016

philips commented Oct 13, 2016

pires commented Oct 13, 2016

xiang90 commented Oct 13, 2016 •

edited

Loading

pires commented Oct 13, 2016

xiang90 commented Oct 13, 2016

philips commented Nov 22, 2016

orbatschow commented Dec 5, 2016 •

edited

Loading

xiang90 commented Dec 6, 2016

aaronlevy commented Feb 28, 2017

jamiehannaford commented Mar 10, 2017

aaronlevy commented Mar 10, 2017

jamiehannaford commented Mar 13, 2017

aaronlevy commented Mar 13, 2017

Support deploying self-hosted etcd #31

Support deploying self-hosted etcd #31

Comments

philips commented May 2, 2016

aaronlevy commented May 2, 2016

aaronlevy commented May 2, 2016

aaronlevy commented May 3, 2016

stuart-warren commented Aug 16, 2016 • edited Loading

kalbasit commented Aug 19, 2016 • edited Loading

philips commented Sep 22, 2016

xiang90 commented Sep 22, 2016 • edited Loading

ethernetdan commented Sep 24, 2016

pires commented Oct 13, 2016

philips commented Oct 13, 2016

pires commented Oct 13, 2016

xiang90 commented Oct 13, 2016 • edited Loading

pires commented Oct 13, 2016

xiang90 commented Oct 13, 2016

philips commented Nov 22, 2016

orbatschow commented Dec 5, 2016 • edited Loading

xiang90 commented Dec 6, 2016

aaronlevy commented Feb 28, 2017

jamiehannaford commented Mar 10, 2017

aaronlevy commented Mar 10, 2017

jamiehannaford commented Mar 13, 2017

aaronlevy commented Mar 13, 2017

stuart-warren commented Aug 16, 2016 •

edited

Loading

kalbasit commented Aug 19, 2016 •

edited

Loading

xiang90 commented Sep 22, 2016 •

edited

Loading

xiang90 commented Oct 13, 2016 •

edited

Loading

orbatschow commented Dec 5, 2016 •

edited

Loading