This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

Bootkube prematurely exits if scheduler/controller-manager lose leader-election #372

Closed
aaronlevy opened this issue Mar 10, 2017 · 10 comments
Labels
kind/bug, priority/P0

Comments

@aaronlevy
Contributor

aaronlevy commented Mar 10, 2017

This is most easily surfaced when using self-hosted etcd, because there is a period of time while pivoting from boot-etcd to self-hosted etcd when the compiled-in control plane is unable to contact the etcd cluster (and therefore gives up leader status).

When either the scheduler or the controller-manager gives up leadership, it calls os.Exit(), which kills bootkube entirely - even though the bootstrap process may not be complete (e.g. boot-etcd still running, or some other incomplete state).
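To make the failure mode concrete, here is a minimal toy Go sketch (not bootkube's actual code) of why an in-process os.Exit() is fatal to the whole bootstrap:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Toy stand-in for the compiled-in controller-manager/scheduler: when its
// leader-election lease can no longer be renewed, it exits fatally, which
// terminates the entire process it is linked into.
func runControllerManager(lostLease <-chan struct{}) {
	<-lostLease
	fmt.Println("controller-manager: lost leader-election lease")
	os.Exit(255) // same effect as glog.Fatalf in the real components
}

func main() {
	lostLease := make(chan struct{})
	go runControllerManager(lostLease)

	// Simulate the etcd pivot: partway through, boot-etcd becomes unreachable
	// and the lease cannot be renewed.
	go func() {
		time.Sleep(1 * time.Second)
		close(lostLease)
	}()

	fmt.Println("bootkube: bootstrap in progress (boot-etcd not cleaned up yet)")
	time.Sleep(5 * time.Second)                 // bootstrap work still running...
	fmt.Println("bootkube: bootstrap complete") // ...never printed: the os.Exit above killed the process
}
```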

One fix might be to just provide the bootkube api-server with the expected addresses of both etcds (boot-etcd and the self-hosted etcd service IP) when running bootkube start. For example: https://github.com/kubernetes-incubator/bootkube/blob/master/hack/multi-node/bootkube-up#L16

Another solution in the future might be to use static manifests for control plane components, such that if they exit they won't affect bootkube.

/cc @hongchaodeng @xiang90

@aaronlevy aaronlevy added kind/bug and priority/P0 labels Mar 10, 2017
@peebs

peebs commented Mar 10, 2017

I see this occasionally in the CI. For example: https://jenkins-tectonic.prod.coreos.systems/job/bootkube-dev/454/console

@aaronlevy
Contributor Author

We may need to turn off the self-hosted etcd tests running automatically (but allow them to be manually triggered) - they seem to be failing PRs that shouldn't fail.

@aaronlevy
Contributor Author

Or we deal with the known flakes

@peebs

peebs commented Mar 10, 2017

The main flakes I observe these days seem to be the leader-election one and the etcd scaling test not completing its scale-down phase. I bumped the time we wait for the pods to scale to 200 seconds and still see flakes sometimes, so I suspect it's getting wedged.

Shall I move the etcd tests all back into the optional job? The upside is that the green checkmark is easier to get. The downside is that we will probably run the optional tests less often and have to remember to run them. Either way it means more robot interactions for us.

@aaronlevy
Contributor Author

This one might not be a very complex fix (assuming it just means an extra flag provided to bootkube start). Maybe just leave them for now - and if this isn't an easy resolution we can revisit.

@Quentin-M
Contributor

The leader-election lease is 15 seconds by default for the CM and Sched. However, as soon as the init container of the new etcd node, started by the etcd operator, adds a member to the cluster, the cluster loses quorum. If by any chance the CM or Sched then tries to renew its lock - which can happen anytime between 0.01s and 15s - before the etcd instance itself is up (which is distinct from the init container), the leader election fails and the CM/Sched dies, killing bootkube with it.
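For reference, this is roughly the shape of that election, sketched with the current client-go leaderelection API (the lock name, identity, and kubeconfig handling here are illustrative; the 2017 component code differs in detail but uses the same 15s default lease and a fatal exit on lost leadership):

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "kube-controller-manager", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: "bootstrap-cm"}, // illustrative identity
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // the 15s default mentioned above
		RenewDeadline: 10 * time.Second, // a renewal must land within this window
		RetryPeriod:   2 * time.Second,  // so a renew attempt can fall anywhere inside the quorum outage
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				<-ctx.Done() // controller loops would run here
			},
			OnStoppedLeading: func() {
				// The fatal path: if a renewal fails while etcd has no quorum
				// (or is just slow), the in-process component exits and takes
				// bootkube down with it.
				os.Exit(255)
			},
		},
	})
}
```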

@Quentin-M
Contributor

It seems, however, that bootkube died before the migration even started. It appears that boot-etcd experienced request/sync timeouts, maybe related to slow disk on AWS (EBS), which might have failed the lease renewal and killed the CM/Sched with glog.Fatalf() -> os.Exit(255). Additionally, the TPR was shown to be ready but getBootEtcdPodIP() never succeeded. That function calls a List() operation, which is served by etcd - another hint that etcd might have been too slow. This means the issue could actually happen regardless of whether you use self-hosted etcd, if your etcd cluster is slow enough. It is exacerbated by the fact that we run the etcd instance on the same node, download containers (disk bandwidth, ..), etc. Running the CM/Sched as static pods so they can restart seems like the easiest solution - if not trapping the syscall :D
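For context, a getBootEtcdPodIP()-style lookup boils down to roughly the following sketch (the namespace and label selector are assumptions for illustration, not bootkube's exact query); the List() has to be answered out of etcd, which is why a slow etcd stalls it:

```go
package bootstrap

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// bootEtcdPodIP is a rough stand-in for getBootEtcdPodIP(): a pod List()
// through the API server, which in turn has to read from etcd, so a slow or
// quorum-less etcd stalls or fails it.
func bootEtcdPodIP(ctx context.Context, client kubernetes.Interface) (string, error) {
	pods, err := client.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{
		LabelSelector: "app=boot-etcd", // assumed label, for illustration only
	})
	if err != nil {
		return "", err // the call that "never succeeded" when etcd was too slow
	}
	for _, p := range pods.Items {
		if p.Status.PodIP != "" {
			return p.Status.PodIP, nil
		}
	}
	return "", fmt.Errorf("boot-etcd pod has no IP assigned yet")
}
```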

@Quentin-M
Contributor

Using SSD-backed huge machines and disabling the GC on the CM allowed it to work at least once. That does seem to work around the lease renewal failure caused by etcd being too slow.

However, we might still hit a renewal failure when the etcd quorum is broken (a member added but not yet started/synced), especially given the very short lease renewal interval.
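To make that quorum window concrete, here is a sketch of the kind of member add the etcd operator's init container performs (the endpoints and peer URL are made up); between this call returning and the new etcd process actually starting and syncing, a formerly single-member cluster needs 2 of 2 votes and cannot serve quorum requests:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:12379"}, // hypothetical boot-etcd endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Adding a member to a single-node cluster raises the quorum requirement
	// from 1/1 to 2/2 immediately, but the new member only counts once its
	// etcd process starts and syncs. Until then, every quorum read/write --
	// including leader-election lease renewals going through the API server --
	// can time out.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	resp, err := cli.MemberAdd(ctx, []string{"http://10.2.0.5:2380"}) // hypothetical peer URL
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("added member %x; cluster now needs 2/2 members for quorum", resp.Member.ID)
}
```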

@aaronlevy
Contributor Author

On the bootkube side there are two things we should do to mitigate this issue:

• Allow bootkube start to reference multiple etcd endpoints (opened an issue to specifically track that fix: #411)
• Move to using static manifests for the temp control plane: #168

@aaronlevy
Contributor Author

This should be resolved by #425 (static manifests in the control plane) and #418 (the temp control plane should reference both the temp and self-hosted etcd).
