This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

master: flanneld seems to be timing out while running decrypt-tls-asset in ExecStartPre #65

Closed
mumoshu opened this issue Nov 17, 2016 · 5 comments · Fixed by #73
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


mumoshu commented Nov 17, 2016

With 7ea5f6b, I've seen an error message like:

Nov 17 08:48:10 ip-10-0-0-216.ap-northeast-1.compute.internal systemd[1]: flanneld.service: Start-pre operation timed out. Terminating.
Nov 17 08:48:11 ip-10-0-0-216.ap-northeast-1.compute.internal etcdctl[8586]: open /etc/kubernetes/ssl/etcd-client.pem: no such file or directory
Nov 17 08:48:11 ip-10-0-0-216.ap-northeast-1.compute.internal systemd[1]: flanneld.service: Control process exited, code=exited status=1

Full log can be seen at https://gist.github.com/mumoshu/6f9fe119f882d3fcda40322d209123d8

It seems that after decrypt-tls-assets times out, systemd continues to run the next ExecStartPre, which also ends up with an error like etcd-client.pem: no such file or directory (presumably because systemd terminated decrypt-tls-assets, the step that is supposed to generate that file!)

It seems to take about 3 min 30 sec until flanneld is fully up and running.
Could we shorten that by removing unnecessary timeouts like this one?
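For context, here is a rough sketch of how I read the unit's pre-start sequence. The paths and arguments are my assumptions for illustration, not the actual kube-aws unit:

[Service]
# decrypt-tls-assets is supposed to write /etc/kubernetes/ssl/etcd-client.pem (assumed path)
ExecStartPre=/opt/bin/decrypt-tls-assets
# the next pre-start step reads that file, so it fails with "no such file or directory"
# when the previous step was terminated before it could finish
ExecStartPre=/usr/bin/etcdctl --cert-file=/etc/kubernetes/ssl/etcd-client.pem cluster-health
ExecStart=/usr/bin/flanneld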


mumoshu commented Nov 17, 2016

I guess timing out and terminating decrypt-tls-assets like this needlessly makes flanneld's startup take longer.

The timeout seems to be 10 seconds according to the timestamps.
Should we make it sufficiently longer, maybe 60 sec?

mumoshu modified the milestones: v0.9.1-rc.3, v0.9.1-rc.4 on Nov 17, 2016

mumoshu commented Nov 17, 2016

According to the systemd docs, there seems to be no configuration specifically for ExecStartPre timeouts.
The possibly relevant settings are TimeoutSec and TimeoutStartSec.
I'm going to try the latter.
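For a quick manual test on a node, I expect a drop-in like the following to work (this is just a sketch; the real change would go into the flanneld unit that kube-aws renders via cloud-config):

# /etc/systemd/system/flanneld.service.d/10-timeout.conf (hypothetical drop-in path)
[Service]
TimeoutStartSec=60

followed by systemctl daemon-reload and a restart of flanneld.service.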


mumoshu commented Nov 18, 2016

Currently testing with TimeoutStartSec=60:

core@ip-10-0-0-40 ~ $ systemctl show flanneld.service | grep Timeout
TimeoutStartUSec=1min
TimeoutStopUSec=1min 30s
JobTimeoutUSec=infinity
JobTimeoutAction=none
core@ip-10-0-0-40 ~ $ systemctl show kubelet.service | grep Timeout
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
JobTimeoutUSec=infinity
JobTimeoutAction=none
core@ip-10-0-0-40 ~ $ systemctl show docker.service | grep Timeout
TimeoutStartUSec=infinity
TimeoutStopUSec=1min 30s
JobTimeoutUSec=infinity
JobTimeoutAction=none

Now it takes 2 min until flanneld fully starts up:
https://gist.github.com/mumoshu/71e7c1858ef439197360121e4aaac1d9

However, 60 sec doesn't seem to be sufficient:

Nov 18 00:02:38 ip-10-0-0-40.ap-northeast-1.compute.internal systemd[1]: Starting Network fabric for containers...
...
Nov 18 00:03:38 ip-10-0-0-40.ap-northeast-1.compute.internal systemd[1]: flanneld.service: Start-pre operation timed out. Terminating.


mumoshu commented Nov 18, 2016

With TimeoutStartSec=120:

Nov 18 00:31:08 ip-10-0-0-133.ap-northeast-1.compute.internal systemd[1]: Starting Network fabric for containers...
Nov 18 00:32:40 ip-10-0-0-133.ap-northeast-1.compute.internal systemd[1]: Started Network fabric for containers.

https://gist.github.com/mumoshu/763efc6c923c966323c6d9757425f738

There are no timeouts, and it takes only 1 min 32 sec until it's up 🎉
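So the change that worked boils down to this single setting (shown here as a drop-in sketch; in kube-aws it would be set directly in the flanneld unit in cloud-config):

[Service]
TimeoutStartSec=120

systemd then reports it back as TimeoutStartUSec=2min in systemctl show flanneld.service, matching the output format above.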


mumoshu commented Nov 18, 2016

This slowness almost certainly slipped into v0.9.1-rc.1 via #34 and has been present since then.

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Nov 18, 2016
mumoshu added the kind/bug label on Nov 18, 2016