This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

kubernetes: Morph the livenessProbe into readinessProbe #3421

Merged
bboreham merged 1 commit into master from 2018-09-28-moar-initialDelaySeconds
Oct 16, 2018

@dlespiau dlespiau commented Sep 28, 2018

The liveness probe has the not-so-great side effect of killing the weave pods
if anything goes wrong, making debugging really difficult.

Not only that, we also bumped into the livenessProbe killing pods before they
had a chance to start (see below for the full story).

The readinessProbe is only used for Kubernetes Services: no traffic will be
routed to non-Ready pods when using the Service IP. We can use it to signal to
the user that something is wrong with Weave Net without having Kubernetes kill
the pods behind our backs.

I removed the initialDelaySeconds field, as Kubernetes won't kill the pod if
it is not ready.
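
As a rough sketch of the resulting probe (a fragment of a container spec, not the full manifest): the httpGet values are the ones from the probe in this diff, while the container name `weave` and the surrounding keys are assumptions added for context.

```yaml
# Sketch only: fragment of a pod template, not the complete manifest.
containers:
  - name: weave              # assumed container name
    readinessProbe:
      httpGet:
        host: 127.0.0.1
        path: /status
        port: 6784
      # No initialDelaySeconds here: a failing readiness probe only marks
      # the pod as not Ready, it does not cause the container to be restarted.
```

With this in place, a struggling pod shows up as not Ready in `kubectl get pods` but stays around for inspection.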

Notes from a debugging session:

Some nodes may have a vast number of iptables rules, making each iptables
command take several seconds (~5-10s). At startup, when the weave container
creates its chains, it runs quite a few iptables commands, and their total
execution time was > 30s.

It's important to note that, at this point, the weave process hasn't started as
those iptables commands are run by an initial script.

Kubernetes was checking the container's liveness, the checks were failing, and
it then proceeded to kill the weave pods before they had a chance to fully start.
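
For contrast, here is a sketch of the probe as it looked before this change. The httpGet values and `initialDelaySeconds: 30` come from the diff; `periodSeconds` and `failureThreshold` were not set in the manifest and are written out at their Kubernetes defaults purely for illustration.

```yaml
# Sketch of the previous probe, for illustration only.
livenessProbe:
  httpGet:
    host: 127.0.0.1
    path: /status
    port: 6784
  initialDelaySeconds: 30    # probing starts about 30s after the container starts
  # Not in the manifest: Kubernetes defaults, shown explicitly as an assumption.
  periodSeconds: 10          # probe every 10s
  failureThreshold: 3        # restart after 3 consecutive failures
```

Since /status is only served once the weave process is running, every probe sent while the initial script is still busy with iptables fails, and enough consecutive failures get the container restarted before weave ever starts.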

Fixes: #3417

@dlespiau dlespiau requested a review from bboreham September 28, 2018 14:19
@bboreham

See #3417 wherein we argue that the liveness probe should be removed

@dlespiau dlespiau force-pushed the 2018-09-28-moar-initialDelaySeconds branch from 64a8b77 to 8d44f49 Compare September 28, 2018 14:52
@dlespiau dlespiau changed the title from "kubernetes: Bump initialDelaySeconds of the livenessProbe to 2m" to "kubernetes: Morph the livenessProbe into readinessProbe" Sep 28, 2018
httpGet:
  host: 127.0.0.1
  path: /status
  port: 6784
initialDelaySeconds: 30

@brb brb added this to the 2.5 milestone Oct 1, 2018
brb commented Oct 1, 2018

We should update Launch Generator too.

dlespiau commented Oct 1, 2018

I have a PR updating launch-generator accordingly but didn't mention it here as it's not part of the open source project.

@bboreham bboreham merged commit 965b5c6 into master Oct 16, 2018
@bboreham bboreham deleted the 2018-09-28-moar-initialDelaySeconds branch December 24, 2018 12:03