Exec liveness/readiness probes are no longer outputting messages #10424

mwringe · 2016-08-15T20:42:34Z

Lifecycle hooks (e.g. liveness and readiness probes) are no longer outputting the error message in the events list. This makes it very difficult to debug a lifecycle hook that has failed.

Version

master

mfojtik · 2016-08-16T12:34:48Z

@mwringe are you referring to deployment hooks (pre/mid/post) not reporting errors to events, or to liveness and readiness probes not being reported in events?

mwringe · 2016-08-16T13:25:26Z

@mfojtik In my case it's for liveness and readiness probes. These used to output the message from the probe, but now it's empty. Our probes emit messages explaining why they failed, which is useful for determining what the problem is and how to fix it. Without these messages, it can be very difficult to figure out what went wrong.

ncdc · 2016-08-16T18:14:55Z

@mwringe steps to reproduce please? Sample DC w/probe?

ncdc · 2016-08-16T18:52:45Z

I tested with an http probe and I see the reason:

vagrant@localhost:~/go/src/github.com/openshift/origin (master) oc get ev|grep probe
1m         4m          9         n1-2-t95v1              Pod                     spec.containers{n1}           Warning   Unhealthy                 {kubelet localhost.localdomain}      Liveness probe failed: HTTP probe failed with statuscode: 404

I chatted with @mwringe on IRC and he said he's using an exec probe. Exec probes just return whatever combined stdout/stderr output the exec call produces. So if the command prints nothing but exits with a nonzero code, you would see "Liveness probe failed: ".
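A minimal Go sketch of the exec-probe behavior described above, assuming a local command stands in for the in-container exec: success or failure comes from the exit status, and the event text is whatever the command printed. The helper `execProbe` is hypothetical and runs the command locally with `os/exec`; the kubelet actually runs it inside the container.

```go
package main

import (
	"fmt"
	"os/exec"
)

// execProbe is a hypothetical helper: it runs cmd locally and reports whether
// it exited with status 0, together with its combined stdout/stderr.
func execProbe(cmd []string) (healthy bool, output string) {
	out, err := exec.Command(cmd[0], cmd[1:]...).CombinedOutput()
	// A nonzero exit surfaces as a non-nil error (*exec.ExitError); out still
	// holds whatever the command printed before it exited.
	return err == nil, string(out)
}

func main() {
	for _, cmd := range [][]string{
		{"sh", "-c", "echo 'status endpoint returned 404'; exit 1"},
		{"sh", "-c", "exit 1"}, // fails without printing anything
	} {
		if ok, out := execProbe(cmd); !ok {
			// The event text is just the prefix plus the probe's output.
			fmt.Printf("Liveness probe failed: %s\n", out)
		}
	}
}
```

The second command fails silently, so its message ends right after "Liveness probe failed: ", which is the expected result when a probe prints nothing.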

Lowering priority. Will re-raise if there's an actual issue.

mwringe · 2016-08-16T19:31:21Z

Just a note from the conversation: we did indeed determine that there is a problem here that needs to be resolved.

If we increase the log level, we can see that the probe returned a message:
I0816 15:01:36.259826 29713 exec.go:38] Exec probe response: "Failed to access the status endpoint : HTTP Error 404: Not Found.\nHawkular metrics has only been running for 7\n seconds not aborting yet.\n"

And yet when we check the events, the message is empty:
54s 22s 5 hawkular-metrics-hjme4 Pod spec.containers{hawkular-metrics} Warning Unhealthy {kubelet corbeau} Readiness probe failed:

We may want to raise the priority back up, as this is indeed a problem.

ncdc · 2016-08-16T20:01:18Z

Back to P1. I will look at this tonight.

ncdc · 2016-08-16T20:08:55Z

I can reproduce it.

ncdc · 2016-08-16T21:18:25Z

Found the root cause. Will submit an upstream PR + a cherry-pick tonight.

mwringe · 2016-08-16T21:25:57Z

@ncdc wow, awesome. thanks.

ncdc · 2016-08-17T01:52:22Z

Upstream fix: kubernetes/kubernetes#30731

ncdc · 2016-08-17T14:00:37Z

Replaced by https://bugzilla.redhat.com/show_bug.cgi?id=1367204

@mwringe

Automatic merge from submit-queue

Always return command output for exec probes and kubelet RunInContainer

Always return command output for exec probes and kubelet RunInContainer, even if the command invocation returns nonzero.

When #24921 replaced RunInContainer with ExecInContainer, it introduced a change where an exec probe that failed no longer included the stdout/stderr from the probe in the event. For example, when running at log level 4, you see:

```
I0816 15:01:36.259826 29713 exec.go:38] Exec probe response: "Failed to access the status endpoint : HTTP Error 404: Not Found.\nHawkular metrics has only been running for 7\n seconds not aborting yet.\n"
```

But the event looks like this:

```
54s 22s 5 hawkular-metrics-hjme4 Pod spec.containers{hawkular-metrics} Warning Unhealthy {kubelet corbeau} Readiness probe failed:
```

Note the absence of the exec probe response after "Readiness probe failed". This PR restores the previous behavior.

cc @kubernetes/rh-cluster-infra @mwringe

xref openshift/origin#10424
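The regression and the fix described in the PR can be sketched in Go as two variants of a command runner. All three helpers (`runDiscardingOutput`, `runKeepingOutput`, `eventMessage`) are hypothetical and run the command locally with `os/exec`; they only illustrate the difference the PR describes and are not the actual `RunInContainer`/`ExecInContainer` code.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runDiscardingOutput (hypothetical) models the regression: on a nonzero
// exit, only the error is returned and the captured output is thrown away.
func runDiscardingOutput(name string, args ...string) ([]byte, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err != nil {
		return nil, err
	}
	return out, nil
}

// runKeepingOutput (hypothetical) models the fix: the output is returned
// whether or not the command succeeded.
func runKeepingOutput(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).CombinedOutput()
}

// eventMessage (hypothetical) formats a readiness event like the ones above;
// it returns an empty string when the probe passed.
func eventMessage(out []byte, err error) string {
	if err == nil {
		return ""
	}
	return fmt.Sprintf("Readiness probe failed: %s", out)
}

func main() {
	script := []string{"-c", "echo 'Failed to access the status endpoint : HTTP Error 404: Not Found.'; exit 1"}
	fmt.Println(eventMessage(runDiscardingOutput("sh", script...))) // output lost
	fmt.Println(eventMessage(runKeepingOutput("sh", script...)))    // output kept
}
```

The first line printed ends right after "Readiness probe failed: ", while the second carries the 404 message, mirroring the mismatch between the log and the event in the report.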

danmcp added priority/P1 component/apps labels Aug 16, 2016

danmcp assigned mfojtik Aug 16, 2016

mfojtik assigned ncdc and unassigned mfojtik Aug 16, 2016

ncdc changed the title ~~Livecycle hooks are not longer outputting messages.~~ Liveness/readiness probes are no longer outputting messages Aug 16, 2016

ncdc added priority/P3 and removed priority/P1 labels Aug 16, 2016

ncdc added priority/P1 and removed priority/P3 labels Aug 16, 2016

ncdc mentioned this issue Aug 17, 2016

Always return command output for exec probes and kubelet RunInContainer kubernetes/kubernetes#30731

Merged

ncdc added component/kubernetes vendor-update and removed component/apps labels Aug 17, 2016

ncdc changed the title ~~Liveness/readiness probes are no longer outputting messages~~ Exec liveness/readiness probes are no longer outputting messages Aug 17, 2016

ncdc closed this as completed Aug 17, 2016
