Exec liveness/readiness probes are no longer outputting messages #10424

Closed
mwringe opened this issue Aug 15, 2016 · 11 comments
Assignees
Labels
component/kubernetes priority/P1 vendor-update Touching vendor dir or related files

Comments

@mwringe
Contributor

mwringe commented Aug 15, 2016

Lifecycle hooks (e.g. liveness and readiness probes) are no longer outputting their error messages in the event list. This makes it very difficult to debug a failed lifecycle hook.

Version

master

@mfojtik
Contributor

mfojtik commented Aug 16, 2016

@mwringe are you referring to deployment hooks (pre/mid/post) not reporting errors to events, or to liveness and readiness probes not being reported in events?

@mwringe
Contributor Author

mwringe commented Aug 16, 2016

@mfojtik in my case it's for liveness and readiness probes. These used to output the message from the probe, but now it's empty. Our probes emit messages explaining why they failed, which is useful for determining what the problem is and how to fix it. Without these messages, it can be very difficult to figure out what went wrong.

@mfojtik mfojtik assigned ncdc and unassigned mfojtik Aug 16, 2016
@ncdc
Contributor

ncdc commented Aug 16, 2016

@mwringe steps to reproduce please? Sample DC w/probe?

@ncdc ncdc changed the title Livecycle hooks are not longer outputting messages. Liveness/readiness probes are no longer outputting messages Aug 16, 2016
@ncdc
Contributor

ncdc commented Aug 16, 2016

I tested with an http probe and I see the reason:

vagrant@localhost:~/go/src/github.com/openshift/origin (master) oc get ev|grep probe
1m         4m          9         n1-2-t95v1              Pod                     spec.containers{n1}           Warning   Unhealthy                 {kubelet localhost.localdomain}      Liveness probe failed: HTTP probe failed with statuscode: 404

I chatted with @mwringe on IRC and he said he's using an exec probe. Exec probes just return whatever combined stdout/stderr output the exec call produces. So if the command prints nothing but exits nonzero, you would see "Liveness probe failed: "

Lowering priority. Will reraise if there's an actual issue.
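
As an illustration of the exec-probe behavior described in this comment, here is a minimal Go sketch. It is not the kubelet's code: `runExecProbe` and `failureMessage` are hypothetical helpers, the message format is copied from the event text quoted in this thread, and the `main` function assumes a POSIX `sh` is on the PATH.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runExecProbe is a hypothetical helper (not a kubelet function). It runs a
// command and returns its combined stdout/stderr together with any error.
// A nonzero exit status is reported as a non-nil error.
func runExecProbe(name string, args ...string) (string, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	return string(out), err
}

// failureMessage builds an event-style message from the probe type and the
// probe's output, mirroring the "Liveness probe failed: ..." text seen above.
func failureMessage(probeType, output string) string {
	return fmt.Sprintf("%s probe failed: %s", probeType, output)
}

func main() {
	// Case 1: the command exits nonzero and prints nothing, so the message
	// ends right after the colon.
	out, err := runExecProbe("sh", "-c", "exit 1")
	if err != nil {
		fmt.Printf("%q\n", failureMessage("Liveness", out))
	}

	// Case 2: the command writes a reason to stderr before exiting nonzero,
	// so the reason appears in the message.
	out, err = runExecProbe("sh", "-c", "echo 'status endpoint returned 404' >&2; exit 1")
	if err != nil {
		fmt.Printf("%q\n", failureMessage("Liveness", out))
	}
}
```

Under this model, an empty message is expected only when the probe command itself prints nothing. A probe that does print a reason should never produce an empty event.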

@mwringe
Contributor Author

mwringe commented Aug 16, 2016

Just a note from the conversation: we did indeed determine that there is a problem here which needs to be resolved.

If we increase the log level, we can see that the probe returned a message:
I0816 15:01:36.259826 29713 exec.go:38] Exec probe response: "Failed to access the status endpoint : HTTP Error 404: Not Found.\nHawkular metrics has only been running for 7\n seconds not aborting yet.\n"

And yet when we check the events, we are shown an empty result:
54s 22s 5 hawkular-metrics-hjme4 Pod spec.containers{hawkular-metrics} Warning Unhealthy {kubelet corbeau} Readiness probe failed:

We may want to raise the priority again, as this is indeed a problem.

@ncdc
Contributor

ncdc commented Aug 16, 2016

Back to P1. I will look at this tonight.

@ncdc
Contributor

ncdc commented Aug 16, 2016

I can reproduce

@ncdc
Contributor

ncdc commented Aug 16, 2016

Found the root cause. Will submit an upstream PR + a cherry-pick tonight.

@mwringe
Contributor Author

mwringe commented Aug 16, 2016

@ncdc wow, awesome. thanks.

@ncdc
Contributor

ncdc commented Aug 17, 2016

Upstream fix: kubernetes/kubernetes#30731

@ncdc ncdc changed the title Liveness/readiness probes are no longer outputting messages Exec liveness/readiness probes are no longer outputting messages Aug 17, 2016
@ncdc
Contributor

ncdc commented Aug 17, 2016

@ncdc ncdc closed this as completed Aug 17, 2016
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Aug 20, 2016
Automatic merge from submit-queue

Always return command output for exec probes and kubelet RunInContainer

Always return command output for exec probes and kubelet RunInContainer, even if the command invocation returns nonzero.

When #24921 replaced RunInContainer with ExecInContainer, it introduced a change where an exec probe that failed no longer included the stdout/stderr from the probe in the event. For example, when running at log level 4, you see:

```
I0816 15:01:36.259826 29713 exec.go:38] Exec probe response: "Failed to access the status endpoint : HTTP Error 404: Not Found.\nHawkular metrics has only been running for 7\n seconds not aborting yet.\n"
```

But the event looks like this:

```
54s 22s 5 hawkular-metrics-hjme4 Pod spec.containers{hawkular-metrics} Warning Unhealthy {kubelet corbeau} Readiness probe failed:
```

Note the absence of the exec probe response after "Readiness probe failed". This PR restores the previous behavior.

cc @kubernetes/rh-cluster-infra @mwringe 

xref openshift/origin#10424
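
The regression described in the commit message above can be sketched in Go as follows. This is a simplified model under stated assumptions, not the actual kubelet code or the contents of the upstream PR: `runner`, `failingProbe`, `execDroppingOutput`, and `execKeepingOutput` are all hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// runner is a hypothetical stand-in for whatever executes the probe command
// inside the container and reports its output and exit error.
type runner func() ([]byte, error)

// failingProbe simulates a probe command that prints a reason and then
// exits nonzero.
func failingProbe() ([]byte, error) {
	return []byte("Failed to access the status endpoint"), errors.New("exit status 1")
}

// execDroppingOutput models the regressed behavior: when the command fails,
// its output is discarded and only the error is returned.
func execDroppingOutput(run runner) ([]byte, error) {
	out, err := run()
	if err != nil {
		return nil, err
	}
	return out, nil
}

// execKeepingOutput models the restored behavior: the output is always
// returned, even when the command exits nonzero.
func execKeepingOutput(run runner) ([]byte, error) {
	return run()
}

func main() {
	out, _ := execDroppingOutput(failingProbe)
	fmt.Printf("Readiness probe failed: %s\n", out) // reason is lost

	out, _ = execKeepingOutput(failingProbe)
	fmt.Printf("Readiness probe failed: %s\n", out) // reason is preserved
}
```

The only difference between the two functions is whether the output survives the error path, which matches the symptom in this issue: the output is visible in the kubelet log but missing from the event.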