Replies: 4 comments
-
Interesting question. While I'm not on the leading edge of distributed cloud-native app development, I've not seen that in the wild (apps that run their own healthchecks without outside probing). I would think it's harder to build a check that runs on an interval inside the app and kills the app when it detects a problem than to let an outside probe do that job. A few additional dilemmas come up in that scenario. These thoughts are in no particular order, more random ideas than anything:
I hope all that helps :)
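(For concreteness, here's what the in-app self-check pattern described above could look like — a minimal Go sketch, not a recommendation. `checkDependencies`, the 30-second interval, and the three-failure threshold are all illustrative placeholders; the idea is that the app exits and lets the orchestrator restart it.)

```go
package main

import (
	"errors"
	"log"
	"os"
	"time"
)

// checkDependencies is a stand-in for whatever the app considers "healthy"
// (DB ping, queue connection, disk space, etc.).
func checkDependencies() error {
	// ... real checks would go here ...
	return errors.New("example: dependency unreachable")
}

func main() {
	go func() {
		failures := 0
		// Run the self-check on an interval inside the app itself.
		for range time.Tick(30 * time.Second) {
			if err := checkDependencies(); err != nil {
				failures++
				log.Printf("self-check failed (%d in a row): %v", failures, err)
				if failures >= 3 {
					// Self-terminate; the orchestrator's restart policy takes over.
					os.Exit(1)
				}
			} else {
				failures = 0
			}
		}
	}()

	// ... the app's real work would run here ...
	select {} // block forever for the sake of the sketch
}
```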
-
@BretFisher Thanks so much for the thorough response. This really clears things up for me!
-
Hi @BretFisher, thanks for the feedback to @geekdave - he and I are the ones debating this issue. I'd like your opinion on a more concrete example: we are thinking of deploying git-sync (e.g. https://hub.docker.com/r/openweb/git-sync/), where there are no ports to probe. For monitoring, I was thinking it would be sufficient to periodically check whether the container is still running (assuming the underlying code will exit when there is an error such as an invalid login), rather than writing a sidecar process that listens on a TCP port and serves health info (the latter seems like extra overhead). But I am more than willing to be convinced otherwise. FYI, we are very early in our Docker migration and are currently managing containers via …
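(As a point of reference for the "just check whether it's still running" option: a minimal Go sketch that shells out to `docker inspect`. The container name `git-sync` and the one-minute interval are assumptions for illustration.)

```go
package main

import (
	"log"
	"os/exec"
	"strings"
	"time"
)

// isRunning asks the Docker daemon whether the named container is running.
func isRunning(name string) bool {
	out, err := exec.Command("docker", "inspect",
		"--format", "{{.State.Running}}", name).Output()
	if err != nil {
		return false // a missing container counts as not running
	}
	return strings.TrimSpace(string(out)) == "true"
}

func main() {
	for range time.Tick(time.Minute) {
		if !isRunning("git-sync") {
			// Hook real alerting (email, Slack, ...) in here.
			log.Println("git-sync container is not running; alerting...")
		}
	}
}
```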
-
Thanks for being a fan! Kube is on the roadmap of upcoming courses for sure; expect to see announcements on my plans in the coming weeks. I don't know about this git-sync thing, but it smells of anti-patterns. Hopefully it's not syncing code, as that goes against the idea of building a Docker image from a commit ID, so the image is a direct artifact of the code that can be deployed and guaranteed to match that commit... Anyway, "checking if the container is running" is exactly what Swarm and Kubernetes do, so I wouldn't bother with making your own tool. What you'd need is a monitoring system that tracks orchestrator events and alerts you about the things you care about. The orchestrator's job is to ensure your service is available and to manage containers and other objects to meet your declarative service definitions. That'll make more sense once you're through the Swarm sections of the course.
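(To make "track orchestrator events and alert" concrete: a minimal Go sketch that tails the Docker event stream for container `die` events via `docker events`. Swarm and Kubernetes expose richer event APIs; this only illustrates the idea, and the alerting hook is a placeholder.)

```go
package main

import (
	"bufio"
	"log"
	"os/exec"
)

func main() {
	// Stream container exit events from the daemon as JSON, one per line.
	cmd := exec.Command("docker", "events",
		"--filter", "event=die",
		"--format", "{{json .}}")
	out, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	scanner := bufio.NewScanner(out)
	for scanner.Scan() {
		// Replace this log line with a real alert (Slack, PagerDuty, ...).
		log.Println("container died:", scanner.Text())
	}
}
```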
-
Following the philosophy of "a container is just a process", my team got to discussing how a running container should diagnose its own health, and what it should do once it's decided that it's unhealthy.
Going with the convention of a `/healthz` endpoint, when a container is asked about its health, it should respond with a `2xx` if it's healthy, and with something else if it's not.
Our main question is:
If a process in a container realizes that it's not healthy, should the container just sit there unhealthy waiting for an orchestrator to terminate it, or should it "self-terminate" by exiting its own process, and causing the docker daemon to respawn it?
If the answer is that it should self-terminate, then is the sole purpose of a health check to guard against the case where a process in a container is fully "hung" and unable to respond to health checks at all? If a container returned a 500 for a health check, should it instead have just taken the liberty of terminating itself?
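(For reference, the `/healthz` convention described above might look like this minimal Go sketch: `200` when healthy, `503` otherwise. The `healthy` flag, port `8080`, and the absence of real checks are illustrative assumptions.)

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
)

// healthy is flipped to false when the app decides it's unhealthy.
var healthy atomic.Bool

func main() {
	healthy.Store(true)

	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if healthy.Load() {
			w.WriteHeader(http.StatusOK) // 2xx: healthy
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable) // non-2xx: unhealthy
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```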