feat: Report runtime health checks into Integration readiness condition #2719
Conversation
While that PR proposes to use the Camel readiness checks as a channel / interface to peek into the runtime status, I've identified the following open points / gaps:
@lburgazzoli @nicolaferraro @jamesnetherton @davsclaus would you be so kind as to share your expertise / opinion on this?
Yeah, it'd be nice to implement that. Should be a simple enough enhancement.
This is a long-standing issue, and I think we need to start adding health checks at the component level.
This needs to be fixed too. Mind opening an issue on Camel?
Yes, whether a route can start up vs. whether the consumer is connected is component specific. Some components have built-in recovery: they start the route and then automatically self-heal / fail over, such as JMS, SQL, etc. Other components do not, and fail to start the route in that case. As Luca says, the best solution is very likely to add component-level health checks, so we can add the logic needed per component (we can have a default readiness check based on whether the consumer is started).
There are already some tickets for component-level health checks, and also for some way of inspecting the error caused in the consumer, i.e. whether it's a connectivity error or a business error.
@astefanutti for the 1st bullet we need a JIRA ticket about this. Then maybe @jamesnetherton can take a look; it seems like we can copy over the message/error to MP. There are also some other details such as
And then some general information for counters, e.g. the number of checks and failures in a row, etc. See the base class source code.
@davsclaus, thanks a lot, these tickets capture exactly what would be needed 👍🏼. I've created CAMEL-17138 to track the propagation of the Camel health check result details into the MP Health responses. |
I'm thinking of a multi-tenant cluster with strict network policies that disallow cross-namespace connections, and a Camel K operator deployed globally. It seems from the code that this would result in a kind of "health unavailable" state (if the connection error is caught). Maybe tunneling the request, e.g. via the API server proxy, could make it work in any configuration. Wdyt @astefanutti?
Ah right, that's a very good point. I took the shortest path and mimicked what the kubelet does, but the operator Pod is indeed subject to network policies. Let me rework it based on the API server proxy. Ultimately, it would be possible to have the reason reported into the Pod readiness condition directly, similar to the termination message, but in the interim, relying on the API server proxy should do it.
Okay, I have a prototype of a Camel health check in camel-telegram that reports it as DOWN or UP. You can set the threshold in the standard way today, either globally with a * (pattern) via camel.health.config[*].failure-threshold = 10, or by route id via camel.health.config[myRoute].failure-threshold = 10. There is no threshold by default, so we may consider something special for this, or let Camel K auto-assign a default value or something.
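The threshold configuration quoted above can be sketched as application properties (the camel.health.config keys are as quoted in the comment; the route id myRoute is illustrative):

```properties
# Global failure threshold, applied to all health checks via the "*" pattern
camel.health.config[*].failure-threshold = 10

# Per-route failure threshold, keyed by route id ("myRoute" is illustrative)
camel.health.config[myRoute].failure-threshold = 10
```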
Wow that was fast! There is also @Croway that has a PoC for propagating the error details in camel-microprofile-health.
Yes, I think we could have a health trait that would provide users the ability to configure these, and auto-assign sensible defaults there. I think encapsulating the health configuration into a dedicated trait would also help disentangling the container trait. Also it could be useful to have a
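A health trait as proposed might be configured on an Integration along these lines. This is a sketch only: the trait name `health` and the `failure-threshold` / `success-threshold` properties are hypothetical here, not an API this PR defines.

```yaml
apiVersion: camel.apache.org/v1
kind: Integration
metadata:
  name: example
spec:
  traits:
    health:                      # hypothetical trait encapsulating health configuration
      configuration:
        enabled: true
        failure-threshold: 10    # illustrative default auto-assigned by the trait
        success-threshold: 1     # illustrative, see the success-threshold discussion below
```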
Good idea about a trait for health checks. This can then make configuring this easier for end users. The success threshold is a nice touch, as today a route/consumer is reported UP again after a single successful attempt. We can add a similar threshold to the one we have for failures.
I've updated the logic to call the health probe via the API server proxy. |
Another ticket to allow Camel to be configured to auto-stop unhealthy routes.
I think it's ready. We'll be able to iterate as soon as we upgrade to Camel 3.13+, to leverage the new features developed by @davsclaus, add more test cases, and fix bugs 😇. |
This PR probes the readiness checks exposed by the Camel runtime (Camel Quarkus / MicroProfile Health / SmallRye Health), to reconcile the Integration phase and readiness condition. It aims at surfacing the response from the Camel readiness checks, and exposing useful information, like error messages, so that the Integration status can serve as a single interface for higher level controllers.
As details from the readiness probe responses are not accessible from the Pod(s) status, the readiness probes are called directly by the operator (via the API server proxy), each time the Integration Pod(s) readiness condition changes.
This follows up #2682, so that readiness / error status reconciliation now covers the entire lifecycle of an Integration.
TODO:
Release Note