Create health check / readiness endpoint #46984

Closed
tylersmalley opened this issue Sep 30, 2019 · 12 comments
Labels
enhancement New value added to drive a business result Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@tylersmalley
Contributor

tylersmalley commented Sep 30, 2019

When users are deploying Kibana into production, they need an endpoint the load-balancer can use to determine if the node should be in the pool. Up until now, folks have used the /status endpoint in conjunction with status.allowAnonymous. This is not ideal, as it makes the status page public unless measures are taken to prevent access at the LB. We should create an endpoint for this purpose which uses the correct status codes.

Implementation Scope

This endpoint should answer the operational question: "is this instance healthy enough to send HTTP traffic to it?"

In light of that, we should probably ignore any custom plugin statuses and only depend on:

  • Elasticsearch connection
  • SavedObject migrations being completed
  • The main HTTP server running

Response format when unavailable

HTTP 503
{ "status": "unavailable" }

Response format when available

HTTP 200
{ "status": "available" }
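A minimal sketch of the response contract above, in TypeScript. The `ReadinessInputs` shape and the `readinessResponse` name are hypothetical and not part of Kibana's API; the three flags simply mirror the dependencies listed under Implementation Scope.

```typescript
// Hypothetical inputs: one flag per dependency named in the issue.
interface ReadinessInputs {
  elasticsearchConnected: boolean;
  savedObjectMigrationsComplete: boolean;
  httpServerRunning: boolean;
}

interface ReadinessResponse {
  statusCode: 200 | 503;
  body: { status: "available" | "unavailable" };
}

// Custom plugin statuses are deliberately not consulted here.
function readinessResponse(inputs: ReadinessInputs): ReadinessResponse {
  const ready =
    inputs.elasticsearchConnected &&
    inputs.savedObjectMigrationsComplete &&
    inputs.httpServerRunning;
  return ready
    ? { statusCode: 200, body: { status: "available" } }
    : { statusCode: 503, body: { status: "unavailable" } };
}
```

Keeping the decision in a pure function like this would make it easy to reuse from whichever server ends up serving the route.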

Related questions

  • How many users enable status.allowAnonymous?
@tylersmalley tylersmalley added the Team:Operations Team label for Operations Team label Sep 30, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-operations

@epixa epixa added the enhancement New value added to drive a business result label Sep 30, 2019
@tylersmalley
Contributor Author

ES is tracking the addition of a readiness check here

@tylersmalley tylersmalley added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Mar 30, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

@wasserman

Maybe it could just piggyback on /api/status and have an additional argument for load-balancing use where the HTTP code will be non-200 upon yellow or red. I understand the general desire to honor the HTTP status for its original purpose, but smarter endpoints that can be used for load-balancing allow customers to route users away from unhealthy Kibana nodes. Maybe a toggle can give us the best of both worlds? Then if the LB is skipping a node you can still troubleshoot by hitting /api/status to look at the detailed response.
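The toggle idea could look roughly like the sketch below. The `strict` flag stands in for the proposed extra argument and is a hypothetical name. Two further assumptions: 503 is used as the non-200 code (the comment does not name one), and the un-toggled behaviour is to answer 200 regardless, as described elsewhere in this thread.

```typescript
// Overall status colours as referred to in the comment above.
type OverallStatus = "green" | "yellow" | "red";

// `strict` is a hypothetical stand-in for the load-balancing argument.
function httpCodeForStatus(status: OverallStatus, strict: boolean): number {
  // Without the toggle, keep answering 200 so the detailed body stays readable.
  if (!strict) return 200;
  // With the toggle, anything other than green tells the LB to skip this node.
  return status === "green" ? 200 : 503;
}
```

A load balancer would probe with the toggle on, while an operator troubleshooting a skipped node would call the same endpoint without it.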

@pgayvallet
Contributor

As the endpoint should be accessible as fast as possible, we would probably need to also add it to the 'notReady' internal server:

private async runNotReadyServer(config: HttpConfig) {

@kobelb
Contributor

kobelb commented Mar 11, 2021

The @elastic/kibana-alerting-services team recently implemented logic to update the status to red when task-manager is misbehaving. I think we should be cautious when creating this API to make sure it doesn't result in a load-balancer taking a Kibana instance offline just because a single plugin is partially unavailable. Especially since Kibana has so many of the UIs that are helpful for diagnosing the health of the stack.

@tylersmalley
Contributor Author

@kobelb, does setting a plugin status to red still force the status page, making Kibana un-usable? If so, I am wondering if task-manager using that lever is what should be changed. Tasks not working should be brought to the user's attention, but I am not sure it should make Kibana itself unusable, as it's often what's needed for the investigation.

@kobelb
Contributor

kobelb commented Mar 18, 2021

@tylersmalley based on my limited testing, it does not. I recall Kibana doing this at some point, but I'm not seeing it happen any longer.

@pgayvallet
Contributor

does setting a plugin status to red still force the status page, making Kibana un-usable

It doesn't. It doesn't really do anything by default if I remember correctly. We are supposed to provide utilities to block routes from red plugins, but it's not done atm.

@lukeelmers
Member

Next steps for this:

  • Check in with Cloud / ECK teams
  • Check with ES around their plans in this area

cc @thesmallestduck

@lukeelmers lukeelmers changed the title Create health check endpoint Create health check / readiness endpoint Feb 4, 2022
@lukeelmers
Member

Copying over some of the use cases from duplicate issue #148511:

Describe the feature:

While I am familiar with /api/status, it returns a 200 even while the service is degraded and Metricbeat or other connections will fail.

Would it be possible to get an HTTP endpoint that will not return a 200 OK until Kibana is actually ready to serve traffic? Most load balancers cannot parse the output of /api/status to check that Kibana is not degraded.

Describe a specific use case for the feature:

  • Healthcheck in a docker-compose setup so that Metricbeat won't get started until Kibana is ready for it (Metricbeat currently exits with status code 1 if it fails to connect to Kibana)
  • Healthcheck for ALB/load balancers so that Kibana instances don't get traffic until they are actually ready to serve said traffic (so users don't get 503 service unavailable or search errors)
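Both use cases come down to a client that keeps probing until it sees a 200. A minimal sketch of that loop, assuming a readiness endpoint that answers 200 only when Kibana is ready; `waitUntilReady`, `Probe` and `WaitOptions` are hypothetical names, and the HTTP call itself is injected so the sketch does not depend on a particular client.

```typescript
// `probe` should resolve to the HTTP status code returned by the readiness
// endpoint; it may reject if the connection is refused.
type Probe = () => Promise<number>;

interface WaitOptions {
  attempts: number; // maximum number of probes before giving up
  delayMs: number; // pause between probes, in milliseconds
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntilReady(probe: Probe, options: WaitOptions): Promise<boolean> {
  for (let attempt = 1; attempt <= options.attempts; attempt++) {
    try {
      if ((await probe()) === 200) return true;
    } catch {
      // Connection errors count as "not ready yet".
    }
    // Skip the pause after the final attempt.
    if (attempt < options.attempts) await sleep(options.delayMs);
  }
  return false;
}
```

A wrapper script could exit non-zero when this resolves to `false`, which is the kind of signal a container healthcheck or a dependent service's start-up script can act on.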

@rudolf
Contributor

rudolf commented Aug 24, 2023

Closed by #159768

Projects
Status: Done (7.13)

8 participants