elastic-agent docker: support healthcheck for the container #24503

Closed · mtojek opened this issue Mar 11, 2021 · 9 comments · Fixed by #24856

mtojek (Contributor) commented Mar 11, 2021

With the Fleet Server enabled in the agent's Docker container, we need to find a way to signal that the container is healthy. Before 7.13.0-SNAPSHOT we used the following healthcheck: https://github.com/elastic/elastic-package/blob/master/internal/install/static_snapshot_yml.go#L85

Do you have any recommendation on how to signal that the container is healthy, i.e. that it has a default policy assigned? Do you think you could add a healthcheck definition to the official Docker image?
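
For reference, a healthcheck of the kind requested here could look roughly like the sketch below. This is not the official image's healthcheck; it assumes the container exposes the Fleet Server status endpoint on port 8220 (the endpoint probed later in this thread) and that curl is available inside the image:

# Hypothetical Dockerfile sketch, not the official image's healthcheck.
# Assumes the Fleet Server status endpoint on port 8220 and curl in the image.
HEALTHCHECK --interval=10s --timeout=5s --retries=30 \
  CMD curl --silent --fail http://localhost:8220/api/status || exit 1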

mtojek added the Team:Elastic-Agent label on Mar 11, 2021
elasticmachine (Collaborator) commented

Pinging @elastic/agent (Team:Agent)

ruflin (Member) commented Mar 16, 2021

@simitt How are you handling this for the Cloud container?

simitt (Contributor) commented Mar 16, 2021

@ruflin no special handling for the healthcheck yet; instead, if the legacy APM Server cannot be started or dies, it signals the Elastic Agent to shut down, which terminates the whole container.

For sub-processes managed by Elastic Agent, we have discussed in the past that the Agent should provide a healthcheck endpoint exposing per-process details as well as an overall health indicator.
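
To illustrate the kind of endpoint discussed here, a response could combine an overall indicator with per-process detail. Everything below (port, path, and response shape) is invented for illustration; no such API is confirmed in this thread:

$ curl -s http://localhost:6789/health    # hypothetical port and path
{
  "status": "DEGRADED",
  "processes": [
    {"name": "fleet-server", "status": "HEALTHY"},
    {"name": "filebeat", "status": "STARTING"}
  ]
}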

ruflin (Member) commented Mar 17, 2021

@michalpristas @ph Do we have this overall healthcheck tracked somewhere already? I remember we discussed this in the past. Does Agent already have some HTTP endpoint or similar?

ph (Contributor) commented Mar 17, 2021

We track health status internally, and we should be able to expose it in any way necessary. @ruflin I believe this is linked to #24091.

mtojek (Contributor, Author) commented Mar 17, 2021

From our perspective (as users), it's valuable if the healthcheck signals green once the default policy has been assigned for the first time.

ruflin (Member) commented Mar 31, 2021

I think an Agent should be considered healthy as soon as the first policy is received and acked. This does not have to be the default policy.

We should improve this healthcheck later on to provide more fine-grained status information based on the status of individual processes and inputs.

simitt (Contributor) commented Mar 31, 2021

IMO if the agent is started in Fleet Server mode, then the Agent's health should consider the Fleet Server's health, ideally by consuming a health endpoint from the Fleet Server.
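
As a sketch, a container healthcheck consuming such an endpoint could gate on the reported status field. The endpoint and response shape match the /api/status probes shown in the next comment, but the script itself is hypothetical:

#!/bin/sh
# Hypothetical healthcheck script: exit 0 only when the endpoint answers
# and reports HEALTHY; a connection failure or any other status fails the check.
response=$(curl --silent --max-time 5 http://localhost:8220/api/status) || exit 1
echo "$response" | grep -q '"status":"HEALTHY"'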

mtojek (Contributor, Author) commented Mar 31, 2021

As we're observing some flakiness when booting the agent, I did a short exercise to check responses from /api/status:

$ for I in `seq 1 1 10000`; do curl http://localhost:8220/api/status ; sleep 1; done
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
{"name":"fleet-server","version":"","status":"STARTING"}{"name":"fleet-server","version":"","status":"HEALTHY"}curl: (52) Empty reply from server
curl: (52) Empty reply from server
curl: (52) Empty reply from server
{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}{"name":"fleet-server","version":"","status":"HEALTHY"}ć{"name":"fleet-server","version":"","status":"HEALTHY"}

Please mind the gap between the HEALTHY states. It seems that the Fleet Server got restarted in between, which means we need a different workaround :)
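
For what it's worth, a timestamped variant of the same loop would make the length of that gap measurable; this is plain shell, nothing agent-specific:

$ for I in `seq 1 1 10000`; do printf '%s ' "$(date +%T)"; curl --silent http://localhost:8220/api/status; echo; sleep 1; done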
