Calculate and return best status for logical services, nodes, etc #802
Comments
@sethvargo this seems reasonable to me - perhaps an additional field in the health responses with an aggregate status would do the trick, so existing clients continue to work normally. Nice example 👍
This could simplify some internal complexity as well, since I think there are at least two distinct places where we calculate this.
Ping @slackpad
We need to do this! Similarly for the address of a service: we should fill in the final address ourselves, using the service address when one is set and otherwise the node's address, rather than making the user look it up.
Could we dump this in a milestone for scheduling, @slackpad? Seems high-value with little effort 😄
Bump. Currently I have implemented this myself in my service check script. This would be great to have natively though.
Apparently I did this: 4179aac |
This endpoint aggregates all checks related to <service id> on the agent and returns an appropriate HTTP code plus a string describing the worst check. This makes it possible to cleanly expose service status to other components, hiding the complexity of multiple checks. This is especially useful when using Consul to feed a load balancer that delegates health checking to the Consul agent. Exposing this endpoint on the agent is necessary to avoid hitting the Consul servers and to avoid decreasing resiliency (this endpoint will work even if there is no Consul leader in the cluster).
Fix hashicorp#2488, relates to hashicorp#802
Change-Id: Ib340c62bbbba46fd4256ed31474d8ffb1762d4df
Signed-off-by: Grégoire Seux <g.seux@criteo.com>
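As a rough illustration of what that commit message describes, here is a minimal Python sketch that filters checks by service ID, picks the worst status, and maps it to an HTTP code. It is not the actual Consul handler. The function name `service_health` and the specific status codes are assumptions of this sketch; `ServiceID` and `Status` are the field names Consul uses in check JSON.

```python
from typing import Dict, List, Tuple

# Ordered from best to worst; a later index means a more severe status.
SEVERITY = ["passing", "warning", "critical"]

# Assumption of this sketch: one HTTP code per aggregated status.
STATUS_TO_HTTP = {"passing": 200, "warning": 429, "critical": 503}


def service_health(checks: List[Dict[str, str]], service_id: str) -> Tuple[int, str]:
    """Return (http_code, worst_status) for the checks of one service (hypothetical helper)."""
    related = [c for c in checks if c.get("ServiceID") == service_id]
    worst = "passing"  # assumption: a service with no checks counts as passing
    for check in related:
        if SEVERITY.index(check["Status"]) > SEVERITY.index(worst):
            worst = check["Status"]
    return STATUS_TO_HTTP[worst], worst


checks = [
    {"ServiceID": "web", "Status": "passing"},
    {"ServiceID": "web", "Status": "warning"},
    {"ServiceID": "db", "Status": "critical"},
]
code, status = service_health(checks, "web")
print(code, status)
```

A load balancer only needs the HTTP code, while a human or a dashboard can read the status string. This sketch cannot tell an unknown service from a service without checks; a real endpoint would consult the agent's service registry for that.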
* Create templates for grafana and prometheus
Since this has come up a few times in different projects, I'm going to raise an issue for discussion. When determining a node's, service's, or logical service's status, one must do something like:
TL;DR - iterate over each check and `&` them together to get the "best" status for the node/service/logical service.
I would like to propose that Consul does this calculation itself and exposes that result via the API and struct fields. It would be great if Consul could aggregate those checks into a single status. This would reduce a lot of duplication in our tooling and I think it would provide a better experience.
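For readers who want to see the kind of client-side aggregation being discussed, here is a minimal Python sketch. It is not the snippet from the original issue. The helper name `aggregate_status` and the choice to treat an empty check list as passing are assumptions of this sketch; `passing`, `warning`, and `critical` are Consul's check status strings.

```python
from typing import Iterable

# Severity ranking assumed for this sketch: a higher number is worse.
SEVERITY = {"passing": 0, "warning": 1, "critical": 2}


def aggregate_status(statuses: Iterable[str]) -> str:
    """Fold many check statuses into the single status that summarizes them (hypothetical helper)."""
    worst = "passing"  # assumption: no checks at all means passing
    for status in statuses:
        if status not in SEVERITY:
            raise ValueError(f"unknown check status: {status!r}")
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


# Checks shaped like entries from a health response, reduced to the field used here.
checks = [{"Status": "passing"}, {"Status": "warning"}, {"Status": "passing"}]
print(aggregate_status(c["Status"] for c in checks))
```

Every tool that consumes health data currently has to carry some version of this loop, which is the duplication the proposal aims to remove.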
Thoughts @armon @ryanuber?