Calculate and return best status for logical services, nodes, etc #802

Closed
sethvargo opened this issue Mar 19, 2015 · 7 comments
Labels: theme/api (Relating to the HTTP API interface), type/enhancement (Proposed improvement or new feature)

Comments

@sethvargo
Contributor

Since this has come up a few times in different projects, I'm going to raise an issue for discussion. When determining a node's, service's, or logical service's status, one must do something like:

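// statusFromChecks collapses a set of health checks into a single status.
// Needs "fmt" and the Consul API client package (github.com/hashicorp/consul/api).
func statusFromChecks(checks []*api.HealthCheck) (string, error) {
    // Record which statuses appear at least once.
    var passing, warning, unknown, critical bool
    for _, check := range checks {
        switch check.Status {
        case "passing":
            passing = true
        case "warning":
            warning = true
        case "unknown":
            unknown = true
        case "critical":
            critical = true
        default:
            return "", fmt.Errorf("unknown status: %q", check.Status)
        }
    }

    // The most severe status present wins:
    // critical > unknown > warning > passing.
    switch {
    case critical:
        return "critical", nil
    case unknown:
        return "unknown", nil
    case warning:
        return "warning", nil
    case passing:
        return "passing", nil
    default:
        // No checks? Treat as passing.
        return "passing", nil
    }
}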

TL;DR - iterate over each check and combine them, with the most severe status winning, to get the "best" status for the node/service/logical service.

I would like to propose that Consul does this calculation itself and exposes that result via the API and struct fields. It would be great if Consul could aggregate those checks into a single status. This would reduce a lot of duplication in our tooling and I think it would provide a better experience.

Thoughts @armon @ryanuber?

@ryanuber
Member

@sethvargo this seems reasonable to me - perhaps an additional field in the health responses with an aggregate status would do the trick so existing clients continue to work normally. Nice example 👍
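
As a rough illustration of why an additive field would keep existing clients working, here is a minimal Go sketch. The `ServiceHealth` and `LegacyServiceHealth` types and the `AggregatedStatus` field name are hypothetical, chosen only for this example; they are not Consul's actual response shape. It relies on the fact that `encoding/json` ignores unknown fields when decoding by default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HealthCheck carries only the fields this sketch needs.
type HealthCheck struct {
	Name   string
	Status string
}

// ServiceHealth is a hypothetical response with the new additive field.
// AggregatedStatus is an assumed name, not an existing Consul field.
type ServiceHealth struct {
	Checks           []HealthCheck
	AggregatedStatus string
}

// LegacyServiceHealth is what a client written before the change decodes into.
type LegacyServiceHealth struct {
	Checks []HealthCheck
}

// roundTrip encodes the new response and decodes it with both the old and
// the new client types. It returns the number of checks the old client saw
// and the aggregate status the new client saw.
func roundTrip() (int, string, error) {
	resp := ServiceHealth{
		Checks: []HealthCheck{
			{Name: "http", Status: "passing"},
			{Name: "disk", Status: "warning"},
		},
		AggregatedStatus: "warning",
	}
	body, err := json.Marshal(resp)
	if err != nil {
		return 0, "", err
	}

	// The old client simply ignores the field it does not know about.
	var old LegacyServiceHealth
	if err := json.Unmarshal(body, &old); err != nil {
		return 0, "", err
	}

	// The new client can read the aggregate without looping over checks.
	var cur ServiceHealth
	if err := json.Unmarshal(body, &cur); err != nil {
		return 0, "", err
	}
	return len(old.Checks), cur.AggregatedStatus, nil
}

func main() {
	n, status, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println(n, status)
}
```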

@ryanuber ryanuber added the type/enhancement Proposed improvement or new feature label Mar 19, 2015
@armon
Member

armon commented Mar 20, 2015

This could simplify some internal complexity as well, since I think there are at least two distinct places where we calculate this.

@sethvargo
Contributor Author

Ping @slackpad

@slackpad slackpad self-assigned this Sep 15, 2016
@slackpad
Contributor

We need to do this! The same goes for the address of a service: we should fill in the final address ourselves, rather than making the user check for a service address and fall back to the node's address when there isn't one.
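
For reference, the address fallback that clients currently have to write themselves looks roughly like the sketch below. The function and parameter names are hypothetical, and an empty string is assumed to mean "no service address set".

```go
package main

import "fmt"

// effectiveAddress returns the service's own address when one is set and
// falls back to the node's address otherwise. An empty serviceAddr is
// assumed to mean that the service did not register an address.
func effectiveAddress(serviceAddr, nodeAddr string) string {
	if serviceAddr != "" {
		return serviceAddr
	}
	return nodeAddr
}

func main() {
	// Service registered with its own address.
	fmt.Println(effectiveAddress("10.0.0.5", "192.168.1.10"))
	// No service address: the node's address is used.
	fmt.Println(effectiveAddress("", "192.168.1.10"))
}
```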

@sethvargo
Contributor Author

Could we put this in a milestone for scheduling, @slackpad? It seems high-value for little effort 😄

@slackpad slackpad added this to the 0.7.3 milestone Nov 29, 2016
@slackpad slackpad modified the milestones: 0.7.4, 0.7.3 Jan 17, 2017
@slackpad slackpad removed this from the Triaged milestone Apr 18, 2017
@slackpad slackpad added the theme/api Relating to the HTTP API interface label May 25, 2017
@beardedeagle

Bump. Currently I have implemented this myself in my service check script. This would be great to have natively though.

@sethvargo
Contributor Author

Apparently I did this: 4179aac

kamaradclimber added a commit to criteo-forks/consul that referenced this issue Dec 5, 2017
This endpoint aggregates all checks related to <service id> on the agent
and returns an appropriate HTTP code plus a string describing the worst
check.

This makes it possible to cleanly expose service status to other
components, hiding the complexity of multiple checks.
This is especially useful when using Consul to feed a load balancer that
delegates health checking to the Consul agent.

Exposing this endpoint on the agent is necessary to avoid a hit on
consul servers and avoid decreasing resiliency (this endpoint will work
even if there is no consul leader in the cluster).

Fix hashicorp#2488, relates to hashicorp#802

Change-Id: Ib340c62bbbba46fd4256ed31474d8ffb1762d4df
Signed-off-by: Grégoire Seux <g.seux@criteo.com>
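
The commit message above does not include the code, so the following is only a sketch of what such an endpoint could look like, not the commit's actual implementation. The handler name, the `id` query parameter, the `checksFor` lookup, and the mapping from status to HTTP code are all assumptions made for illustration. The severity order follows the function in the original post.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// severity ranks statuses; a higher number is worse.
var severity = map[string]int{"passing": 0, "warning": 1, "unknown": 2, "critical": 3}

// worstStatus returns the most severe status in the list, or "passing"
// when the list is empty. Unrecognized statuses are skipped in this sketch.
func worstStatus(statuses []string) string {
	worst := "passing"
	for _, s := range statuses {
		rank, ok := severity[s]
		if ok && rank > severity[worst] {
			worst = s
		}
	}
	return worst
}

// httpCode maps a status to an HTTP code. This mapping is an assumption.
func httpCode(status string) int {
	switch status {
	case "passing":
		return http.StatusOK
	case "warning":
		return http.StatusTooManyRequests
	default:
		return http.StatusServiceUnavailable
	}
}

// serviceHealthHandler answers with the worst status of a service's checks.
// checksFor is a hypothetical lookup into the agent's local state.
func serviceHealthHandler(checksFor func(id string) ([]string, bool)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		statuses, ok := checksFor(r.URL.Query().Get("id"))
		if !ok {
			http.Error(w, "unknown service", http.StatusNotFound)
			return
		}
		status := worstStatus(statuses)
		w.WriteHeader(httpCode(status))
		fmt.Fprint(w, status)
	}
}

func main() {
	h := serviceHealthHandler(func(id string) ([]string, bool) {
		if id != "web" {
			return nil, false
		}
		return []string{"passing", "warning"}, true
	})
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health?id=web", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```

A load balancer would then only need to look at the response code, which is the use case the commit message describes.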
fboismenu pushed a commit to criteo-forks/consul that referenced this issue Jan 23, 2018
kamaradclimber added a commit to criteo-forks/consul that referenced this issue Jan 23, 2018
fboismenu pushed a commit to criteo-forks/consul that referenced this issue Feb 15, 2018
ShimmerGlass pushed a commit to criteo-forks/consul that referenced this issue Nov 9, 2018
ShimmerGlass pushed a commit to criteo-forks/consul that referenced this issue Nov 9, 2018
ShimmerGlass pushed a commit to criteo-forks/consul that referenced this issue Dec 3, 2018
duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021
* Create templates for grafana and prometheus
5 participants