Calculate and return best status for logical services, nodes, etc #802

sethvargo · 2015-03-19T16:37:00Z

Since this has come up a few times in different projects, I'm going to raise an issue for discussion. When determining a node's, service's, or logical service's status, one must do something like:

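```go
import (
    "fmt"

    "github.com/hashicorp/consul/api"
)

// statusFromChecks reduces a set of health checks to a single status,
// letting the most severe one win: critical > unknown > warning > passing.
func statusFromChecks(checks []*api.HealthCheck) (string, error) {
    var passing, warning, unknown, critical bool
    // Record which statuses appear at least once.
    for _, check := range checks {
        switch check.Status {
        case "passing":
            passing = true
        case "warning":
            warning = true
        case "unknown":
            unknown = true
        case "critical":
            critical = true
        default:
            return "", fmt.Errorf("unknown status: %q", check.Status)
        }
    }

    // Report the most severe status that was seen.
    switch {
    case critical:
        return "critical", nil
    case unknown:
        return "unknown", nil
    case warning:
        return "warning", nil
    case passing:
        return "passing", nil
    default:
        // No checks at all: treat the entity as passing.
        return "passing", nil
    }
}
```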

TL;DR: iterate over every check and combine them so the most severe status wins (critical, then unknown, then warning, then passing), giving the single "best" status for the node, service, or logical service.
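For comparison, the same aggregation can be written as a severity ranking where the highest rank wins. This is a minimal sketch, not Consul code: `HealthCheck` here is a hypothetical stand-in for `api.HealthCheck` (only `Status` is kept) so the example compiles without the Consul module, and `severity` and `aggregateStatus` are illustrative names.

```go
package main

import "fmt"

// HealthCheck is a hypothetical, trimmed-down stand-in for api.HealthCheck;
// only the Status field matters for aggregation.
type HealthCheck struct {
	Status string
}

// severity ranks each known status; a higher number is more severe.
// The ordering mirrors the switch in statusFromChecks.
var severity = map[string]int{
	"passing":  0,
	"warning":  1,
	"unknown":  2,
	"critical": 3,
}

// aggregateStatus returns the most severe status among checks, or "passing"
// when there are no checks. Unrecognized statuses are reported as errors.
func aggregateStatus(checks []*HealthCheck) (string, error) {
	worst := "passing"
	for _, check := range checks {
		rank, ok := severity[check.Status]
		if !ok {
			return "", fmt.Errorf("unknown status: %q", check.Status)
		}
		if rank > severity[worst] {
			worst = check.Status
		}
	}
	return worst, nil
}
```

Called with checks whose statuses are `passing` and `warning`, `aggregateStatus` returns `warning`; with no checks it returns `passing`, matching the default branch of `statusFromChecks`. Keeping the ordering in one map means adding a status touches only one place.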

I would like to propose that Consul perform this calculation itself and expose the aggregated status through the API and the corresponding struct fields. This would remove a lot of duplicated logic from our tooling and, I think, provide a better experience.

Thoughts @armon @ryanuber?


ryanuber · 2015-03-19T23:02:28Z

@sethvargo this seems reasonable to me - perhaps an additional field in the health responses with an aggregate status would do the trick so existing clients continue to work normally. Nice example 👍

armon · 2015-03-20T18:12:31Z

This could simplify some internal complexity as well, since I think there are at least two distinct places where we calculate this.

sethvargo · 2016-09-15T08:46:27Z

Ping @slackpad

slackpad · 2016-09-15T12:56:32Z

We need to do this! The same goes for a service's address: we should fill in the final address rather than making the user check for a service address and fall back to the node's address when there isn't one.
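The address rule described here is a simple fallback. A minimal sketch, assuming a hypothetical flattened `ServiceEntry` type (the type and field names are illustrative, not Consul's structs):

```go
package main

import (
	"net"
	"strconv"
)

// ServiceEntry is a hypothetical, flattened view of one health/catalog result.
type ServiceEntry struct {
	NodeAddress    string // address of the node running the service
	ServiceAddress string // optional per-service address; empty if not registered
	ServicePort    int
}

// effectiveAddress prefers the service's own address and falls back to the
// node's address when none was registered.
func effectiveAddress(e ServiceEntry) string {
	if e.ServiceAddress != "" {
		return e.ServiceAddress
	}
	return e.NodeAddress
}

// dialTarget joins the effective address and port into a "host:port" string;
// net.JoinHostPort adds brackets around IPv6 literals.
func dialTarget(e ServiceEntry) string {
	return net.JoinHostPort(effectiveAddress(e), strconv.Itoa(e.ServicePort))
}
```

If the server filled in this value, every client would get the same answer instead of re-implementing the fallback.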

sethvargo · 2016-09-15T14:30:08Z

@slackpad, could we put this in a milestone for scheduling? It seems high-value with little effort 😄

beardedeagle · 2017-06-13T07:15:24Z

Bump. Currently I have implemented this myself in my service check script. This would be great to have natively though.

sethvargo · 2017-06-29T09:07:08Z

Apparently I did this: 4179aac

This endpoint aggregates all checks related to <service id> on the agent and returns an appropriate HTTP code plus the string describing the worst check. This cleanly exposes service status to other components, hiding the complexity of multiple checks. It is especially useful when Consul feeds a load balancer that delegates health checking to the Consul agent. Exposing this endpoint on the agent is necessary to avoid a hit on the Consul servers and to avoid decreasing resiliency (this endpoint works even if there is no Consul leader in the cluster).

Fix hashicorp#2488, relates to hashicorp#802
Change-Id: Ib340c62bbbba46fd4256ed31474d8ffb1762d4df
Signed-off-by: Grégoire Seux <g.seux@criteo.com>
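A rough sketch of the behaviour described in that commit message: take the worst status and translate it into an HTTP code a load balancer can act on without parsing the body. Everything here is hypothetical: the `lookup` callback stands in for the agent's own check aggregation, the service ID is read from an `id` query parameter instead of the real URL path, and the code mapping (200 for passing, 429 for warning, 503 for critical) reflects our understanding of Consul's documented agent health endpoint rather than anything stated in this thread. `probe` is a small helper that exercises the handler through `net/http/httptest` without opening a socket.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// statusHTTPCode maps an aggregated status to an HTTP code. The mapping is an
// assumption: 200 passing, 429 warning, 503 critical or anything unrecognized.
func statusHTTPCode(status string) int {
	switch status {
	case "passing":
		return http.StatusOK
	case "warning":
		return http.StatusTooManyRequests
	default:
		return http.StatusServiceUnavailable
	}
}

// serviceHealthHandler is a hypothetical handler. lookup returns the worst
// status for a service ID and false if the service is unknown.
func serviceHealthHandler(lookup func(serviceID string) (string, bool)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Simplification: the service ID comes from a query parameter.
		status, ok := lookup(r.URL.Query().Get("id"))
		if !ok {
			http.Error(w, "service not found", http.StatusNotFound)
			return
		}
		// The code alone is enough for a load balancer; the body names the status.
		w.WriteHeader(statusHTTPCode(status))
		_, _ = w.Write([]byte(status))
	})
}

// probe sends a GET request for target to h in-process and returns the
// response code and body.
func probe(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}
```

Serving this from the agent rather than the servers matches the resiliency argument above: the answer depends only on the agent's local checks.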


ryanuber added the type/enhancement Proposed improvement or new feature label Mar 19, 2015

slackpad self-assigned this Sep 15, 2016

sethvargo mentioned this issue Nov 29, 2016

Add an API method for determining the best status #2544

Merged

slackpad added this to the 0.7.3 milestone Nov 29, 2016

slackpad modified the milestones: 0.7.4, 0.7.3 Jan 17, 2017

slackpad removed this from the Triaged milestone Apr 18, 2017

This was referenced May 3, 2017

This mark checks as critical if the serfCheck fails. #2042

Closed

Warning healthcheck should also be passing #2452

Closed

slackpad added the theme/api Relating to the HTTP API interface label May 25, 2017

sethvargo closed this as completed Jun 29, 2017

kamaradclimber mentioned this issue Oct 7, 2017

Implement /v1/agent/health/service/<service name> endpoint #3551

Merged

duckhan pushed a commit to duckhan/consul that referenced this issue Oct 24, 2021

Create templates for grafana and prometheus (hashicorp#802)

c39b731

* Create templates for grafana and prometheus

snyk-bot mentioned this issue Jan 22, 2022

[Snyk] Upgrade nuka-carousel from 4.7.5 to 4.8.4 bigcommerce/consul#34

Closed

snyk-bot mentioned this issue Feb 28, 2023

[Snyk] Upgrade nuka-carousel from 4.7.5 to 4.8.4 qmutz/consul#3

Open
