
Query returns service (and checks still pass) after node was terminated #811

Closed
discordianfish opened this issue Mar 24, 2015 · 6 comments

@discordianfish (Contributor)

Hi,

I'm running a Consul cluster with 3 server nodes and several client nodes.
Some of the clients run Consul with a JSON service definition like this:

{
  "service": {
    "name": "infra-haproxy-stats",
    "port": 8000,
    "check": {
      "script": "curl -o /dev/null localhost:8000",
      "interval": "60s"
    }
  }
}

Now if I shut down such a node, the serfHealth check fails but the service check still passes:

[screenshot: the serfHealth check failing while the service's script check still shows passing]

And most importantly, Consul still returns those nodes in queries:

$ curl consul:8500/v1/catalog/service/infra-haproxy-stats | jq '.[]|select(.Address == "10.128.13.32")'
{
  "Node": "ip-10-128-13-32",
  "Address": "10.128.13.32",
  "ServiceID": "infra-haproxy-stats",
  "ServiceName": "infra-haproxy-stats",
  "ServiceTags": null,
  "ServiceAddress": "",
  "ServicePort": 8000
}

I'm not sure whether the still-passing service check is by design (the check never reported 'down', though I expected it to be marked as failed if no 'ok' arrives for some time), but at the least I expected Consul not to return unhealthy nodes.
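The same state is visible through the HTTP API; a minimal sketch, assuming the node name from the catalog output above (the output below is abbreviated and illustrative):

$ curl consul:8500/v1/health/node/ip-10-128-13-32
[
  {
    "Node": "ip-10-128-13-32",
    "CheckID": "serfHealth",
    "Name": "Serf Health Status",
    "Status": "critical"
  },
  {
    "Node": "ip-10-128-13-32",
    "CheckID": "service:infra-haproxy-stats",
    "Name": "Service 'infra-haproxy-stats' check",
    "Status": "passing",
    "ServiceID": "infra-haproxy-stats",
    "ServiceName": "infra-haproxy-stats"
  }
]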

discordianfish changed the title from "Query return service and checks still pass after node was terminated" to "Query returns service (and checks still pass) after node was terminated" on Mar 24, 2015
@grobie commented Mar 24, 2015

Do you use leave_on_terminate? For an intentional shutdown of a node, you probably want to make sure it also leaves the cluster.

For anything health-related, I believe you should use the Health API instead: https://consul.io/docs/agent/http/health.html
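For the graceful-shutdown part, a minimal agent-config sketch; leave_on_terminate is a standard agent option, and enabling it is the only change shown here:

{
  "leave_on_terminate": true
}

With this set, the agent performs a graceful leave when it receives SIGTERM, so the node is removed from the catalog rather than relying on failure detection to catch up.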

@pearkes (Contributor) commented Mar 24, 2015

I'll note that this is a known UI issue, and expected per the Consul API. The UI should special-case nodes that aren't responding to the serfHealth check and mark them as unreachable.

@ryanuber (Member)

This is expected behavior. The script/interval check runs locally on the agent, so if that node goes away, its result is never updated; that was indeed a design decision. This is where the serfHealth check smooths things over for you by quickly detecting the node failure and updating the catalog. As pointed out by @grobie, you will want to use the /v1/health endpoint to query for services in a passing state. The equivalent API call in your use case would have been curl consul:8500/v1/health/service/infra-haproxy-stats?passing.
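A minimal sketch of that query, assuming a local agent on port 8500; the address in the output is illustrative, standing in for a still-healthy instance:

# With ?passing set, the terminated node is filtered out because its
# serfHealth check is critical; only instances whose checks all pass
# are returned.
$ curl 'consul:8500/v1/health/service/infra-haproxy-stats?passing' | jq '.[].Node.Address'
"10.128.42.17"

The entry for 10.128.13.32 no longer appears, even though its service-level check still reads "passing", because serfHealth is included in the ?passing filter.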

@ryanuber (Member)

Created #813 for the UI issue; let's track that separately.

@discordianfish (Contributor, Author)

OK, got it. But is it also expected for the DNS API to return unhealthy nodes? I think I saw that happen, but I will verify.

@ryanuber (Member)

@discordianfish definitely not; the DNS interface should only return healthy results. Please do let us know if you see otherwise.
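A quick way to verify this, as a minimal sketch; it assumes the agent's DNS interface on its default port 8600:

# Unhealthy instances should be omitted from the answer section entirely.
$ dig @127.0.0.1 -p 8600 infra-haproxy-stats.service.consul +short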
