race condition in /health/service #279

Closed
chrisDeFouRire opened this issue Aug 10, 2014 · 5 comments

@chrisDeFouRire

I think I've found a race condition in consul/health/service...

Here's the scenario:

  • start a service with a ttl check
  • curl /v1/health/service/serviceId with the wait and index parameters
  • stop service

When the /health/service endpoint responds, curl it again immediately and compare the two responses.

What I've found: Status is 'passing' in the first curl, 'critical' in the second one, but the X-Consul-Index header has the same value in both.
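The steps above could be scripted roughly as follows. This is only a sketch: it assumes a local agent on localhost:8500 (as in the curl example), and the helper names `health_url`, `query`, and `blocking_then_plain` are made up for illustration. Starting and stopping the service is left to the caller.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:8500"  # assumption: local agent, as in the curl example


def health_url(service, index=None, wait=None, base=BASE):
    """Build the /v1/health/service URL, optionally as a blocking query."""
    params = {}
    if wait is not None:
        params["wait"] = wait
    if index is not None:
        params["index"] = index
    url = f"{base}/v1/health/service/{urllib.parse.quote(service)}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def query(service, index=None, wait=None, timeout=60):
    """Return (X-Consul-Index, parsed JSON body) for one request."""
    url = health_url(service, index, wait)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return int(resp.headers["X-Consul-Index"]), json.load(resp)


def blocking_then_plain(service, index, wait="30s"):
    """Issue the blocking query, then an immediate non-blocking one."""
    first = query(service, index=index, wait=wait)
    second = query(service)
    return first, second
```

Calling `blocking_then_plain("cotc-rocket", 20549)` while the service is being stopped would return the two (index, body) pairs to compare.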

Here's an example:

$ curl -v 'localhost:8500/v1/health/service/cotc-rocket?wait=30s&index=20549' ; curl -v 'localhost:8500/v1/health/service/cotc-rocket'

> GET /v1/health/service/cotc-rocket?wait=30s&index=20549 HTTP/1.1
...
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 20552
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Sun, 10 Aug 2014 09:22:40 GMT
< Content-Length: 660
< 
[{"Node":{"Node":"precise64","Address":"10.0.2.15"},"Service":{"ID":"cotc-rocket","Service":"cotc-rocket","Tags":[""],"Port":0},"Checks":[{"Node":"precise64","CheckID":"service:cotc-rocket","Name":"Service 'cotc-rocket' check","Status":"passing","Notes":"","Output":"{\"loadavg\":[1.263671875,1.5498046875,1.72802734375],\"process_uptime\":1,\"mem\":{\"rss\":76640256,\"heapTotal\":59255808,\"heapUsed\":37173608},\"cpucount\":8}","ServiceID":"cotc-rocket","ServiceName":"cotc-rocket"},{"Node":"precise64","CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":""}]}]
...
> GET /v1/health/service/cotc-rocket HTTP/1.1
...
> 
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 20552
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Sun, 10 Aug 2014 09:22:40 GMT
< Content-Length: 510
< 
[{"Node":{"Node":"precise64","Address":"10.0.2.15"},"Service":{"ID":"cotc-rocket","Service":"cotc-rocket","Tags":[""],"Port":0},"Checks":[{"Node":"precise64","CheckID":"service:cotc-rocket","Name":"Service 'cotc-rocket' check","Status":"critical","Notes":"","Output":"TTL expired","ServiceID":"cotc-rocket","ServiceName":"cotc-rocket"},{"Node":"precise64","CheckID":"serfHealth","Name":"Serf Health Status","Status":"passing","Notes":"","Output":"Agent alive and reachable","ServiceID":"","ServiceName":""}]}]

This clearly shows that both requests returned the same X-Consul-Index but different results (the cotc-rocket check went from passing to critical). It happens most of the time, but sometimes both responses report the check as critical (as they should), which is why I suspect a race condition.
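The comparison can be expressed as a small check: two responses carrying the same X-Consul-Index should not disagree on check status. The sketch below uses hypothetical helper names and sample data trimmed from the output above to the fields that matter.

```python
def check_statuses(body):
    """Map (Node, CheckID) to Status for every check in a health response."""
    return {
        (check["Node"], check["CheckID"]): check["Status"]
        for entry in body
        for check in entry["Checks"]
    }


def same_index_different_state(first, second):
    """Each argument is (index, body). True when the indexes match but statuses differ."""
    (index1, body1), (index2, body2) = first, second
    return index1 == index2 and check_statuses(body1) != check_statuses(body2)


def sample_body(service_status):
    """Trimmed copy of the response above, keeping only the fields compared here."""
    return [{
        "Checks": [
            {"Node": "precise64", "CheckID": "service:cotc-rocket", "Status": service_status},
            {"Node": "precise64", "CheckID": "serfHealth", "Status": "passing"},
        ]
    }]


# The two responses from the example: same index, different check status.
first = (20552, sample_body("passing"))
second = (20552, sample_body("critical"))
print(same_index_different_state(first, second))
```

If the index had advanced together with the status change, the function would return False, which is the behaviour a client relying on blocking queries expects.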

This was tested against both 0.3.0 and 0.3.1, with a single Consul server (dev environment).

Can you help?

@armon
Member

armon commented Aug 11, 2014

This is interesting. I will look into it today.

@armon
Member

armon commented Aug 22, 2014

Struggling to reproduce this. Do you have any scripts to make this happen?

@armon
Member

armon commented Aug 22, 2014

Fixed by 3330956!

@armon armon closed this as completed Aug 22, 2014
@chrisDeFouRire
Author

Thank you! I'll check it out when it's delivered. Any idea when the next version will be released?

@armon
Member

armon commented Aug 24, 2014

Hopefully in the next few weeks!
