http service check fails every third request - causes service to flap #779

Closed
darron opened this issue Mar 13, 2015 · 8 comments
Labels
type/bug Feature does not function as expected

Comments

@darron
Contributor

darron commented Mar 13, 2015

I have been seeing a lot of failed checks when using the http service check instead of the script-based service check. Here are some logs from a clean node:

https://gist.github.com/darron/d4100cb1dfb8b8bf3fce

What's strange is that every third check fails for each service I register using the new http type service check. That is causing nginx to flap as Consul Template rewrites the template over and over.

Once I changed the service check to a curl-based script check at the same frequency, there was not a single failure.

Here are the logs from the times we were setting the service and checks:

Mar 12 11:59:14 octohost octohost: service:set '{"ID": "html-49153","Name": "html","Port": 49153,"Tags": ["http"],"check": {"http": "http://104.236.236.162:49153","interval": "15s"}}'
Mar 12 18:18:17 octohost octohost: service:set '{"ID": "interactions-49154","Name": "interactions","Port": 49154,"Tags": ["http"],"check": {"http": "http://104.236.236.162:49154","interval": "15s"}}'
Mar 12 18:26:37 octohost octohost: service:set '{"ID": "interactions-49155","Name": "interactions","Port": 49155,"Tags": ["http"],"check": {"http": "http://10.132.82.182:49155","interval": "15s"}}'
Mar 12 18:34:25 octohost octohost: service:set '{"ID": "interactions-49156","Name": "interactions","Port": 49156,"Tags": ["http"],"check": {"script": "curl -s http://10.132.82.182:49156","interval": "15s"}}'

Here's the bash where we're setting the service:

https://github.com/octohost/octohost/blob/master/bin/octo#L396-L426

After I adjusted the definition for the last one at 18:34, there were no problems at all and no more flapping. The underlying nginx container seems to be stable.

NOTE: I have duplicated this on Digital Ocean and GCE nodes.

@ryanuber
Member

Hey @darron, thanks for the report. This is sounding like it might be a keep-alive issue. Is the server you are talking to configured with a keep-alive timeout of < 30s? The default Go HTTP client uses a 30s keep-alive. We probably need to adjust this on the Consul side, but I just want to nail the issue down further before going down that road. Also, if you could provide the check configuration in its original JSON form, and the curl command you used, that would be helpful.

Tagging as a bug, Thanks!
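
For anyone who wants to see the suspected failure mode in isolation, here is a minimal sketch in Python (not Consul's Go client). It assumes a toy local server whose 0.5 s idle timeout stands in for a short server-side keep-alive timeout; `ShortKeepAliveHandler` and `reuse_after_idle` are hypothetical names made up for this illustration. It only shows that reusing a connection the server has already dropped makes the next request fail; it does not try to explain the exact every-third-check pattern.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ShortKeepAliveHandler(BaseHTTPRequestHandler):
    """Toy server that drops idle keep-alive connections after 0.5 s."""

    protocol_version = "HTTP/1.1"  # enables persistent connections
    timeout = 0.5                  # idle timeout, a stand-in for a server keep-alive timeout

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def reuse_after_idle(idle_seconds):
    """Send two GETs over ONE connection, idling between them.

    Returns "ok" if the second request succeeds, otherwise the name of
    the exception raised when the reused connection turns out to be dead.
    """
    server = ThreadingHTTPServer(("127.0.0.1", 0), ShortKeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()   # first check passes
        time.sleep(idle_seconds)    # wait for the next check interval
        conn.request("GET", "/")    # reuse the same TCP connection
        conn.getresponse().read()
        return "ok"
    except (ConnectionError, http.client.HTTPException) as exc:
        return type(exc).__name__
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

With the 0.5 s timeout above, `reuse_after_idle(0.05)` should return `"ok"`, while `reuse_after_idle(1.0)` should return the name of a connection error, because the server closed the idle connection before the second request was sent.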

@ryanuber ryanuber added the type/bug Feature does not function as expected label Mar 13, 2015
@armon
Member

armon commented Mar 13, 2015

@ryanuber We should just disable keep-alive; it seems sane for localhost checks to just re-dial.
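
A rough sketch of the "just re-dial" idea, written in Python rather than Consul's Go and not taken from the Consul source: each check opens a new connection, sends `Connection: close`, and closes the socket afterwards, so a server-side keep-alive timeout never comes into play. The function name `http_check` and the rule that any 2xx status counts as passing are assumptions made for this example.

```python
import http.client
from urllib.parse import urlsplit


def http_check(url, timeout=10.0):
    """Run one HTTP check over a brand-new connection.

    Returns (passing, status). Treating any 2xx as passing is a
    simplification for this sketch; status is None if the request failed.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        # "Connection: close" asks the server not to keep the connection alive.
        conn.request("GET", target, headers={"Connection": "close"})
        resp = conn.getresponse()
        resp.read()
        return 200 <= resp.status < 300, resp.status
    except (OSError, http.client.HTTPException):
        return False, None
    finally:
        conn.close()  # never reuse; the next check dials again
```

The trade-off is one extra TCP handshake per check, which is negligible at a 15s interval against a local or nearby service.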

@ryanbreen
Contributor

@armon +1

@darron
Contributor Author

darron commented Mar 13, 2015

It's a really standard nginx container - I'm just away from my laptop and will get full config when I get back.

@darron
Contributor Author

darron commented Mar 13, 2015

FYI - @ryanuber - this is the config:

https://github.com/octohost/nginx/blob/master/nginx.conf - 15 second keepalive timeout.

Was setting this JSON:

{"ID": "interactions-49155","Name": "interactions","Port": 49155,"Tags": ["http"],"check": {"http": "http://10.132.82.182:49155","interval": "15s"}}

With this command - $2 is the JSON that's passed:

curl -s -X PUT -d "$2" "localhost:8500/v1/agent/service/register?token=anonymous"

It goes from here: https://github.com/octohost/octohost/blob/master/bin/octo#L421

To here:

https://github.com/octohost/octohost-cookbook/blob/master/files/default/consulkv#L39-L41
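
For reference, the same registration can be sketched in Python. This is an illustration only, not the octohost code: the helper names (`service_definition`, `build_register_request`, `register`) are hypothetical, while the endpoint, token, and payload fields are copied from the curl command and JSON above. It builds both the http-checked and the script-checked definitions from the logs so the difference is easy to compare.

```python
import json
import urllib.request

AGENT = "http://localhost:8500"  # local agent address from the curl command


def service_definition(name, port, check):
    """Build a payload shaped like the JSON logged in this issue."""
    return {
        "ID": f"{name}-{port}",
        "Name": name,
        "Port": port,
        "Tags": ["http"],
        "check": check,
    }


def build_register_request(service, token="anonymous"):
    """Build the PUT request that the curl command sends."""
    url = f"{AGENT}/v1/agent/service/register?token={token}"
    data = json.dumps(service).encode("utf-8")
    return urllib.request.Request(url, data=data, method="PUT")


def register(service, token="anonymous"):
    """Send the registration; needs an agent listening on localhost:8500."""
    with urllib.request.urlopen(build_register_request(service, token), timeout=5) as resp:
        return resp.status


# The two check styles compared in this issue.
http_checked = service_definition(
    "interactions", 49155,
    {"http": "http://10.132.82.182:49155", "interval": "15s"},
)
script_checked = service_definition(
    "interactions", 49156,
    {"script": "curl -s http://10.132.82.182:49156", "interval": "15s"},
)
```

Calling `register(http_checked)` would perform the actual PUT against a running agent; building the request separately makes it possible to inspect the payload without one.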

I have all sorts of containers though - so likely what @armon and @ryanbreen +1'd would be the best - it would be a bit of a futile process to make sure everything had a long enough keepalive.

Thanks for taking a look at this!

@ryanuber
Member

Hey @darron - I just pushed 952ec28, which should disable HTTP keep-alives. If you have a minute, give master a test and see if the issue persists. I'll leave this open for now until we confirm the fix. Thanks!

@darron
Contributor Author

darron commented Mar 16, 2015

Sweet - I compiled from master and it doesn't error anymore. Thanks!

@ryanuber
Member

Thanks for confirming! Closing.
