Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay return of control during startup #127

Merged
merged 1 commit into from
Jul 13, 2015

Conversation

jhmartin
Copy link
Contributor

@jhmartin jhmartin commented Feb 9, 2015

Give consul time to install its HUP signal handler to avoid immediate shutdown when invoked via knife ssh, fixes #125.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling a11891d on jhmartin:125_Startup-delay into 57b4105 on johnbellone:master.

@tgwizard
Copy link
Contributor

tgwizard commented Feb 9, 2015

Should this really be part of the init script and not the chef setup process?

@jhmartin
Copy link
Contributor Author

@tgwizard See hashicorp/consul#677 ; the issue occurs when called outside of chef. It just so happens that knife ssh ... /etc/init.d/consul start also triggers the behavior.

@tgwizard
Copy link
Contributor

@jhmartin ah, ok. Still, this will affect starts and stops when not running through knife ssh etc., slowing everything down. Wouldn't

knife ssh ... "/etc/init.d/consul start && sleep 3"

work for you?

@jhmartin
Copy link
Contributor Author

@tgwizard I suppose it would, but that means other users also have to know about this behavior and think about each invocation mechanism (Rundeck, mcollective, ssh-for-loop, fabric, ansible, etc) and determine if it requires a timing-safe wrapper. Startup scripts do not normally have that kind of environmental sensitivity, so having consul allow it violates Least Surprise.

@dpkp
Copy link
Contributor

dpkp commented Mar 23, 2015

I think this needs to be in the init script. You can trigger this bug simply by writing a recipe that does notifies :reload, 'service[consul]' -- i.e., what the README does in sample code. If some other resource has notified a restart then you'll hit this at the end of the chef run when delayed notifications are processed (this happened to us).

Might I suggest checking consul info in a loop instead of sleeping 3 secs? For the upstart template you can do this via:

# Fix for upstream issue. see https://github.com/johnbellone/consul-cookbook/issues/125
post-start script
  echo "Sleep a bit to allow consul time to install signal handlers"
  while ! consul info; do sleep 1; done
end script

[do you mind adding this patch ^^^ for templates/default/consul.conf.erb ? Dont think it makes sense for me to do a separate PR for upstart here.]

@dpkp
Copy link
Contributor

dpkp commented Mar 23, 2015

further testing and it seems like we can't rely on consul info as a signal that startup is done. sleep and pray may be the best answer here for now.

@johnbellone
Copy link
Contributor

I am not a huge fan of this. Is it really what the Consul folks suggest?

@dpkp
Copy link
Contributor

dpkp commented Jun 16, 2015

Is there an alternate fix for this? afaik, hashicorp currently recommends adding sleep to startup command: hashicorp/consul#677 (comment)

@tgwizard
Copy link
Contributor

If we add it, perhaps it should be customizable? Not everyone will have the use-case of starting consul over ssh like this.

@johnbellone johnbellone reopened this Jun 16, 2015
@johnbellone johnbellone added this to the 0.10.2 milestone Jul 10, 2015
@johnbellone
Copy link
Contributor

I am willing to put these changes in if someone wants to submit a PR. I'm going to have one more release prior to merging in #126.

… shutdown when invoked via knife ssh (or any short-lived pty), fixes sous-chefs#125.
@jhmartin
Copy link
Contributor Author

I've updated this PR against current master and made the sleep configurable, but defaults to a safe value. The value can be set to 0 if the environment does not need the extra safety.

@dpkp I'm not familar enough w/upstart -- you can submit a PR against my branch though and I'll include it, or you can submit a separate PR.

johnbellone added a commit that referenced this pull request Jul 13, 2015
Delay return of control during startup
@johnbellone johnbellone merged commit a8d3060 into sous-chefs:master Jul 13, 2015
@lock
Copy link

lock bot commented Apr 25, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sensitivity to HUP during launch
5 participants