Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling /etc/init.d/consul restart sometimes ends up with a stopped … #427

Merged
merged 1 commit into from
May 7, 2017

Conversation

MichaelKueller
Copy link
Contributor

@MichaelKueller MichaelKueller commented Apr 27, 2017

…consul process. The reason is that the init script sometimes starts consul before it was completely stopped. During a cookbook upgrade from version 2.1.3 to 2.2.0 this took down the consul agent on a number of our servers.
This can be easily reproduced runnning:

watch -n 0.5 -e '/etc/init.d/consul restart; sleep 0.1; /etc/init.d/consul status'

Adding the --retry parameter lets start-stop-daemon wait until the consul process is actually stopped.

…consul process. The reason was that it would sometimes start consul before it was completely stopped. This can be easily reproduced runnning:

 watch -n 0.5 -e '/etc/init.d/consul restart; sleep 0.1; /etc/init.d/consul status'

Adding the --retry parameter lets start-stop-daemon wait until the consul process is actually stopped.
@MichaelKueller MichaelKueller changed the title Calling /etc/init.d/consul restart sometimes ended up with a stopped … Calling /etc/init.d/consul restart sometimes ends up with a stopped … Apr 27, 2017
@MichaelKueller
Copy link
Contributor Author

MichaelKueller commented May 2, 2017

The underlying issue has the potential to take down 20 percent of all consul agents when appliying a config change accross an entire environment. When hundreds of servers are involved easily dozens of them will be stopped instead of restarted. Which is why I believe this fix is quite important.

@@ -27,6 +27,9 @@ class ConsulConfig < Chef::Resource
# @!attribute config_dir
# @return [String]
attribute(:config_dir, kind_of: String, default: lazy { node['consul']['service']['config_dir'] })
# @!attribute config_dir_mode
# @return [String]
attribute(:config_dir_mode, kind_of: String, default: '0755')

This comment was marked as outdated.

Copy link
Contributor

@Ginja Ginja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the one suggestion this look good to me. I've actually seen this behaviour on a few of our nodes. Thanks for the PR.

@codecov-io
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@f01a12d). Click here to learn what that means.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #427   +/-   ##
=========================================
  Coverage          ?   59.49%           
=========================================
  Files             ?        7           
  Lines             ?      358           
  Branches          ?        0           
=========================================
  Hits              ?      213           
  Misses            ?      145           
  Partials          ?        0
Impacted Files Coverage Δ
libraries/consul_config.rb 77.86% <50%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f01a12d...a723491. Read the comment docs.

@legal90 legal90 merged commit 693a275 into sous-chefs:master May 7, 2017
@lock
Copy link

lock bot commented Apr 25, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants