Skip to content
This repository has been archived by the owner on Dec 31, 2022. It is now read-only.

Refactor Puma restarts #257

Merged
merged 16 commits into from
Oct 13, 2021
Merged

Conversation

olbrich
Copy link
Contributor

@olbrich olbrich commented Mar 8, 2021

This PR modifies the restart process for puma so that it can do a proper restart instead of killing and restarting the app.

@@ -121,6 +121,14 @@ Metrics/AbcSize:
Metrics/BlockLength:
Enabled: false

Metrics/ClassLength:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These cops were forcing people to smash arrays, hashes, and other things into as few lines as possible, which just impaired readability.

Dockerfile Outdated Show resolved Hide resolved
context.template file_path do
mode '0640'
source 'appserver.monitrc.erb'
source "#{opts[:adapter]}.monitrc.erb"
cookbook opts[:source_cookbook].to_s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should let a cookbook override the template if a user needs complete control. The worker driver already does this.

variables opts
notifies :run, 'execute[monit reload]', :immediately
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forces a reload of monit when the file changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

between this and setup there's a gotcha that's burned me in production twice now, because a Rails app server should not be started until all deploy tasks are completed; most obviously, asset compilation, but more besides (e.g. I have bespoke secrets injection at deploy time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catch here is that we need monit to reload the configs immediately so that later in the run when we want it to start the server, the configs that make that happen have already been loaded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way to finesse this is with the onreboot monit option.

If we mark the service in monit as onreboot nostart until all deploy tasks are complete, then the service can be programatically stopped & started when we want, but monit won't auto-start it prematurely.

Once deploy tasks are completed and the service is ready, we'd set it back to the default of onreboot start, so that it still comes up on soft reboot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inopinatus Interesting approach. I'll see if I can get that to work.

What I see right now in my testing is that there are times when monit will attempt to start the rails server before it is properly configured (and the code may not be there either), but it just fails. Once the deploy happens and we trigger the start or restart via monit everything should be in place and work fine.

Is there something custom about your setup that would prevent this approach from working?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olbrich nothing custom. The problem is that monit can attempt to start the server after the code is in place, but before asset compilation has run/completed. Rails will start, but error out as soon as it attempts to consult the asset pipeline.

pidfile = "/var/run/lock/#{app['shortname']}/puma.pid"
context.execute "monit restart #{adapter}_#{app['shortname']}" do
retries 3
only_if { ::File.exist?(pidfile) }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only runs if there is a pidfile, which generally means that puma is already running.

@@ -2,7 +2,7 @@

module Drivers
module Webserver
class Base < Drivers::Base # rubocop:disable Metrics/ClassLength
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed.

mode '0640'
source "#{opts[:adapter]}.monitrc.erb"
cookbook opts[:source_cookbook].to_s
variables opts
notifies :run, 'execute[monit reload]', :immediately
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reload monit when this config file changes.

@coveralls
Copy link

coveralls commented Mar 8, 2021

Coverage Status

Coverage decreased (-0.1%) to 99.894% when pulling db9543d on Mckesson-cds:monit-appserver-restarts into 3020734 on ajgon:master.

@@ -37,6 +37,7 @@
fire_hook(:configure, items: databases + [source, framework, appserver, worker, webserver])

execute 'monit reload' do
action :nothing
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't reload monit during a config run. If one of the config files change, they will notify this that a reload is needed.

@@ -197,3 +197,9 @@

fire_hook(:setup, items: databases + [source, framework, appserver, worker, webserver])
end

# setup hooks for appservers and workers may need to reload monit configs
execute 'monit reload' do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allow for monit reloads during setup.

# setup hooks for appservers and workers may need to reload monit configs
execute 'monit reload' do
action :nothing
only_if 'which monit'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the opsworks agent may install monit for its own purposes, so maybe we can drop the conditional check for monit and maybe just make it a full blown dependency in the metadata file.

@@ -1,6 +1,5 @@
# frozen_string_literal: true

# rubocop:disable Metrics/MethodLength
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed.

@@ -1,6 +1,5 @@
# frozen_string_literal: true

# rubocop:disable Metrics/MethodLength
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed

check process <%= @appserver_name %>_<%= @app_shortname %> with pidfile <%= pid_dir %><%= @appserver_name %>.pid
start program = "/bin/sh -c 'cd <%= File.join(@deploy_to, 'current') %> && <%= @environment.map {|k,v| "#{k}=\"#{v}\""}.join(' ') %> <%= @appserver_command %> | logger -t <%= @appserver_name %>-<%= @app_shortname %>'" as uid "<%= node['deployer']['user'] %>" and gid "<%= node['deployer']['group'] %>" with timeout 90 seconds
stop program = "/bin/sh -c 'cat <%= pid_dir %><%= @appserver_name %>.pid | xargs --no-run-if-empty kill -QUIT; sleep 5'" as uid "<%= node['deployer']['user'] %>" and gid "<%= node['deployer']['group'] %>"
stop restart = "/bin/sh -c 'cat <%= pid_dir %><%= @appserver_name %>.pid | xargs --no-run-if-empty kill -USR2; sleep 5'" as uid "<%= node['deployer']['user'] %>" and gid "<%= node['deployer']['group'] %>"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Actually, I'm not sure this is the right way to restart thin. Find the right way or just remove the 'restart' line, in which case monit will fall back to a stop-start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajgon if you are familiar with Thin, you might have some insight to this one.
https://github.com/macournoyer/thin/blob/0712d603a31d97b9fa8a0260da400da2e4217d60/lib/thin/server.rb#L247 suggests that USR2 is not actually handled.

@inopinatus
Copy link
Contributor

inopinatus commented Mar 8, 2021

I agree that a refactor of these elements is strongly motivated! However, I worry that the monitrc files, like crontabs, get messy really fast, and they're not an obvious place for per-app configuration. Having any per-application configuration in /etc seems inconsistent to me when practically all other per-application configuration appears under /srv.

I suspect that pulling server-specific logic away from monit may produce the most consistent and most maintainable result, and monit should just call a standard init-style interface.

This may also help resolve #255 and #256, which have temporarily obliged me to fork opsworks_ruby and revert 37b9465 for my own production. Basically, I believe monit doesn't really understand foreground processes, and is actually timing out the server start, with severe negative consequences. I think we need to find a way that supports foregound puma, but returns quickly to monit - which to me sounds like the business of an wrapper script.

Some app servers may also need a stop-start (rather than a reload/refork) under limited circumstances, e.g. when environment variables change. There is no way I can see to squeeze that into a monitrc either, it'd be very much an edge case. In any case I'm also in favour of secrets not appearing in log files and ps output.

What this adds up to is a suggestion of a slightly different refactoring (in some ways, a partial reversion to pre-1.20 structures), using a standard monitrc, to invoke a #{deploy_to}/shared/scripts/service wrapper that's templated per-server-type. What do you think?

@inopinatus
Copy link
Contributor

can I separately suggest, teasing the rubocop/style changes into a separate PR.

I know it's not always possible, and for minor deltas hardly matters, but I recommend it for all refactorings of substance, and particularly in Ruby code where the distinction between "style" and "substance" is not always clear cut and we have to build a mental model of the author's preferences and opinions to comprehend the diff.

@olbrich
Copy link
Contributor Author

olbrich commented Mar 12, 2021

TODO: unmonitoring via monit won't work if the the server fails to setup properly in the first place and can throw errors on a shutdown.

@olbrich
Copy link
Contributor Author

olbrich commented Apr 28, 2021

FYI, this seems to be working well for us right now. I'll clean this up and get it reviewed soon.

@olbrich
Copy link
Contributor Author

olbrich commented Jun 8, 2021

@inopinatus We've been using this modification successfully in production for a month or two now. If you haven't already, can you give it a try with your setup and see if you run into any snags or gotchas like the ones you mentioned before?

@stale
Copy link

stale bot commented Aug 7, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 7, 2021
@stale stale bot closed this Aug 14, 2021
@olbrich
Copy link
Contributor Author

olbrich commented Aug 31, 2021

Definitely don't want to forget about this one. Still good and still working well in production.

@ajgon
Copy link
Owner

ajgon commented Sep 2, 2021

@olbrich can I consider it stable enough, to merge it?

@olbrich
Copy link
Contributor Author

olbrich commented Sep 13, 2021

@ajgon I think so, let me take another look through and see if everything looks right.

@olbrich
Copy link
Contributor Author

olbrich commented Oct 8, 2021

@ajgon can you re-open this PR (#257). I'd like to get it finished off and merged.

@ajgon ajgon reopened this Oct 8, 2021
@stale stale bot removed the stale label Oct 8, 2021
@ajgon
Copy link
Owner

ajgon commented Oct 8, 2021

@olbrich Done.

@olbrich
Copy link
Contributor Author

olbrich commented Oct 9, 2021

@ajgon I think I have most of the tests / integration tests working now, but I'm stumped by this failure ... https://github.com/ajgon/opsworks_ruby/runs/3847204234?check_suite_focus=true#step:7:9417. Any ideas?

@ajgon
Copy link
Owner

ajgon commented Oct 9, 2021

@ajgon I think I have most of the tests / integration tests working now, but I'm stumped by this failure ... https://github.com/ajgon/opsworks_ruby/runs/3847204234?check_suite_focus=true#step:7:9417. Any ideas?

This is weird, and it looks like it's completely unrelated to the PR.

This integration test, checks deployment of an app stored as an archive on S3. To do so, it needs AWS access/secret key to actually fetch the app. The key is stored as repository secret, and then used by workflow to download the file. For whatever reason for this PR, those secrets are not populated - not sure if it's an github issue or test itself.

Don't worry about it, consider this test as passing - I'll review the PR tomorrow, if everything is okay, I'll merge it and check the build process again against master.

@ajgon
Copy link
Owner

ajgon commented Oct 9, 2021

I also see, that you introduce new option monit_template_cookbook to appserver and worker - can you please update docs/attributes.md accordingly?

Can you also tell me, if those changes introduce any "breaking changes"?

@olbrich
Copy link
Contributor Author

olbrich commented Oct 12, 2021

@ajgon I don't know of any breaking changes. I'll look into the documentation update.

@olbrich olbrich marked this pull request as ready for review October 12, 2021 21:07
@ajgon ajgon merged commit 035ffb9 into ajgon:master Oct 13, 2021
dotnofoolin pushed a commit to dotnofoolin/opsworks_ruby that referenced this pull request Nov 23, 2021
…estart instead of killing and restarting the app (ajgon#257)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants