CC-1206: Add a check for zombie DJ workers #251
Merged
Description of your patch
Properly recover when a DelayedJob worker was terminated and is lingering as a zombie process.
There is a bug here:
ey-cookbooks-stable-v5/cookbooks/delayed_job4/templates/default/dj.erb
Lines 51 to 62 in 7310e1b
The test at those lines checks whether $LAST_LOCK_PID is defined, no running process has that PID, and a lock file still exists. If a process with that PID is running, even as a zombie, the script instead falls through to the "Monit already messing with..." block.
This PR adds an additional test to check if the PID matches a running process but the process is a zombie.
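As a rough illustration of the kind of check described above, here is a minimal bash sketch. It is not the code from dj.erb: the helper names `state_is_zombie` and `pid_is_zombie` are hypothetical, and it assumes a procps-style `ps` that supports `-o stat=` and reports zombie processes with a state starting with `Z`.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not taken from dj.erb.

# Succeeds if a ps STAT string (e.g. "Z", "Z+") denotes a zombie process.
state_is_zombie() {
  case "$1" in
    Z*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Succeeds only if a process with the given PID exists and is a zombie.
pid_is_zombie() {
  local pid="$1" state
  # "stat=" suppresses the header line; ps exits non-zero when the PID is not found.
  state=$(ps -o stat= -p "$pid" 2>/dev/null) || return 1
  # Strip any padding whitespace ps may add around the state column.
  state="${state//[[:space:]]/}"
  state_is_zombie "$state"
}
```

A caller could then treat a zombie the same way as a missing process, for example `if pid_is_zombie "$LAST_LOCK_PID"; then ... fi`, before deciding that the lock file is stale.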
Recommended Release Notes
Updates the delayed_job4 recipe to properly handle zombie workers
Estimated risk
Low
Components involved
DelayedJob custom chef recipe
Description of testing done
See QA instructions
QA Instructions
NOTE: These are the same as the QA instructions for PR #224. This PR can be tested with #224.
Test on configuration A_dj
Configuration A
rails_activejob_example (delayed_job branch) App
Unicorn
Ruby 2.3
RubyGems 2.6.5
Postgres 9.5
US East Virginia
Solo
Boot the test environment under the QA stack
Enable the delayed_job recipe.
Modify the recipe to install DelayedJob on the solo instance
Modify the recipe and set a very low worker memory limit (e.g. 10MB, low enough to always trigger the memory limit even with zero workload)
Run chef
Observe the delayed_job processes by running
ps -ef | grep elay
Make sure: