Forking worker unable to gracefully exit #202

tpickett66 · 2014-08-10T15:45:53Z

I'm getting the following backtrace when sending SIGQUIT to a Qless::Worker::ForkingWorker on Ruby 2.1.

<gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:159:in 'synchronize': can't be called from trap context (ThreadError)
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:159:in 'shutdown_sandboxes'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:131:in 'stop!'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:72:in 'block in register_signal_handlers'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'call'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'wait2'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'block in run'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:88:in 'loop'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:88:in 'run'

Before Ruby 2.0, the call that causes this was allowed, but it is considered unsafe because it can deadlock (see: Can't write to a Logger in a signal handler). A common way to handle this is to have the signal handlers push onto a signal queue that the run loop reads from. That requires checking the status of children without blocking, so the run loop can both flush the signal queue and do child-process housekeeping. There are several good examples of this in Ruby (Unicorn and Foreman come to mind). I intend to start reworking the forking worker's signal handlers, and will likely end up fixing #161 as well.
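To make the signal-queue idea concrete, here is a rough sketch of the self-pipe approach, in the spirit of what Unicorn and Foreman do. It is not qless code: `SignalQueue`, `push`, `wait`, and `reap_children` are hypothetical names chosen for illustration, and the 64-byte read size is arbitrary. The trap handler only records the signal and writes one byte to a pipe; the run loop sleeps in `IO.select`, then handles signals and reaps children with `Process::WNOHANG` so it never blocks on a child.

```ruby
# Hypothetical sketch of a self-pipe signal queue; none of these names
# come from qless.
class SignalQueue
  def initialize
    @reader, @writer = IO.pipe
    @pending = []
  end

  # Intended to be called from a trap handler: it takes no Mutex and does
  # no logging, it only records the signal and wakes the run loop.
  def push(signal)
    @pending << signal
    @writer.write_nonblock('.')
  rescue IO::WaitWritable, Errno::EINTR
    # The pipe is full, so a wake-up is already pending.
  end

  # Called from the run loop: sleep up to `timeout` seconds (or until a
  # signal arrives) and return the signals queued so far, oldest first.
  def wait(timeout)
    if IO.select([@reader], nil, nil, timeout)
      begin
        @reader.read_nonblock(64)
      rescue IO::WaitReadable
        # Nothing left to read; the wake-up bytes were already consumed.
      end
    end
    drained = []
    drained << @pending.shift until @pending.empty?
    drained
  end
end

# Collect every child that has already exited, without blocking.
# Returns an array of [pid, Process::Status] pairs.
def reap_children
  exited = []
  loop do
    pid, status = Process.wait2(-1, Process::WNOHANG)
    break if pid.nil? # children exist, but none has exited yet
    exited << [pid, status]
  end
  exited
rescue Errno::ECHILD
  exited # no children left to wait for
end

if __FILE__ == $PROGRAM_NAME
  queue = SignalQueue.new
  trap('USR1') { queue.push('USR1') }
  Process.kill('USR1', Process.pid)
  p queue.wait(1)
end
```

In a real run loop, each iteration would call `queue.wait(some_timeout)`, act on the returned signals in normal (non-trap) context where Mutex and Logger are safe, and then call `reap_children` to do the housekeeping.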

As part of this I'll be using IO.pipe and a couple of other pieces of functionality that have changed in the last couple of versions of Ruby. Which versions should I target with these changes? Also, are there any pitfalls in the code I should know about before embarking on this journey?

esfourteen-zz · 2014-09-24T04:08:54Z

We're experiencing the same issue when running qless through upstart. Stopping the parent process leaves orphaned workers lingering.

evanbattaglia · 2015-01-29T18:49:48Z

Hey @tpickett66 any update on this? I'm running into the same issue.

Ruby 2 complains when Mutex#synchronize is used within a trap context. This is because, if the program flow is interrupted by a signal while it is inside a synchronize block and synchronize is then called from the trap context, synchronize will never return: it waits for the same process/thread to release the lock. Here, in a trap context, if we fail to get the lock, I add the signal to a signal queue, which is run soon after the synchronize block in the normal program flow finishes. If we can get the lock, we should shut down / process the signal immediately, because the main program flow is probably just waiting for the child process and so cannot check the signal queue in a timely manner. Addresses #202
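Roughly, the try-lock-or-defer idea described above looks like the sketch below. This is an illustration only, not the code from the pull request: `DeferredSignals`, `signal`, `with_lock`, and `handled` are hypothetical names, and it assumes that `Mutex#try_lock`, unlike `Mutex#lock` and `Mutex#synchronize`, does not raise in trap context.

```ruby
# Hypothetical sketch of "process now if the lock is free, otherwise
# queue for later"; these names are not from qless.
class DeferredSignals
  attr_reader :handled

  def initialize
    @mutex = Mutex.new
    @deferred = []
    @handled = []
  end

  # Intended to be called from a trap handler. try_lock never blocks, so
  # it cannot deadlock against a synchronize block that was interrupted.
  def signal(sig)
    if @mutex.try_lock
      begin
        process(sig) # lock was free: handle the signal right away
      ensure
        @mutex.unlock
      end
    else
      @deferred << sig # lock is held: leave it for the normal flow
    end
  end

  # Called from the normal program flow around any critical section.
  # Once the lock is released, run whatever the trap handler deferred.
  def with_lock
    @mutex.synchronize { yield }
    until @deferred.empty?
      sig = @deferred.shift
      @mutex.synchronize { process(sig) }
    end
  end

  private

  # Stand-in for the real work (e.g. shutting down child sandboxes).
  def process(sig)
    @handled << sig
  end
end
```

The deferred queue is re-checked after every processed signal, so a signal that arrives while another one is being handled is still picked up in the same pass.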

evanbattaglia mentioned this issue Feb 11, 2015

Fix signal handling for Ruby 2 -- shutdown child processes #209

Merged
