Forking worker unable to gracefully exit #202

Open
tpickett66 opened this issue Aug 10, 2014 · 2 comments

Comments

@tpickett66
Contributor

I'm getting the following backtrace from a Qless::Worker::ForkingWorker when I send it SIGQUIT under Ruby 2.1.

<gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:159:in 'synchronize': can't be called from trap context (ThreadError)
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:159:in 'shutdown_sandboxes'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:131:in 'stop!'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:72:in 'block in register_signal_handlers'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'call'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'wait2'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:94:in 'block in run'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:88:in 'loop'
from <gem dir>/qless-1647ad28fa48/lib/qless/worker/forking.rb:88:in 'run'

In versions of Ruby prior to 2.0 the call that caused this was allowed, but it is considered unsafe due to a potential deadlock (see: Can't write to a Logger in a signal handler). A common way to handle this is to have the signal handlers push onto a signal queue that the run loop reads from. This requires checking the status of children without blocking, so that the run loop can both flush the signal queue and do child process housekeeping. There are several good examples of this in Ruby (Unicorn and Foreman come to mind). I intend to get started on reworking the forking worker's signal handlers, and will likely end up fixing #161 as well.

As part of this I'll be using IO.pipe and a couple of other pieces of functionality that have changed in the last couple of versions of Ruby. What versions should I target with these changes? Also, are there any pitfalls in the code I should know about before embarking on this journey?
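
To make the signal-queue idea concrete, here is a minimal sketch of the pattern described above: the trap handler only records the signal and writes a byte to a pipe, and the run loop drains the queue and reaps children without blocking. `SignalQueue` and `reap_children` are hypothetical names chosen for illustration; none of this is taken from the qless code base, and it is not necessarily how the rework would end up looking.

```ruby
# Hypothetical sketch: SignalQueue and reap_children are illustrative
# names, not part of qless.
class SignalQueue
  def initialize
    @reader, @writer = IO.pipe
    @pending = []
  end

  # Meant to be called from a trap handler. It avoids Mutex and Logger and
  # only records the signal and writes one wake-up byte to the pipe.
  def push(signal)
    @pending << signal
    @writer.write_nonblock('.')
  rescue IO::WaitWritable, Errno::EINTR
    # The wake-up byte was not written, but the signal is still in
    # @pending, so the run loop sees it on its next pass at the latest.
  end

  # Called from the run loop. Waits up to `timeout` seconds for a wake-up,
  # then returns every signal recorded so far (possibly none).
  def drain(timeout)
    if IO.select([@reader], nil, nil, timeout)
      begin
        @reader.read_nonblock(1024)
      rescue IO::WaitReadable, EOFError
        # Nothing to read after all.
      end
    end
    drained = []
    while (signal = @pending.shift)
      drained << signal
    end
    drained
  end
end

# Reap exited children without blocking, so the loop can get back to the
# signal queue. Returns an array of [pid, status] pairs.
def reap_children
  reaped = []
  loop do
    pair = Process.wait2(-1, Process::WNOHANG)
    break if pair.nil?
    reaped << pair
  end
  reaped
rescue Errno::ECHILD
  # No child processes exist at all.
  reaped
end
```

A run loop built on this would register something like `Signal.trap('QUIT') { queue.push('QUIT') }` at startup and then alternate between `queue.drain(1)` and `reap_children`, so that neither signals nor exited children wait on the other.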

@esfourteen-zz

We're experiencing the same issue when running qless through upstart. Stopping the parent process leaves orphaned workers lingering.

@evanbattaglia

Hey @tpickett66 any update on this? I'm running into the same issue.

evanbattaglia pushed a commit that referenced this issue Feb 11, 2015
Ruby 2 complains when using Mutex#synchronize within a trap context.
This is because, if the program flow is interrupted by a signal while
the program is in a synchronize block, and synchronize is called from
the trap context, synchronize will never return because it will wait for
the same process/thread to release the lock.

Here, in a trap context, if we fail to get the lock, I add the signal to a signal queue
which will be run soon after the synchronize block in the normal program
flow is finished. If we can get the lock, we should shut down / process
the signal immediately, because main program flow is probably just
waiting for the child process and so cannot check the signal queue in a
timely manner.

Addresses #202
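
The approach in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: `ContentionAwareSignals`, `handle`, and `critical_section` are hypothetical names, and the sketch relies on `Mutex#try_lock` returning false instead of blocking when the lock is already held.

```ruby
# Hypothetical sketch of "try the lock, otherwise defer"; the class and
# method names are illustrative and not taken from qless.
class ContentionAwareSignals
  attr_reader :handled

  def initialize
    @mutex = Mutex.new
    @deferred = []
    @handled = []
  end

  # Intended to be called from a trap handler. try_lock never blocks:
  # it returns false if the lock is already held, including by the code
  # the signal interrupted.
  def handle(signal)
    if @mutex.try_lock
      begin
        process(signal)
      ensure
        @mutex.unlock
      end
    else
      # Normal program flow holds the lock; let it run this signal later.
      @deferred << signal
    end
  end

  # Normal program flow wraps its protected work in this method. Once the
  # lock is released, any signals deferred in the meantime are processed.
  def critical_section
    @mutex.synchronize { yield }
  ensure
    flush_deferred
  end

  private

  def flush_deferred
    while (signal = @deferred.shift)
      process(signal)
    end
  end

  # Stand-in for the real work (for example, shutting down child processes).
  def process(signal)
    @handled << signal
  end
end
```

With this shape, a signal that arrives while the main flow is merely waiting on a child is handled immediately, and one that arrives inside `critical_section` is handled as soon as that block finishes.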