Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle SIGTERM in run_single_worker.rb #15818

Merged
merged 2 commits into from
Aug 16, 2017

Conversation

NickLaMuro
Copy link
Member

Termination of workers via the MiqServer::WorkerManager currently happens by sending a exit message over DRB to the worker that it will process in between doing jobs (so nothing is halfway in flight). The MiqServer will wait for a specific amount of time (timeout defined in the settings, but defaults to 5.minutes or so usually) before killing it (at this point, it is assumed the worker is stuck).

Currently, SIGTERMS are trapped and handled by the worker runner class to run the do_exit method, but they don't allow the work currently in progress to finish. The addition of the #setup_sigterm_trap method will introduce Kernel traps that properly inform the worker that once they have finished with their next round of work, they should exit.

Steps for Testing/QA

  • Apply the following diff (needed becuase the VimBroker won't be present, and adds some handy debugging output):

    diff --git a/app/models/miq_generic_worker/runner.rb b/app/models/miq_generic_worker/runner.rb
    index 022db94..15c0175 100644
    --- a/app/models/miq_generic_worker/runner.rb
    +++ b/app/models/miq_generic_worker/runner.rb
    @@ -1,3 +1,3 @@
     class MiqGenericWorker::Runner < MiqQueueWorkerBase::Runner
    -  self.delay_startup_for_vim_broker = true # NOTE: For ems_operations and smartstate roles
    +  self.delay_startup_for_vim_broker = false # NOTE: For ems_operations and smartstate roles
     end
    diff --git a/app/models/miq_queue_worker_base/runner.rb b/app/models/miq_queue_worker_base/runner.rb
    index 206066b..0179d9d 100644
    --- a/app/models/miq_queue_worker_base/runner.rb
    +++ b/app/models/miq_queue_worker_base/runner.rb
    @@ -96,6 +96,7 @@ class MiqQueueWorkerBase::Runner < MiqWorker::Runner
       end
    
       def deliver_queue_message(msg)
    +    print "delivering queue message #{msg.class_name}.#{msg.method_name}..."
         reset_poll_escalate if poll_method == :sleep_poll_escalate
    
         begin
    @@ -128,6 +129,7 @@ class MiqQueueWorkerBase::Runner < MiqWorker::Runner
           #
           clean_broker_connection
         end
    +    puts "done!"
       end
    
       def deliver_message(msg)
  • Create some long running work for your worker to act on (the time it sleeps in :args can be adjusted as needed):

    $ bin/rails runner 'MiqQueue.put :class_name => "Kernel", :method_name => "sleep", :args => 200, :zone => MiqServer.my_server.zone.name, :queue_name => "generic"'
  • Start a worker using the run_single_worker.rb script:

    $ ruby lib/workers/bin/run_single_worker.rb MiqGenericWorker
  • Wait for the worker to start processing the job, then interrupt it (Crtl-C).

The worker should output done! before exiting.

Termination of workers via the MiqServer::WorkerManager currently
happens by sending a exit message over DRB to the worker that it will
process in between doing jobs (so nothing is halfway in flight).  The
MiqServer will wait for a specific amount of time (timeout defined in
the settings, but defaults to 5.minutes or so usually) before killing it
(at this point, it is assumed the worker is stuck).

Currently, SIGTERMS are trapped and handled by the worker runner class
to run the `do_exit` method, but they don't allow the work currently in
progress to finish.  The addition of the `#setup_sigterm_trap` method
will introduce Kernel traps that properly inform the worker that once
they have finished with their next round of work, they should exit.
@NickLaMuro
Copy link
Member Author

@miq-bot assign @gtanzillo

Instead of the run_single_worker.rb script calling `.start_worker` on
the Runner class, have it instantiate a new class, call
`#setup_sigterm_trap` on that instance, and then call start (effectively
`start_worker`, just adds the `#setup_sigterm_trap` call in the middle)
@miq-bot
Copy link
Member

miq-bot commented Aug 16, 2017

Checked commits NickLaMuro/manageiq@514af2a~...7776e35 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
2 files checked, 0 offenses detected
Everything looks fine. ⭐

@jrafanie
Copy link
Member

LGTM, merging. @gtanzillo you might want to take a look.

@jrafanie jrafanie merged commit 7b9544c into ManageIQ:master Aug 16, 2017
@jrafanie jrafanie added this to the Sprint 67 Ending Aug 21, 2017 milestone Aug 16, 2017
@@ -341,6 +341,10 @@ def do_work_loop
@backoff = nil
end

# Should be caught by the rescue in `#start` and will run do_exit from
# there.
raise Interrupt if @sigterm_recieved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, received... I noticed the spelling mistake after merging 😢

Does it make sense to also initialize @sigterm_received = false somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrafanie good catch on both of those. Accessing a instance var will cause warnings when in verbose mode (so these probably exist in in the test output from travis), so yeah, those should be fixed.

Will open up a follow up PR shortly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YOLO MERGE FTW

jrafanie added a commit that referenced this pull request Aug 17, 2017
…gterm_bugfix

Fix minor typos/issues with PR #15818
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants