Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux-wreck cancel: fall back to kill -9 if job is not pending #1385

Merged
merged 3 commits into from
Mar 24, 2018

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Mar 23, 2018

This addresses issue #1379 with a couple minor changes to flux wreck cancel. When sched is loaded and it gets EINVAL error, cancel will fall back to equivalent of flux wreck kill -9 on the job.

I also added a -f, --force option which allows flux wreck cancel to work even if sched is not loaded, mainly so the code could be tested in t2000-wreck.t.

elseif err == "Invalid argument" then
prog:log ("Sending SIGKILL to %d\n", id)
local rc, err = f:sendevent ({signal = 9}, "wreck.%d.kill", id)
if not rc then self:die ("signal: %s\n", err) end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

If sched.cancel rpc will respond with EINVAL if the state of the target job is essentially not pending (including running and completed), hopefully there is no side effect for kill(-9) rpc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was lazy and do not check the real state of the job here. The wreck.N.kill event is ignored if the job isn't running.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Great. LGTM otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @dongahn!

@grondo
Copy link
Contributor Author

grondo commented Mar 23, 2018

Example:

$ flux submit -n 32 sleep 100
submit: Submitted jobid 1
$ flux submit -n 32 sleep 100
submit: Submitted jobid 2
 flux wreck ls
    ID NTASKS STATE                    START      RUNTIME    RANKS COMMAND
     1     32 running    2018-03-23T14:56:24       3.490s    [0-1] sleep
     2     32 submitted  2018-03-23T14:56:27       0.000s      nil sleep
$ flux wreck cancel 2
$ flux wreck cancel 1
wreck: Sending SIGKILL to 1
$ flux wreck status 1
Job 1 status: complete
task[0-31]: exited with signal 9
$ flux wreck status 2
Job 2 status: cancelled

@coveralls
Copy link

coveralls commented Mar 23, 2018

Coverage Status

Coverage decreased (-0.04%) to 78.781% when pulling 69c228b on grondo:issue#1379 into 4223879 on flux-framework:master.

@codecov-io
Copy link

Codecov Report

Merging #1385 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1385      +/-   ##
==========================================
- Coverage   78.49%   78.49%   -0.01%     
==========================================
  Files         162      162              
  Lines       29741    29741              
==========================================
- Hits        23345    23344       -1     
- Misses       6396     6397       +1
Impacted Files Coverage Δ
src/common/libflux/keepalive.c 86.66% <0%> (-6.67%) ⬇️
src/common/libflux/request.c 87.17% <0%> (-1.29%) ⬇️
src/common/libutil/base64.c 95.07% <0%> (-0.71%) ⬇️
src/broker/overlay.c 74.14% <0%> (-0.32%) ⬇️
src/common/libflux/message.c 81.36% <0%> (+0.11%) ⬆️
src/common/libutil/dirwalk.c 94.28% <0%> (+0.71%) ⬆️
src/common/libflux/rpc.c 94.21% <0%> (+0.82%) ⬆️

grondo added 3 commits March 23, 2018 16:13
Fall back to kill -9 for running jobs in flux-wreck cancel.

Fixes issue flux-framework#1379
Add -f, --force option to flux-wreck cancel to force kill with SIGKILL
running jobs even when sched is not loaded.
Test --force option of flux-wreck cancel.
@garlick
Copy link
Member

garlick commented Mar 23, 2018

LGTM. Will merge after travis finishes.

@garlick garlick merged commit 7fd643e into flux-framework:master Mar 24, 2018
@grondo grondo deleted the issue#1379 branch April 26, 2018 16:24
@grondo grondo mentioned this pull request May 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants