Adds job cancellation of flux jobs #22
First pass: please squash your commits into functional changes, to bring the count down from the current 33.
Also, a nit: some files need EOF cleanup.
479957e to a5ca5a3
I just added myself as a reviewer. Welcome @xyloid!! Two high-level comments first:
Flux's RFC1 on the code-development workflow (called C4, which is a variant of the fork/pull model) might be useful if you haven't reviewed it yet.
Please see my posting in the conversation. Thanks.
Thank you for the comments! I will work on comment 1. As for comment 2, I think I mistakenly clicked "fetch upstream" on my GitHub fork. Otherwise, I don't know why those commits show up in this branch. Let me see if I can remove them from this branch.
a5ca5a3 to 29ccee7
I think those commits have disappeared now!! I don't know exactly how this state was created. But I typically use the following sequence before posting a PR. In your forked dev branch:
This will allow your changes to be on top of the head of the upstream master.
29ccee7 to 881626d
881626d to d57bb44
I used |
After a short discussion with @cmisale, I think this PR must be split into 2:
1- the actual changes to the Go code
2- the example for the pi calculation
d57bb44 to cd1576e
Previously, all jobspecs were written to the same jobspec.yaml. As a result, only the most recent jobspec was saved. For debugging purposes, in this commit, each pod's jobspec is saved in its own file.
A PodInformer is added to kubeflux. For each pod update event, if the pod is in the PodSucceeded PodPhase, kubeflux tries to find its corresponding jobid. If the jobid exists, kubeflux tries to cancel the job in flux.
In the pod update event handler, a pod in the PodFailed PodPhase also triggers job cancellation, in order to free the resources of a failed pod.
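The update-handler logic described in these two commit messages can be sketched roughly as follows. This is a simplified stand-in, not the PR's implementation: Pod and PodPhase are local types that mirror the Kubernetes phase strings, and Canceller, Track, OnUpdate, and the cancel callback are hypothetical names. A real handler would be registered on a client-go informer and would call into flux instead of the stub used here.

```go
package main

import (
	"fmt"
	"sync"
)

// PodPhase mirrors the Kubernetes pod phase strings (local stand-in type).
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a minimal stand-in for the Kubernetes Pod object.
type Pod struct {
	Name  string
	Phase PodPhase
}

// Canceller maps pods to flux job ids and cancels jobs of finished pods.
type Canceller struct {
	mu       sync.Mutex
	podToJob map[string]uint64
	cancel   func(jobID uint64) error // assumed hook into flux
}

func NewCanceller(cancel func(uint64) error) *Canceller {
	return &Canceller{podToJob: make(map[string]uint64), cancel: cancel}
}

// Track records the job id that was allocated for a pod.
func (c *Canceller) Track(podName string, jobID uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.podToJob[podName] = jobID
}

// OnUpdate is the update-event handler. It reports whether a cancel was issued.
func (c *Canceller) OnUpdate(oldPod, newPod Pod) (bool, error) {
	// Only completed pods (succeeded or failed) release their resources.
	if newPod.Phase != PodSucceeded && newPod.Phase != PodFailed {
		return false, nil
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	jobID, ok := c.podToJob[newPod.Name]
	if !ok {
		return false, nil // no known job for this pod, nothing to cancel
	}
	if err := c.cancel(jobID); err != nil {
		return false, err
	}
	delete(c.podToJob, newPod.Name) // avoid cancelling the same job twice
	return true, nil
}

func main() {
	c := NewCanceller(func(id uint64) error {
		fmt.Println("cancelling flux job", id)
		return nil
	})
	c.Track("pi-1", 42) // hypothetical pod name and job id

	issued, _ := c.OnUpdate(Pod{"pi-1", PodRunning}, Pod{"pi-1", PodRunning})
	fmt.Println("running pod, cancel issued:", issued)

	issued, _ = c.OnUpdate(Pod{"pi-1", PodRunning}, Pod{"pi-1", PodSucceeded})
	fmt.Println("succeeded pod, cancel issued:", issued)
}
```

Removing the pod from the map after a successful cancel is a design choice of this sketch: repeated update events for an already completed pod then become no-ops.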
cd1576e to 14c4edc
Co-authored-by: Carlos Eduardo Arango Gutierrez <carangog@redhat.com>
This PR adds a flux job cancellation function to the kubeflux scheduler, which enables kubeflux to cancel a flux job after its corresponding pod has completed (its PodPhase can be either PodSucceeded or PodFailed). Thus the previously allocated resources can be freed and reused.
A PodInformer is added to watch pod events. Job cancellation is implemented in the update event handler.