support basic batch job preempt #738

Merged
merged 4 commits into from
Jun 11, 2020

Conversation

carmark
Contributor

@carmark carmark commented Mar 12, 2020

fixes: #734

@volcano-sh-bot
Contributor

Welcome @carmark!

It looks like this is your first PR to volcano-sh/volcano 🎉.

Thank you, and welcome to Volcano. 😃

@carmark carmark changed the title support basic batch job preempt [wip]support basic batch job preempt Mar 12, 2020
@volcano-sh-bot volcano-sh-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 12, 2020
@hzxuzhonghu
Collaborator

@carmark Can you update?

@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 29, 2020
@carmark carmark changed the title [wip]support basic batch job preempt support basic batch job preempt May 29, 2020
@volcano-sh-bot volcano-sh-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2020
@TravisBuddy

Travis tests have failed

Hey @carmark,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: a22d43b0-a181-11ea-b485-534d9f3e4954

@k82cn
Member

k82cn commented Jun 8, 2020

@carmark do you have time to make CI happy?

@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 10, 2020
@TravisBuddy

Travis tests have failed

Hey @carmark,
Please read the following log in order to understand the failure reason.
It'll be awesome if you fix what's wrong and commit the changes.

TravisBuddy Request Identifier: 9b294e80-aacb-11ea-9064-4d6590cbb359

Signed-off-by: Lei Xue <vfs@live.com>
Signed-off-by: Lei Xue <vfs@live.com>
Signed-off-by: Lei Xue <vfs@live.com>
Signed-off-by: Lei Xue <vfs@live.com>
@carmark carmark closed this Jun 10, 2020
@carmark carmark reopened this Jun 10, 2020
@carmark
Contributor Author

carmark commented Jun 10, 2020

@k82cn @hzxuzhonghu
Travis has finally passed.

@hzxuzhonghu
Collaborator

cool

@k82cn
Member

k82cn commented Jun 11, 2020

/lgtm
/approve

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2020
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carmark, k82cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2020
@volcano-sh-bot volcano-sh-bot merged commit 4f9ce26 into volcano-sh:master Jun 11, 2020
@@ -247,6 +247,7 @@ func (alloc *Action) Execute(ssn *framework.Session) {
stmt.Commit()
} else {
stmt.Discard()
break
Collaborator

One question: if we break here, will the scheduler stop trying to schedule other jobs in the same scheduling period?

Contributor Author

yes.
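
The exchange above can be illustrated with a minimal sketch. It does not reproduce Volcano's allocate action: the `job` type, the `schedule` function, and the `stopOnDiscard` flag are hypothetical names invented here, and capacity is reduced to a single integer. It only shows how a `break` after a discard prevents later jobs in the same pass from being considered.

```go
package main

import "fmt"

// job is a hypothetical stand-in for a scheduler job; it is not Volcano's JobInfo.
type job struct {
	name string
	need int // capacity the whole job needs before it can be committed
}

// schedule walks the jobs in order and commits every job that fits into the
// remaining free capacity. A job that does not fit is discarded. When
// stopOnDiscard is true, the first discard ends the pass, which mirrors the
// `break` placed after stmt.Discard() in the diff.
func schedule(jobs []job, free int, stopOnDiscard bool) []string {
	var committed []string
	for _, j := range jobs {
		if j.need <= free {
			free -= j.need
			committed = append(committed, j.name)
			continue
		}
		// The job does not fit, so its tentative allocation is dropped.
		if stopOnDiscard {
			break
		}
	}
	return committed
}

func main() {
	jobs := []job{{"a", 2}, {"b", 5}, {"c", 1}}
	fmt.Println(schedule(jobs, 4, true))  // [a]
	fmt.Println(schedule(jobs, 4, false)) // [a c]
}
```

With the break, job `c` is not attempted in this pass even though it would fit, which is the behavior the author confirms.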

klog.V(1).Infof("%+v", pod.Status.Conditions)
return nil
}
if _, err := de.kubeclient.CoreV1().Pods(p.Namespace).UpdateStatus(pod); err != nil {
Collaborator

Why do we need to update the pod status just before deleting the pod?

Collaborator

To give clients a hint about why this pod was evicted?

Contributor Author

Yeah, the operator may notice this change and create another pod for the job.
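
The ordering discussed in this thread (record the reason in the pod status, then delete) can be sketched without a cluster. Everything below is a hypothetical model: `fakeAPI`, `pod`, `condition`, and `evict` are illustrative names, not the Kubernetes or Volcano types, and the condition values are assumptions chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// condition and pod are hypothetical, trimmed-down stand-ins for the
// Kubernetes PodCondition and Pod types.
type condition struct {
	Type, Status, Reason, Message string
}

type pod struct {
	Name       string
	Conditions []condition
}

// fakeAPI is a hypothetical in-memory stand-in for the API server. It records
// the sequence of events that a watching operator would observe.
type fakeAPI struct {
	pods   map[string]*pod
	events []string
}

func (a *fakeAPI) updateStatus(p *pod) error {
	if _, ok := a.pods[p.Name]; !ok {
		return errors.New("pod not found: " + p.Name)
	}
	a.pods[p.Name] = p
	a.events = append(a.events, "status-updated")
	return nil
}

func (a *fakeAPI) deletePod(name string) error {
	if _, ok := a.pods[name]; !ok {
		return errors.New("pod not found: " + name)
	}
	delete(a.pods, name)
	a.events = append(a.events, "deleted")
	return nil
}

// evict writes the eviction reason into the pod status first and only then
// deletes the pod, so a watcher sees the reason before the pod disappears.
func evict(a *fakeAPI, name, reason string) error {
	p, ok := a.pods[name]
	if !ok {
		return errors.New("pod not found: " + name)
	}
	// The condition values below are assumptions chosen for illustration.
	p.Conditions = append(p.Conditions, condition{
		Type: "Ready", Status: "False", Reason: "Evict", Message: reason,
	})
	if err := a.updateStatus(p); err != nil {
		return err
	}
	return a.deletePod(name)
}

func main() {
	api := &fakeAPI{pods: map[string]*pod{"worker-0": {Name: "worker-0"}}}
	if err := evict(api, "worker-0", "preempted by a higher-priority job"); err != nil {
		fmt.Println("evict failed:", err)
		return
	}
	fmt.Println(api.events) // [status-updated deleted]
}
```

An operator watching the pod would receive the status update carrying the reason before the delete event, which is the hint the reviewers describe.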

occupid := jobOccupidMap[job.UID]
preemptable := job.MinAvailable <= occupid-1 || job.MinAvailable == 1

preemptable := pJob.Priority > job.Priority
Collaborator

It seems the preemptFn here is now the same as the priority plugin's.

@wpeng102
Member

wpeng102 commented Dec 1, 2020

@carmark
It seems this fix does not work with proportion + reclaim. The ReclaimableFn in the gang plugin is now based on job priority. If no priority is set for the jobs, and job1 is submitted to queue1 and job2 to queue2, then job2 cannot reclaim any resources from queue1: job1 and job2 have equal priority, so the gang plugin returns no victims.
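
The scenario in this comment can be checked with a small sketch of the two predicates that appear in the diff excerpt. It assumes the occupancy-based line is the one the change replaced, as the comment suggests. The `job` struct and the function names `occupancyRule` and `priorityRule` are hypothetical; only the two boolean expressions come from the excerpt.

```go
package main

import "fmt"

// job is a hypothetical, trimmed-down stand-in for a scheduler job.
type job struct {
	name, queue  string
	priority     int
	minAvailable int
	occupied     int // tasks of this job that currently hold resources
}

// occupancyRule mirrors the occupancy-based expression in the excerpt: a task
// may be taken if the job stays at or above minAvailable afterwards, or if
// minAvailable is 1.
func occupancyRule(victim job) bool {
	return victim.minAvailable <= victim.occupied-1 || victim.minAvailable == 1
}

// priorityRule mirrors the priority-based expression in the excerpt: a task
// may be taken only by a job with strictly higher priority.
func priorityRule(preemptor, victim job) bool {
	return preemptor.priority > victim.priority
}

func main() {
	// Neither job sets a priority, so both default to 0.
	job1 := job{name: "job1", queue: "queue1", minAvailable: 1, occupied: 3}
	job2 := job{name: "job2", queue: "queue2", minAvailable: 1}

	fmt.Println(occupancyRule(job1))      // true
	fmt.Println(priorityRule(job2, job1)) // false

	// Only a strictly higher priority turns job1's tasks into candidates.
	job2.priority = 10
	fmt.Println(priorityRule(job2, job1)) // true
}
```

With equal priorities the strict comparison is false for every task of job1, so no victims are offered, regardless of how many tasks job1 holds above its minimum.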

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support preempt for batch job
7 participants