Preemption not working with proportion plugin when queue is full #1772
Comments
Hello 👋 Looks like there was no activity on this issue for last 90 days.
This issue has not been addressed and still exists.
/cc @hwdef Can you help with that?
@Thor-wl
@Robert-Christensen-visa
@hwdef I am trying to figure out how to proceed. There is nothing written in the documentation, but I am not sure if it is because the documentation is sparse or because it was intentionally left out. Being able to preempt with the proportion plugin would be a useful feature for me that is not currently working.
for example:
I guess my confusion comes from the overloaded term "preempt". If you are saying preemption means job If you are saying this is intentional and proportion does not terminate running lower-priority jobs under resource constraints (like drf), that is okay. I was thinking this was due to an oversight, not an intentional omission. Thanks, this has been helpful!
@Robert-Christensen-visa
@hwdef any progress for this issue?
@william-wang I haven't made much progress here, but under another issue, someone gave a solution, which I haven't tested yet, and I don't know if this solution is universal.
Hello 👋 Looks like there was no activity on this issue for last 90 days.
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗
What happened:
Low-priority jobs will not be preempted by pending high-priority jobs when the proportion plugin is used and the queue is full.
What you expected to happen:
If a high-priority job is submitted to a queue and all resources are used, I would expect volcano to terminate a low-priority running job to make room for the high-priority job to run. If the resources are limited by the capacity of the cluster and high-priority jobs are pending it will preempt a low-priority job. When the resources are limited by the queue capability I expect the same behavior, but do not see it.
How to reproduce it (as minimally and precisely as possible):
volcano-scheduler.conf
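The configuration file contents are not reproduced here. As a rough sketch only, a scheduler configuration that enables both the preempt action and the proportion plugin could look like the following; the exact action list and plugin tiers are assumptions, not the reporter's actual file.

```yaml
# Assumed layout, modeled on a typical volcano-scheduler.conf.
# "preempt" is added to the action list so that preemption is attempted at all.
actions: "enqueue, allocate, preempt, backfill"
tiers:
- plugins:
  - name: priority      # orders jobs and tasks by priority class
  - name: gang
  - name: conformance
- plugins:
  - name: drf
  - name: predicates
  - name: proportion    # enforces per-queue share and capability
  - name: nodeorder
  - name: binpack
```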
I create a single queue and two priority classes:
The queue is configured to be limited to 4CPU and 4G memory. I am running locally on a machine that has 12 CPUs and the queue limit is less than that. To recreate, it is important the queue capability is less than the cluster's total capability.
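A minimal sketch of such a setup is shown below. Only the 4 CPU / 4G limit comes from the description above; the object names (test-queue, low-priority, high-priority), the weight, and the priority values are hypothetical and chosen for illustration.

```yaml
# Queue capped below the cluster's total capacity (4 CPU / 4G as described).
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: test-queue          # hypothetical name
spec:
  weight: 1                 # assumed value
  capability:
    cpu: 4
    memory: 4G
---
# Two priority classes; the numeric values are assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority        # hypothetical name
value: 1
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority       # hypothetical name
value: 1000
```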
I submit enough jobs with low-priority to fill the capacity of the queue. After those jobs are running I submit several jobs with high-priority. The high-priority jobs will not preempt the low-priority jobs.
I run this code to submit the jobs to the queue, wait for several seconds, and run jobs with high-priority.
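The submission script itself is not reproduced here. As an illustration under stated assumptions, one low-priority job might be declared roughly as follows; the job name, queue name, priority class name, image, and the 1 CPU / 1G request are all hypothetical. With these numbers, four such jobs would fill a 4 CPU / 4G queue, and a high-priority job would differ only in its name and priorityClassName.

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: low-priority-job-1          # hypothetical name
spec:
  schedulerName: volcano
  queue: test-queue                 # hypothetical queue name
  priorityClassName: low-priority   # hypothetical priority class name
  minAvailable: 1
  tasks:
    - name: worker
      replicas: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: sleep
              image: busybox
              command: ["sleep", "600"]   # long enough to keep the queue full
              resources:
                requests:
                  cpu: "1"                # assumed size: 4 jobs fill the queue
                  memory: 1G
                limits:
                  cpu: "1"
                  memory: 1G
```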
I wait for some time and continue to see the low-priority jobs running and the high-priority jobs pending.
This bug means a high-priority job would not be able to preempt a job with low-priority if the queue is fully utilized.
If resources become available (e.g., the queue capacity increases or the proportion plugin is disabled), the jobs with high-priority will start before the jobs with low-priority, which means job order is working. However, the expectation with preemption and priority is that if a job is high-priority it should start running quickly by clearing resources of jobs that are labeled with lower priority.
Anything else we need to know?:
A similar issue happens when limiting resources using Kubernetes Resource Quota. When a namespace fully utilizes the resources assigned using the Kubernetes resource quota, no new pods will be created. Because no high-priority pods are created, preemption does not happen (because preemption happens between a pending pod and a running pod). For example, #1014 and #1345 are trying to resolve issues related to this.
The Yunikorn scheduler documentation states they recommend disabling Kubernetes Resource Quota because it causes issues with resource management.
Environment:
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T21:04:39Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.3", GitCommit:"ca643a4d1f7bfe34773c74f79527be4afd95bf39", GitTreeState:"clean", BuildDate:"2021-07-15T20:59:07Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}