Fix Issue 2262: add priority capability for reclaim action #3340
base: master
/assign @Monokaix @william-wang
If all jobs in other queues have higher priority, the current queue cannot reclaim their resources, so reclaim never happens. Is this reasonable?
We only add support for this feature and keep it disabled by default. How to use it is up to the cluster admin. One solution is to limit, at the application layer, the resources used by higher-priority jobs so that they do not exceed the queue's deserved share.
But this seems to place high demands on administrators and limits the job priorities within a queue. We'd better add a design doc and provide a user guide.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
@Monokaix Docs have been added.
/close
It conflicts with the existing allocation logic between queues. When we allocate tasks, we do not consider the priority of jobs across queues; we only consider priority at the queue level.
I think there may be a bug in `Reclaimable` in `session_plugins.go`. Take this configmap:
```yaml
tiers:
- plugins:
  - name: priority
  - name: proportion
```
With reclaim enabled in the priority plugin, imagine that the priority plugin returns some victims. If the proportion plugin returns no victims, the `Reclaimable` function returns no victims, and reclaim stops working entirely. Putting proportion in front of priority can fix this, but isn't this a bug? @hwdef
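To make the behavior described above concrete, here is a simplified Go model of tier-based victim selection. It is a sketch under assumptions taken from the comment, not a copy of Volcano's `session_plugins.go`. The assumptions are: the candidates of all plugins within a tier are intersected; a plugin returning an empty set leaves that tier with no victims; the first tier that yields victims decides. All type and function names (`ReclaimableFn`, `Tier`, `reclaimable`, `intersect`) are hypothetical.

```go
package main

import "fmt"

// ReclaimableFn is a hypothetical stand-in for a plugin's reclaimable callback:
// it returns the names of the tasks this plugin allows to be reclaimed.
type ReclaimableFn func(reclaimees []string) []string

// Tier groups plugins whose decisions are combined by intersection.
type Tier []ReclaimableFn

// intersect keeps the elements of a that also appear in b, preserving a's order.
func intersect(a, b []string) []string {
	inB := make(map[string]bool, len(b))
	for _, x := range b {
		inB[x] = true
	}
	var out []string
	for _, x := range a {
		if inB[x] {
			out = append(out, x)
		}
	}
	return out
}

// reclaimable walks the tiers in order. Within a tier, the victims are the
// intersection of every plugin's candidates, and an empty result from any
// plugin empties the whole tier. The first tier with at least one victim wins.
func reclaimable(tiers []Tier, reclaimees []string) []string {
	for _, tier := range tiers {
		var victims []string
		initialized := false
		for _, fn := range tier {
			candidates := fn(reclaimees)
			if len(candidates) == 0 {
				victims = nil
				break
			}
			if !initialized {
				victims = candidates
				initialized = true
			} else {
				victims = intersect(victims, candidates)
			}
		}
		if len(victims) > 0 {
			return victims
		}
	}
	return nil
}

func main() {
	reclaimees := []string{"low-a", "low-b", "high-c"}
	// Hypothetical plugins: priority allows the low-priority tasks, while
	// proportion allows nothing (e.g. the victim queue is not overused).
	priority := func(r []string) []string { return []string{"low-a", "low-b"} }
	proportion := func(r []string) []string { return nil }

	sameTier := []Tier{{priority, proportion}}
	fmt.Println(reclaimable(sameTier, reclaimees)) // []

	splitTiers := []Tier{{priority}, {proportion}}
	fmt.Println(reclaimable(splitTiers, reclaimees)) // [low-a low-b]
}
```

In this model, the empty set from `proportion` overrides `priority` whenever the two share a tier. Placing them in separate tiers lets the earlier tier decide on its own.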
That is a problem with your config. You'd better put the resource-related plugins in the same tier, but in a tier separate from gang/priority, e.g.:
```yaml
tiers:
- plugins:
  - name: priority
    enableReclaimable: false
  - name: gang
    enablePreemptable: false
  - name: conformance
- plugins:
  - name: overcommit
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
```
OK, makes sense.
@lowang-bh, there is another problem with the following config:
```yaml
actions: "enqueue,allocate,backfill,preempt,reclaim"
tiers:
- plugins:
  - name: priority
  - name: gang
    enablePreemptable: false
    enableJobStarving: false
    enableReclaimable: false
    enabledQueueScoreOrder: false
  - name: conformance
- plugins:
  - name: predicates
  - name: proportion
```
Gang has `enablePreemptable: false` (setting it to true doesn't work either), so it can't do any preempt/reclaim. When a high-priority job arrives and there are not enough resources to satisfy the gang constraint, I hope to get resources through reclaim. But because the gang constraint is not met, the job stays pending, and the reclaim action skips pending jobs. So reclaim never happens and no resources are released for the job to run, which looks like a deadlock. Removing enqueue might work, but in my scenario enqueue is a must, and
@zhoushuke There are two kinds of evictions:
If you have any problems using Volcano, please file an issue to describe them.
I have some Volcano experience and use it in production, supporting about 30k to 50k pods a day.
So can you explain why you set `enablePreemptable: false` for gang?
…y-plugin note: set priority plugin conf: enableReclaimable default to false Signed-off-by: lowang-bh <lhui_wang@163.com>
What about reclaiming lower-priority tasks first, and reclaiming higher-priority tasks only if not enough resources have been reclaimed?
If you don't use the enqueue action, jobs start as pending, and when no enqueue action is configured the allocate action enqueues them. So you should put the reclaim action after allocate.
We should consider that higher-priority jobs in other queues can never be reclaimed if enabledReclaimable is enabled in the priority plugin. Wasn't the original design intention of reclaim to reclaim resources from jobs in other queues regardless of priority?
Yes, the docs highlight that this option should be enabled with care.
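The suggestion above can be sketched as a greedy selection: sort the candidate victims by ascending priority and take tasks until the request is covered, so higher-priority tasks are only touched when lower-priority ones are not enough. This is a hypothetical illustration of the idea, not code from this PR. It assumes a single resource dimension (CPU millicores), whereas the real scheduler works with multi-dimensional resources. `Task` and `pickVictims` are made-up names.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is a hypothetical reclaimee: a running task, its job priority,
// and the amount of a single resource (CPU millicores) it holds.
type Task struct {
	Name     string
	Priority int32
	CPU      int64
}

// pickVictims selects victims in ascending priority order until at least
// `need` CPU is freed. It returns nil if even all tasks together cannot
// cover the request, so nothing is evicted for nothing.
func pickVictims(reclaimees []Task, need int64) []Task {
	// Copy before sorting so the caller's slice is not reordered.
	sorted := append([]Task(nil), reclaimees...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})

	var victims []Task
	var freed int64
	for _, t := range sorted {
		if freed >= need {
			break
		}
		victims = append(victims, t)
		freed += t.CPU
	}
	if freed < need {
		return nil
	}
	return victims
}

func main() {
	tasks := []Task{
		{"high-1", 100, 2000},
		{"low-1", 1, 1000},
		{"mid-1", 50, 1000},
	}
	for _, v := range pickVictims(tasks, 1500) {
		fmt.Println(v.Name)
	}
	// low-1
	// mid-1
}
```

The trade-off is that the lowest-priority tasks are always evicted first, even when a single larger higher-priority task would satisfy the request with fewer evictions.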
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Still needed.
Fixes #2262
Adds a `reclaimable` switch and unit tests for the priority plugin.
Note: `reclaimableFn` is usually used in the `reclaim` action to reclaim a queue's deserved resources when the cluster does not have enough resources to allocate newly arriving tasks in that queue. So please be careful when setting `enableReclaimable` to `true` in the `priority` plugin, because a queue's resources held by high-priority jobs may then never be released. `enableReclaimable` is disabled by default for compatibility.
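The caveat in the note can be illustrated with a minimal sketch of a priority-based reclaimable filter. It assumes that reclaimees are kept only when their job priority is strictly lower than the reclaimer's job priority. That comparison rule is our assumption; the PR's diff is authoritative. The sketch also assumes that a disabled switch means the plugin does not restrict the candidate set. `TaskInfo` and `priorityReclaimable` here are hypothetical, trimmed-down stand-ins, not the scheduler's real types.

```go
package main

import "fmt"

// TaskInfo is a hypothetical, trimmed-down task: only the job priority matters here.
type TaskInfo struct {
	Name        string
	JobPriority int32
}

// priorityReclaimable keeps the reclaimees whose job priority is strictly
// lower than the reclaimer's. When enableReclaimable is false, this sketch
// models the plugin as not restricting the candidates at all.
func priorityReclaimable(enableReclaimable bool, reclaimer TaskInfo, reclaimees []TaskInfo) []TaskInfo {
	if !enableReclaimable {
		return reclaimees
	}
	var victims []TaskInfo
	for _, t := range reclaimees {
		if t.JobPriority < reclaimer.JobPriority {
			victims = append(victims, t)
		}
	}
	return victims
}

func main() {
	reclaimer := TaskInfo{"new-task", 10}
	// Every job in the other queue has a higher priority than the reclaimer.
	others := []TaskInfo{{"a", 100}, {"b", 50}}

	fmt.Println(len(priorityReclaimable(true, reclaimer, others)))  // 0
	fmt.Println(len(priorityReclaimable(false, reclaimer, others))) // 2
}
```

With the switch on, a queue whose resources are held entirely by higher-priority jobs yields no victims. This is the situation raised earlier in the thread, and the reason the option defaults to off.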