Preemption not working properly for high priority job #2034

Closed
vincentlau0493 opened this issue Feb 22, 2022 · 15 comments
Comments

@vincentlau0493

vincentlau0493 commented Feb 22, 2022

What happened:
Running low-priority jobs are not preempted by pending high-priority jobs when resources are insufficient.

What you expected to happen:
The running low-priority job should be evicted so that the high-priority job can start running.

How to reproduce it (as minimally and precisely as possible):
volcano-scheduler.conf

{
	"volcano-scheduler.conf": "actions: \"enqueue, allocate, backfill, preempt\"
		tiers:
		- plugins:
		  - name: priority
		  - name: gang
		  - name: conformance
		- plugins:
		  - name: drf
		  - name: predicates
		  - name: proportion
		  - name: nodeorder
		  - name: binpack
		"
}
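
Unescaped, that ConfigMap value corresponds to the following scheduler configuration:

actions: "enqueue, allocate, backfill, preempt"
tiers:
- plugins:
  - name: priority
  - name: gang
  - name: conformance
- plugins:
  - name: drf
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack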

I created two priority classes:

$kubectl get priorityClass -o wide
NAME                      VALUE        GLOBAL-DEFAULT   AGE
high-priority             1000000      false            4d5h
low-priority              10000        false            4d5h
system-cluster-critical   2000000000   false            168d
system-node-critical      2000001000   false            168d
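
For reference, here is a minimal sketch of the manifests behind those two classes (standard scheduling.k8s.io/v1 PriorityClass objects; the values match the output above, and the description fields are illustrative):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "High priority for preemption testing"   # illustrative description
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 10000
globalDefault: false
description: "Low priority for preemption testing"    # illustrative description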

and two jobs with different priorities using the default queue:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vc-low-job
  namespace: preempt
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  priorityClassName: low-priority
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: nginx
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          priorityClassName: low-priority
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 2
                limits:
                  cpu: 2
          restartPolicy: OnFailure
---
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vc-high-job
  namespace: preempt
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  priorityClassName: high-priority
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: nginx
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          priorityClassName: high-priority
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 2
                limits:
                  cpu: 2
          restartPolicy: OnFailure

I ran the case on minikube on a Mac; the CPU count should be more than 4.

I started with the low-priority job, and it was running properly. When I created the high-priority job, the PodGroup phase was stuck at Inqueue:

$kubectl describe pg vc-high-job -n preempt
Name:         vc-high-job
Namespace:    preempt
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.volcano.sh/v1beta1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2022-02-22T08:19:20Z
  Generation:          11
  Managed Fields:
    API Version:  scheduling.volcano.sh/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:ownerReferences:
          .:
          k:{"uid":"a5b46c10-4f69-465c-a48a-c9f597992e2f"}:
      f:spec:
        .:
        f:minMember:
        f:minResources:
          .:
          f:cpu:
        f:minTaskMember:
          .:
          f:nginx:
        f:priorityClassName:
        f:queue:
      f:status:
    Manager:      vc-controller-manager
    Operation:    Update
    Time:         2022-02-22T08:19:20Z
    API Version:  scheduling.volcano.sh/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:phase:
    Manager:    vc-scheduler
    Operation:  Update
    Time:       2022-02-22T08:19:21Z
  Owner References:
    API Version:           batch.volcano.sh/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Job
    Name:                  vc-high-job
    UID:                   a5b46c10-4f69-465c-a48a-c9f597992e2f
  Resource Version:        76666
  UID:                     c3f3d5e1-16eb-4197-bd1d-d81caf01d879
Spec:
  Min Member:  1
  Min Resources:
    Cpu:  2
  Min Task Member:
    Nginx:              1
  Priority Class Name:  high-priority
  Queue:                default
Status:
  Conditions:
    Last Transition Time:  2022-02-22T08:28:18Z
    Message:               1/1 tasks in gang unschedulable: pod group is not ready, 1 Pending, 1 minAvailable; Pending: 1 Unschedulable
    Reason:                NotEnoughResources
    Status:                True
    Transition ID:         b64ec226-264c-447a-aca9-f61995efc277
    Type:                  Unschedulable
  Phase:                   Inqueue
Events:
  Type     Reason         Age                      From     Message
  ----     ------         ----                     ----     -------
  Warning  Unschedulable  9m21s                    volcano  0/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
  Warning  Unschedulable  4m21s (x299 over 9m20s)  volcano  1/1 tasks in gang unschedulable: pod group is not ready, 1 Pending, 1 minAvailable; Pending: 1 Unschedulable

And the pod was pending:

k8s-pratice kubectl get pod -n preempt
NAME                  READY   STATUS    RESTARTS   AGE
vc-high-job-nginx-0   0/1     Pending   0          89s
vc-low-job-nginx-0    1/1     Running   0          2m47s

Here are some logs from the scheduler:

I0222 08:21:26.092579       1 session.go:168] Open Session b9ee403d-034c-4f5e-861c-fa2dc99462dc with <2> Job and <2> Queues
I0222 08:21:26.093523       1 enqueue.go:44] Enter Enqueue ...
I0222 08:21:26.094134       1 enqueue.go:78] Try to enqueue PodGroup to 0 Queues
I0222 08:21:26.094277       1 enqueue.go:103] Leaving Enqueue ...
I0222 08:21:26.094330       1 allocate.go:43] Enter Allocate ...
I0222 08:21:26.094363       1 allocate.go:96] Try to allocate resource to 1 Namespaces
I0222 08:21:26.094486       1 allocate.go:163] Try to allocate resource to Jobs in Namespace <preempt> Queue <default>
I0222 08:21:26.094622       1 allocate.go:197] Try to allocate resource to 1 tasks of Job <preempt/vc-high-job>
I0222 08:21:26.094853       1 proportion.go:299] Queue <default>: deserved <cpu 4000.00, memory 0.00>, allocated <cpu 2000.00, memory 0.00>, share <0.5>, underUsedResName [cpu]
I0222 08:21:26.094961       1 allocate.go:212] There are <1> nodes for Job <preempt/vc-high-job>
I0222 08:21:26.095204       1 predicate_helper.go:73] Predicates failed for task <preempt/vc-high-job-nginx-0> on node <minikube>: task preempt/vc-high-job-nginx-0 on node minikube fit failed: node(s) resource fit failed
I0222 08:21:26.095376       1 statement.go:354] Discarding operations ...
I0222 08:21:26.095400       1 allocate.go:163] Try to allocate resource to Jobs in Namespace <preempt> Queue <default>
I0222 08:21:26.095664       1 allocate.go:197] Try to allocate resource to 0 tasks of Job <preempt/vc-low-job>
I0222 08:21:26.095735       1 statement.go:380] Committing operations ...
I0222 08:21:26.095954       1 allocate.go:159] Namespace <preempt> have no queue, skip it
I0222 08:21:26.096006       1 allocate.go:283] Leaving Allocate ...
I0222 08:21:26.096197       1 backfill.go:40] Enter Backfill ...
I0222 08:21:26.096577       1 backfill.go:90] Leaving Backfill ...
I0222 08:21:26.096654       1 preempt.go:41] Enter Preempt ...
I0222 08:21:26.096805       1 preempt.go:63] Added Queue <default> for Job <preempt/vc-high-job>
I0222 08:21:26.097128       1 statement.go:380] Committing operations ...
I0222 08:21:26.097145       1 preempt.go:189] Leaving Preempt ...
I0222 08:21:26.098257       1 session.go:190] Close Session b9ee403d-034c-4f5e-861c-fa2dc99462dc

Anything else we need to know?:

Environment:

  • Volcano Version: 1.5.0 (latest)
  • Kubernetes version (use kubectl version): Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:38:26Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
    Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:39:34Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: minikube on macbook
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a): Darwin macdeMacBook-Pro.local 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64
  • Install tools:
  • Others:
@vincentlau0493 vincentlau0493 added the kind/bug label Feb 22, 2022
@william-wang
Member

@vincentlau0493 Thanks for reporting this. We have added it to our pipeline and will take a look as soon as possible.

@hwdef
Member

hwdef commented Feb 22, 2022

#1772 may be the same problem.

@vincentlau0493
Author

#1772 may be the same problem

Yes, I read that issue; it seems the problem hasn't been solved yet, right? I expected the running job to be evicted when the high-priority job needs resources, but it is not working as expected. Did I configure anything wrong?

@hwdef
Member

hwdef commented Feb 22, 2022

No, this issue has not been resolved. There have been some discussions, but no conclusion yet.

@vincentlau0493
Author

I see. Since my team is going to use this feature, is there any workaround to achieve it? I know a low-priority pod can be preempted by a higher-priority one using native Kubernetes preemption.

@hwdef
Member

hwdef commented Feb 23, 2022

This issue is still under investigation. If there is any new progress, I will reply here. If you have any information you would like to share, please let us know. 😀

@Sharathmk99
Contributor

Very much interested in this feature: preempting low-priority jobs when high-priority jobs come in.

@vincentlau0493
Author

Try this conf:

{
	"volcano-scheduler.conf": "actions: \"enqueue, allocate, backfill, preempt\"
		tiers:
		- plugins:
		  - name: priority
		  - name: gang
		    enablePreemptable: false
		    enableJobStarving: false
		  - name: conformance
		- plugins:
		  - name: overcommit
		  - name: drf
		  - name: predicates
		  - name: proportion
		  - name: nodeorder
		  - name: binpack
		"
}

It works for me now.

@Sharathmk99
Contributor

Sharathmk99 commented Mar 2, 2022

@vincentlau0493 It's not working for me.
Below is my configuration

volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill, preempt"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
        enableJobStarving: false
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

Below are the low-priority jobs:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vc-job1
  namespace: kubeflow-user-sharathmandya-krishna
spec:
  # minAvailable: 0
  priorityClassName: normal-priority
  schedulerName: volcano
  queue: test
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: job1
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          priorityClassName: normal-priority
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 2
                limits:
                  cpu: 2
          restartPolicy: OnFailure
          nodeSelector:
            dedicated: volcano
          tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "volcano"
            effect: "NoSchedule"

---

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vc-job2
  namespace: kubeflow-user-sharathmandya-krishna
spec:
  # minAvailable: 0
  priorityClassName: normal-priority
  schedulerName: volcano
  queue: test
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: job2
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          priorityClassName: normal-priority
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 4
                limits:
                  cpu: 4
          restartPolicy: OnFailure
          nodeSelector:
            dedicated: volcano
          tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "volcano"
            effect: "NoSchedule"

Below is the high-priority job:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vc-job3
  namespace: kubeflow-user-sharathmandya-krishna
spec:
  # minAvailable: 0
  schedulerName: volcano
  queue: test
  priorityClassName: high-priority
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: job3
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          priorityClassName: high-priority
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 3
                limits:
                  cpu: 3
          restartPolicy: OnFailure
          nodeSelector:
            dedicated: volcano
          tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "volcano"
            effect: "NoSchedule"

Pending PodGroup details:

Name:         vc-job3
Namespace:    kubeflow-user-sharathmandya-krishna
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.volcano.sh/v1beta1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2022-03-02T13:58:56Z
  Generation:          2
  Managed Fields:
    API Version:  scheduling.volcano.sh/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:ownerReferences:
      f:spec:
        .:
        f:minMember:
        f:minResources:
          .:
          f:cpu:
        f:minTaskMember:
          .:
          f:job3:
        f:priorityClassName:
        f:queue:
      f:status:
    Manager:      vc-controller-manager
    Operation:    Update
    Time:         2022-03-02T13:58:56Z
    API Version:  scheduling.volcano.sh/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:phase:
    Manager:    vc-scheduler
    Operation:  Update
    Time:       2022-03-02T13:58:57Z
  Owner References:
    API Version:           batch.volcano.sh/v1alpha1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Job
    Name:                  vc-job3
    UID:                   0b381bca-a2df-43a5-bbd1-8b416f81e7a9
  Resource Version:        559225577
  Self Link:               /apis/scheduling.volcano.sh/v1beta1/namespaces/kubeflow-user-sharathmandya-krishna/podgroups/vc-job3
  UID:                     56e94dd0-caa6-4e99-a924-d77931bdc75f
Spec:
  Min Member:  1
  Min Resources:
    Cpu:  3
  Min Task Member:
    job3:               1
  Priority Class Name:  high-priority
  Queue:                test
Status:
  Conditions:
    Last Transition Time:  2022-03-02T13:58:57Z
    Message:               1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable
    Reason:                NotEnoughResources
    Status:                True
    Transition ID:         faeecd6b-bfa5-4d02-9309-c99da921a227
    Type:                  Unschedulable
  Phase:                   Pending
Events:
  Type     Reason         Age                From     Message
  ----     ------         ----               ----     -------
  Warning  Unschedulable  2s (x25 over 26s)  volcano  0/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable

Logs:

I0302 14:00:27.189952       1 scheduler.go:93] Start scheduling ...
I0302 14:00:27.190035       1 node_info.go:277] set the node uc-wrn-stg-dok8swork-04 status to Ready.
I0302 14:00:27.190132       1 cache.go:975] The priority of job <kubeflow-user-sharathmandya-krishna/vc-job1> is <normal-priority/1>
I0302 14:00:27.190164       1 cache.go:975] The priority of job <kubeflow-user-sharathmandya-krishna/vc-job2> is <normal-priority/1>
I0302 14:00:27.190182       1 cache.go:975] The priority of job <kubeflow-user-sharathmandya-krishna/vc-job3> is <high-priority/1000000>
I0302 14:00:27.190226       1 cache.go:1013] There are <3> Jobs, <2> Queues and <1> Nodes in total for scheduling.
I0302 14:00:27.190240       1 session.go:170] Open Session 6f880f6f-5f8f-476d-b24f-af2572f57a12 with <3> Job and <2> Queues
I0302 14:00:27.190275       1 overcommit.go:72] Enter overcommit plugin ...
I0302 14:00:27.190283       1 overcommit.go:127] Leaving overcommit plugin.
I0302 14:00:27.190292       1 drf.go:204] Total Allocatable cpu 8000.00, memory 67448836096.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00
I0302 14:00:27.190572       1 proportion.go:80] The total resource is <cpu 8000.00, memory 67448836096.00, hugepages-2Mi 0.00, hugepages-1Gi 0.00>
I0302 14:00:27.190587       1 proportion.go:88] The total guarantee resource is <cpu 0.00, memory 0.00>
I0302 14:00:27.190594       1 proportion.go:91] Considering Job <kubeflow-user-sharathmandya-krishna/vc-job3>.
I0302 14:00:27.190601       1 proportion.go:124] Added Queue <test> attributes.
I0302 14:00:27.190607       1 proportion.go:91] Considering Job <kubeflow-user-sharathmandya-krishna/vc-job1>.
I0302 14:00:27.190611       1 proportion.go:91] Considering Job <kubeflow-user-sharathmandya-krishna/vc-job2>.
I0302 14:00:27.190628       1 proportion.go:182] Considering Queue <test>: weight <7>, total weight <7>.
I0302 14:00:27.190642       1 proportion.go:196] Format queue <test> deserved resource to <cpu 6000.00, memory 0.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>
I0302 14:00:27.190650       1 proportion.go:200] queue <test> is meet
I0302 14:00:27.190670       1 proportion.go:208] The attributes of queue <test> in proportion: deserved <cpu 6000.00, memory 0.00>, realCapability <cpu 8000.00, memory 67448836096.00, hugepages-2Mi 0.00, hugepages-1Gi 0.00>, allocate <cpu 6000.00, memory 0.00>, request <cpu 6000.00, memory 0.00>, share <1.00>
I0302 14:00:27.190691       1 proportion.go:220] Remaining resource is  <cpu 2000.00, memory 67448836096.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>
I0302 14:00:27.190717       1 proportion.go:171] Exiting when total weight is 0
I0302 14:00:27.190828       1 binpack.go:158] Enter binpack plugin ...
I0302 14:00:27.190837       1 binpack.go:177] resources [] record in weight but not found on any node
I0302 14:00:27.190844       1 binpack.go:161] Leaving binpack plugin. binpack.weight[1], binpack.cpu[1], binpack.memory[1], no extend resources. ...
I0302 14:00:27.190850       1 enqueue.go:44] Enter Enqueue ...
I0302 14:00:27.190856       1 enqueue.go:62] Added Queue <test> for Job <kubeflow-user-sharathmandya-krishna/vc-job1>
I0302 14:00:27.190862       1 enqueue.go:73] Added Job <kubeflow-user-sharathmandya-krishna/vc-job3> into Queue <test>
I0302 14:00:27.190868       1 enqueue.go:78] Try to enqueue PodGroup to 1 Queues
I0302 14:00:27.190876       1 overcommit.go:114] Resource in cluster is overused, reject job <kubeflow-user-sharathmandya-krishna/vc-job3> to be inqueue
I0302 14:00:27.190881       1 enqueue.go:103] Leaving Enqueue ...
I0302 14:00:27.190890       1 allocate.go:43] Enter Allocate ...
I0302 14:00:27.190898       1 job_info.go:705] job vc-job1/kubeflow-user-sharathmandya-krishna actual: map[job1:1], ji.TaskMinAvailable: map[job1:1]
I0302 14:00:27.190933       1 allocate.go:92] Added Job <kubeflow-user-sharathmandya-krishna/vc-job1> into Queue <test>
I0302 14:00:27.190939       1 job_info.go:705] job vc-job2/kubeflow-user-sharathmandya-krishna actual: map[job2:1], ji.TaskMinAvailable: map[job2:1]
I0302 14:00:27.190946       1 allocate.go:92] Added Job <kubeflow-user-sharathmandya-krishna/vc-job2> into Queue <test>
I0302 14:00:27.190952       1 priority.go:70] Priority JobOrderFn: <kubeflow-user-sharathmandya-krishna/vc-job2> priority: 1, <kubeflow-user-sharathmandya-krishna/vc-job1> priority: 1
I0302 14:00:27.190957       1 gang.go:118] Gang JobOrderFn: <kubeflow-user-sharathmandya-krishna/vc-job2> is ready: true, <kubeflow-user-sharathmandya-krishna/vc-job1> is ready: true
I0302 14:00:27.190962       1 drf.go:408] DRF JobOrderFn: <kubeflow-user-sharathmandya-krishna/vc-job2> share state: 0.5, <kubeflow-user-sharathmandya-krishna/vc-job1> share state: 0.25
I0302 14:00:27.190967       1 allocate.go:62] Job <kubeflow-user-sharathmandya-krishna/vc-job3> Queue <test> skip allocate, reason: job status is pending.
I0302 14:00:27.190973       1 allocate.go:96] Try to allocate resource to 1 Namespaces
I0302 14:00:27.190979       1 allocate.go:111] unlockedNode ID: 1a2a0a49-6bf7-4791-8244-0353c2e7010d, Name: uc-wrn-stg-dok8swork-04
I0302 14:00:27.190987       1 proportion.go:278] Queue <test>: deserved <cpu 6000.00, memory 0.00>, allocated <cpu 6000.00, memory 0.00>, share <1>
I0302 14:00:27.190997       1 allocate.go:145] Namespace <kubeflow-user-sharathmandya-krishna> Queue <test> is overused, ignore it.
I0302 14:00:27.191011       1 allocate.go:159] Namespace <kubeflow-user-sharathmandya-krishna> have no queue, skip it
I0302 14:00:27.191023       1 allocate.go:283] Leaving Allocate ...
I0302 14:00:27.191028       1 backfill.go:40] Enter Backfill ...
I0302 14:00:27.191032       1 job_info.go:705] job vc-job1/kubeflow-user-sharathmandya-krishna actual: map[job1:1], ji.TaskMinAvailable: map[job1:1]
I0302 14:00:27.191038       1 job_info.go:705] job vc-job2/kubeflow-user-sharathmandya-krishna actual: map[job2:1], ji.TaskMinAvailable: map[job2:1]
I0302 14:00:27.191044       1 backfill.go:90] Leaving Backfill ...
I0302 14:00:27.191049       1 preempt.go:41] Enter Preempt ...
I0302 14:00:27.191052       1 job_info.go:705] job vc-job1/kubeflow-user-sharathmandya-krishna actual: map[job1:1], ji.TaskMinAvailable: map[job1:1]
I0302 14:00:27.191060       1 preempt.go:63] Added Queue <test> for Job <kubeflow-user-sharathmandya-krishna/vc-job1>
I0302 14:00:27.191064       1 job_info.go:705] job vc-job2/kubeflow-user-sharathmandya-krishna actual: map[job2:1], ji.TaskMinAvailable: map[job2:1]
I0302 14:00:27.191074       1 preempt.go:90] No preemptors in Queue <test>, break.
I0302 14:00:27.191079       1 statement.go:373] Committing operations ...
I0302 14:00:27.191085       1 preempt.go:189] Leaving Preempt ...
I0302 14:00:27.191235       1 session.go:192] Close Session 6f880f6f-5f8f-476d-b24f-af2572f57a12
I0302 14:00:27.191250       1 scheduler.go:112] End scheduling ...

@vincentlau0493
Author

It looks like the PodGroup status of vc-job3 is Pending, but it should be Inqueue, which means the job was not accepted by the queue. Try using just one job as the low-priority sample, and restart the vc-scheduler to clear its cache.

@vincentlau0493 vincentlau0493 changed the title Premption not working properly for high priority job Preemption not working properly for high priority job Mar 25, 2022
@william-wang william-wang added this to the v1.6 milestone May 13, 2022
@snirkop89

@Sharathmk99
Does it work for you as of today?

@Thor-wl
Contributor

Thor-wl commented Jul 6, 2022

I tried with the latest master branch yesterday. It seems that the preemption function is broken. I'll take some time to look into it soon as well.

@elinx

elinx commented Jul 11, 2022

There are several ways you could try to make it work, depending on your situation:

  1. Set the low-priority job to be preemptable by adding the annotation volcano.sh/preemptable: "true" (see the sketch after this list).
  2. If the high-priority job cannot be enqueued because of the proportion plugin, as in Preemption not working with proportion plugin when queue is full #1772, you could move the overcommit plugin to the first tier to hide the effects of the proportion plugin.
  3. If the low-priority job cannot be preempted because the gang plugin does not permit it, you could try moving gang to the second tier, or use the suggestion from Preemption not working properly for high priority job #2034 (comment); they work the same way.
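
For option 1, here is a minimal sketch of the original low-priority job with the preemptable annotation added. It assumes the annotation is honored on the task's pod template; depending on your Volcano version it may instead (or also) be read from the Job's metadata.annotations, so adjust accordingly:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vc-low-job
  namespace: preempt
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  priorityClassName: low-priority
  tasks:
    - replicas: 1
      name: nginx
      template:
        metadata:
          annotations:
            volcano.sh/preemptable: "true"   # mark these pods as candidates for preemption
        spec:
          priorityClassName: low-priority
          containers:
            - name: nginx
              image: nginx:latest
              command: ["sleep", "10m"]
              resources:
                requests:
                  cpu: 2
                limits:
                  cpu: 2
          restartPolicy: OnFailure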

The principle behind 2 and 3 is that both the enqueue and preempt actions only consider the first tier's result when it can reach a decision (permit enqueueing, or select a victim set):

// if plugin exists that votes permit, meanwhile other plugin votes abstention,
// permit job to be enqueueable, do not check next tier
if hasFound {
    return true
}

// Plugins in this tier made decision if victims is not nil
if victims != nil {
    return victims
}

The final working config could be:

    actions: "enqueue, allocate, backfill, preempt"
    tiers:
    - plugins:
      - name: priority
      - name: conformance
      - name: overcommit
        arguments:
          overcommit-factor: 10.0
    - plugins:
      - name: drf
      - name: gang
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

@stale

stale bot commented Oct 12, 2022

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

@stale stale bot added the lifecycle/stale label Oct 12, 2022
@stale

stale bot commented Dec 31, 2022

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Dec 31, 2022