What happened:
An infinite preemption loop is possible when PrioritySortingWithinCohort=false is used together with
borrowWithinCohort. It occurs when a high-priority workload from a CQ running above its nominal quota needs
to borrow and preempts a lower-priority workload. The lower-priority workload may then take the spot
in front of the high-priority workload (because PrioritySortingWithinCohort=false) and get re-admitted.
The cycle (preempt - admit - preempt) then repeats for the lower-priority workload.
The high-priority workload is never admitted (as long as its CQ is running above its nominal quota), because the lower-priority workload repeatedly gets in front of it.
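The loop described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Kueue's scheduler: the names `Workload` and `simulate`, the workload names `low` and `high`, the 10-CPU capacity, and the "one head per cycle" rule are all hypothetical. It assumes that with priority sorting disabled, pending workloads are ordered by creation time only, and that a preemptor stays pending until a later cycle.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    priority: int
    created: int  # smaller value = created earlier (hypothetical timestamp)
    cpu: int


def simulate(priority_sorting: bool, cycles: int = 6, capacity: int = 10) -> list[str]:
    """Toy scheduling loop; returns the sequence of admit/preempt events."""
    low = Workload("low", priority=1, created=0, cpu=10)
    high = Workload("high", priority=3, created=1, cpu=10)
    admitted = [low]   # the low-priority workload is already running
    pending = [high]   # the high-priority workload needs the same capacity
    events: list[str] = []

    for _ in range(cycles):
        if not pending:
            break
        # Assumption: with sorting disabled, only creation time decides the order.
        if priority_sorting:
            pending.sort(key=lambda w: (-w.priority, w.created))
        else:
            pending.sort(key=lambda w: w.created)

        head = pending[0]
        free = capacity - sum(w.cpu for w in admitted)

        if head.cpu <= free:
            pending.remove(head)
            admitted.append(head)
            events.append(f"admit {head.name}")
            continue

        # Only strictly lower-priority admitted workloads can be preempted.
        victims = [w for w in admitted if w.priority < head.priority]
        if victims and head.cpu <= free + sum(v.cpu for v in victims):
            for v in victims:
                admitted.remove(v)
                pending.append(v)  # the victim is requeued
                events.append(f"preempt {v.name}")
            # The preemptor stays pending and is retried in the next cycle.
        else:
            break  # no progress is possible; the state is stable

    return events


if __name__ == "__main__":
    print(simulate(priority_sorting=False))
    print(simulate(priority_sorting=True))
```

In this model, disabling priority sorting yields an alternating sequence of `preempt low` and `admit low` for as many cycles as are simulated, while enabling it settles after `preempt low` and `admit high`.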
What you expected to happen:
No infinite cycles of preempt - admit - preempt.
How to reproduce it (as minimally and precisely as possible):
Set up the cluster as follows:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-a"
spec:
  cohort: "all"
  preemption:
    withinClusterQueue: LowerPriority
    reclaimWithinCohort: Any
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 0
        borrowingLimit: 10
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-b"
spec:
  cohort: "all"
  preemption:
    withinClusterQueue: LowerPriority
    reclaimWithinCohort: Any
    borrowWithinCohort:
      maxPriorityThreshold: 80000
      policy: LowerPriority
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 5
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-c"
spec:
  cohort: "all"
  preemption:
    withinClusterQueue: LowerPriority
    reclaimWithinCohort: Any
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 5
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-a"
spec:
  clusterQueue: "cluster-queue-a"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-b"
spec:
  clusterQueue: "cluster-queue-b"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-c"
spec:
  clusterQueue: "cluster-queue-c"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mid-priority
value: 2
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 3
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
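To see why workloads from different queues compete for the same capacity in this setup, the quota arithmetic can be sketched as follows. The numbers are copied from the three ClusterQueue manifests above; the helper names `cohort_capacity` and `max_usable` are hypothetical, and the sketch assumes that an unset `borrowingLimit` means a queue may borrow any unused quota in the cohort and that no lending limits apply.

```python
# CPU quotas for "default-flavor", copied from the ClusterQueue manifests.
# borrowing_limit=None stands for "borrowingLimit not set".
QUEUES = {
    "cluster-queue-a": {"nominal": 0, "borrowing_limit": 10},
    "cluster-queue-b": {"nominal": 5, "borrowing_limit": None},
    "cluster-queue-c": {"nominal": 5, "borrowing_limit": None},
}


def cohort_capacity(queues: dict) -> int:
    """Total nominal quota shared by the cohort."""
    return sum(q["nominal"] for q in queues.values())


def max_usable(name: str, queues: dict) -> int:
    """Upper bound on what one queue could use if the rest of the cohort is idle."""
    q = queues[name]
    total = cohort_capacity(queues)
    if q["borrowing_limit"] is None:
        # Assumption: no limit means the queue can borrow all unused cohort quota.
        return total
    return min(total, q["nominal"] + q["borrowing_limit"])


if __name__ == "__main__":
    print("cohort capacity:", cohort_capacity(QUEUES))
    for name in QUEUES:
        print(name, "can use up to", max_usable(name, QUEUES))
```

Under these assumptions the cohort holds 10 CPUs in total and each of the three queues can use up to all of them. Everything admitted through cluster-queue-a is borrowed, since its nominal quota is 0, and cluster-queue-b is borrowing as soon as it uses more than 5 CPUs.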
/retitle Infinite preemption loop is possible when PrioritySortingWithinCohort=false is used with borrowWithinCohort
k8s-ci-robot changed the title from "Infinite preemption loop is possible when PrioritySortingWithingCohort=false is used with borrowWithinCohort" to "Infinite preemption loop is possible when PrioritySortingWithinCohort=false is used with borrowWithinCohort" on Aug 22, 2024
Issue: workload-a will go through an infinite cycle of preemptions and admissions.