
Infinite preemption loop is possible when PrioritySortingWithinCohort=false is used with borrowWithinCohort #2821

Closed
mimowo opened this issue Aug 12, 2024 · 2 comments · Fixed by #2807
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@mimowo
Contributor

mimowo commented Aug 12, 2024

What happened:

An infinite preemption loop is possible when PrioritySortingWithinCohort=false is used together with
borrowWithinCohort. This can happen when a high-priority workload from a CQ running above its nominal
quota needs to borrow and preempts a lower-priority workload. The evicted lower-priority workload may
then take the spot in front of the high-priority workload (because PrioritySortingWithinCohort=false)
and get re-admitted. The cycle (preempt - admit - preempt) then repeats for the lower-priority workload.

The high-priority workload is never admitted (as long as its CQ is running above its nominal quota), because the lower-priority workload repeatedly gets in front of it.
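
To make the ordering mechanism concrete, below is a minimal standalone Go sketch (simplified, assumed type and field names; not Kueue's actual implementation). With priority sorting within the cohort disabled, candidates are effectively compared by timestamp only, so the just-evicted lower-priority workload, which carries the older timestamp, keeps sorting ahead of the borrowing high-priority workload:

package main

import (
	"fmt"
	"sort"
	"time"
)

// entry is a simplified stand-in for a workload waiting to be scheduled in a cohort.
// Field and type names are illustrative only, not Kueue's.
type entry struct {
	name      string
	priority  int32
	timestamp time.Time // e.g. eviction or creation time
}

// less sketches the two ordering modes: with prioritySorting enabled, higher
// priority wins and the timestamp only breaks ties; with it disabled, only the
// timestamp is compared.
func less(a, b entry, prioritySorting bool) bool {
	if prioritySorting && a.priority != b.priority {
		return a.priority > b.priority
	}
	return a.timestamp.Before(b.timestamp)
}

func main() {
	t0 := time.Now()
	// The mid-priority workload was created/evicted earlier, so it carries the
	// older timestamp; the high-priority workload that triggered the preemption
	// is newer.
	evictedMid := entry{name: "sample-job-a (mid-priority)", priority: 2, timestamp: t0}
	borrowingHigh := entry{name: "sample-job-b (high-priority)", priority: 3, timestamp: t0.Add(time.Minute)}

	for _, prioritySorting := range []bool{false, true} {
		queue := []entry{borrowingHigh, evictedMid}
		sort.Slice(queue, func(i, j int) bool { return less(queue[i], queue[j], prioritySorting) })
		fmt.Printf("PrioritySortingWithinCohort=%v -> head of queue: %s\n", prioritySorting, queue[0].name)
	}
	// PrioritySortingWithinCohort=false -> head of queue: sample-job-a (mid-priority)
	//   (it is re-admitted ahead of the high-priority workload, then preempted again)
	// PrioritySortingWithinCohort=true  -> head of queue: sample-job-b (high-priority)
}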

What you expected to happen:

No infinite cycles of preempt - admit - preempt.

How to reproduce it (as minimally and precisely as possible):

  1. Set up the cluster as follows (with the PrioritySortingWithinCohort feature gate disabled on the Kueue controller manager):
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-a"
spec:
  cohort: "all"
  preemption:
    withinClusterQueue: LowerPriority
    reclaimWithinCohort: Any
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 0
        borrowingLimit: 10
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-b"
spec:
  cohort: "all"
  preemption:
    withinClusterQueue: LowerPriority
    reclaimWithinCohort: Any
    borrowWithinCohort:
      maxPriorityThreshold: 80000
      policy: LowerPriority
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 5
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue-c"
spec:
  cohort: "all"
  preemption:
    withinClusterQueue: LowerPriority
    reclaimWithinCohort: Any
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 5
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-a"
spec:
  clusterQueue: "cluster-queue-a"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-b"
spec:
  clusterQueue: "cluster-queue-b"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "default"
  name: "user-queue-c"
spec:
  clusterQueue: "cluster-queue-c"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mid-priority
value: 2
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 3
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
  2. Create the lower-priority workload in CQa:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-a-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue-a
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
    spec:
      priorityClassName: mid-priority
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["300s"]
        resources:
          requests:
            cpu: "4"
      restartPolicy: Never
  3. Create two high-priority workloads in CQb:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-b-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: user-queue-b
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
    spec:
      priorityClassName: high-priority
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["300s"]
        resources:
          requests:
            cpu: "4"
      restartPolicy: Never

Issue: workload-a (the mid-priority workload in CQa) goes through an infinite cycle of preemptions and admissions. CQa has no nominal quota, so workload-a runs only by borrowing; admitting the second high-priority workload in CQb also requires borrowing, which (via borrowWithinCohort) preempts the lower-priority workload-a; once evicted, workload-a sorts ahead of the high-priority workload again and gets re-admitted.

@mimowo mimowo added the kind/bug Categorizes issue or PR as related to a bug. label Aug 12, 2024
@mimowo
Contributor Author

mimowo commented Aug 12, 2024

/assign

@tenzen-y
Member

/retitle Infinite preemption loop is possible when PrioritySortingWithinCohort=false is used with borrowWithinCohort

@k8s-ci-robot k8s-ci-robot changed the title Infinite preemption loop is possible when PrioritySortingWithingCohort=false is used with borrowWithinCohort Infinite preemption loop is possible when PrioritySortingWithinCohort=false is used with borrowWithinCohort Aug 22, 2024