diff --git a/keps/973-workload-priority/README.md b/keps/973-workload-priority/README.md
index 785f7ba8ee..3d9f6da71d 100644
--- a/keps/973-workload-priority/README.md
+++ b/keps/973-workload-priority/README.md
@@ -143,7 +143,7 @@ This label is always mutable because it might be useful for the preemption.
 
 ```yaml
 # sample-priority-class.yaml
-apiVersion: kueue.x-k8s.io/v1alpha1
+apiVersion: kueue.x-k8s.io/v1beta1
 kind: WorkloadPriorityClass
 metadata:
   name: sample-priority
@@ -154,7 +154,7 @@ description: "Sample priority"
 apiVersion: batch/v1
 kind: Job
 metadata:
-  generateName: sample-job-
+  name: sample-job
   labels:
     kueue.x-k8s.io/queue-name: user-queue
     kueue.x-k8s.io/priority-class: sample-priority
@@ -171,16 +171,15 @@ spec:
 ```
 The following workload is generated by the yaml above.
-The `PriorityClass` field can accept both k8s `PriorityClass` and `workloadPriorityClass` names as values.
-To distinguish, when using `workloadPriorityClass`, a `priorityClassSource` field has the `kueue.x-k8s.io/workloadpriorityclass` value.
-When using k8s `PriorityClass`, a `priorityClassSource` field has the `scheduling.k8s.io/priorityclass` value.
+The `PriorityClassName` field accepts the name of either a `PriorityClass` or a `WorkloadPriorityClass`.
+To distinguish the two, the `priorityClassSource` field is set to `kueue.x-k8s.io/workloadpriorityclass` when a `WorkloadPriorityClass` is used,
+and to `scheduling.k8s.io/priorityclass` when a `PriorityClass` is used.
 
 ```yaml
-# sample-workload.yaml
 apiVersion: kueue.x-k8s.io/v1beta1
 kind: Workload
 metadata:
-  name: job-sample-job-jf5fb-f5982
+  name: job-sample-job-7f173
 spec:
   priorityClassSource: kueue.x-k8s.io/workloadpriorityclass
   priorityClassName: sample-priority
diff --git a/site/content/en/docs/concepts/_index.md b/site/content/en/docs/concepts/_index.md
index 68e5bf79aa..f32f8fc2c4 100644
--- a/site/content/en/docs/concepts/_index.md
+++ b/site/content/en/docs/concepts/_index.md
@@ -34,6 +34,12 @@ single tenant.
 An application that will run to completion. It is the unit of _admission_ in
 Kueue. Sometimes referred to as _job_.
 
+### [Workload Priority Class](/docs/concepts/workload_priority_class)
+
+`WorkloadPriorityClass` defines a priority class for a workload,
+independently of [pod priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/).
+The priority value from a `WorkloadPriorityClass` is used only for managing the queueing and preemption of [Workloads](#workload).
+
 ![Components](/images/queueing-components.svg)
 
 ## Glossary
diff --git a/site/content/en/docs/concepts/workload.md b/site/content/en/docs/concepts/workload.md
index 3ad297437a..05d74e036a 100644
--- a/site/content/en/docs/concepts/workload.md
+++ b/site/content/en/docs/concepts/workload.md
@@ -89,11 +89,15 @@ In addition to the usual resource naming restrictions, you cannot use the `pods`
 ## Priority
 
 Workloads have a priority that influences the [order in which they are admitted by a ClusterQueue](/docs/concepts/cluster_queue#queueing-strategy).
-You can see the priority of the Workload in the field `.spec.priority`.
+There are two ways to set the Workload priority:
+- **Pod Priority**: The Workload's priority, shown in the field `.spec.priority`, can be derived from pod priority.
 For a `batch/v1.Job`, Kueue sets the priority of the Workload based on the
-[pod priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
-of the Job's pod template.
+[pod priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/) of the Job's pod template.
+
+- **WorkloadPriorityClass**: Sometimes developers want to control a workload's priority without affecting the priority of its pods.
+By using a [`WorkloadPriorityClass`](/docs/concepts/workload_priority_class),
+you can manage the priority of workloads for queueing and preemption independently of pod priority.
 
 ## Custom Workloads
 
@@ -103,4 +107,5 @@ creating a corresponding Workload object for it.
 
 ## What's next
 
+- Learn about [workload priority class](/docs/concepts/workload_priority_class).
 - Learn how to [run jobs](/docs/tasks/run_jobs)
diff --git a/site/content/en/docs/concepts/workload_priority_class.md b/site/content/en/docs/concepts/workload_priority_class.md
new file mode 100644
index 0000000000..7618bfbf0d
--- /dev/null
+++ b/site/content/en/docs/concepts/workload_priority_class.md
@@ -0,0 +1,114 @@
+---
+title: "Workload Priority Class"
+date: 2023-10-02
+weight: 6
+description: >
+  A priority class whose value is used by the Kueue controller and is independent of the pod's priority.
+---
+
+A `WorkloadPriorityClass` allows you to control a [`Workload`'s](/docs/concepts/workload) priority without affecting the priority of its pods.
+This feature is useful in cases such as:
+- prioritizing workloads that have remained inactive for a specific duration
+- setting a lower priority for development workloads and a higher priority for production workloads
+
+A sample WorkloadPriorityClass looks like the following:
+
+```yaml
+apiVersion: kueue.x-k8s.io/v1beta1
+kind: WorkloadPriorityClass
+metadata:
+  name: sample-priority
+value: 10000
+description: "Sample priority"
+```
+
+`WorkloadPriorityClass` objects are cluster-scoped, so they can be used by a job in any namespace.
+
+## How to use WorkloadPriorityClass on Jobs
+
+You can specify the `WorkloadPriorityClass` by setting the label `kueue.x-k8s.io/priority-class`.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: sample-job
+  labels:
+    kueue.x-k8s.io/queue-name: user-queue
+    kueue.x-k8s.io/priority-class: sample-priority
+spec:
+...
+```
+
+Kueue generates the following `Workload` for the Job above.
+The `PriorityClassName` field accepts the name of either a `PriorityClass` or a
+`WorkloadPriorityClass`. To distinguish the two, Kueue sets the `priorityClassSource` field
+to `kueue.x-k8s.io/workloadpriorityclass` when a `WorkloadPriorityClass` is used,
+and to `scheduling.k8s.io/priorityclass` when a `PriorityClass` is used.
+
+```yaml
+apiVersion: kueue.x-k8s.io/v1beta1
+kind: Workload
+metadata:
+  name: job-sample-job-7f173
+spec:
+  priorityClassSource: kueue.x-k8s.io/workloadpriorityclass
+  priorityClassName: sample-priority
+  priority: 10000
+  queueName: user-queue
+...
+```
+
+For other job frameworks, you can set the `WorkloadPriorityClass` using the same label.
+The following is an example for an `MPIJob`:
+
+```yaml
+apiVersion: kubeflow.org/v2beta1
+kind: MPIJob
+metadata:
+  name: pi
+  labels:
+    kueue.x-k8s.io/queue-name: user-queue
+    kueue.x-k8s.io/priority-class: sample-priority
+spec:
+...
+```
+
+## The relationship between pod's priority and workload's priority
+
+When creating a `Workload` for a given job, Kueue considers the following scenarios:
+1. A job specifies both a `WorkloadPriorityClass` and a `PriorityClass`:
+   - The `WorkloadPriorityClass` is used for the workload's priority.
+   - The `PriorityClass` is used for the pods' priority.
+2. A job specifies only a `WorkloadPriorityClass`:
+   - The `WorkloadPriorityClass` is used for the workload's priority.
+   - The `WorkloadPriorityClass` is not used for the pods' priority.
+3. A job specifies only a `PriorityClass`:
+   - The `PriorityClass` is used for both the workload's priority and the pods' priority.
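+
+For example, the following minimal sketch illustrates scenario 1. It assumes that a Kubernetes `PriorityClass`
+named `high-pod-priority` (a hypothetical name, not defined on this page) already exists in the cluster.
+Kueue takes the Workload's priority from `sample-priority`, while the pods are scheduled with the
+priority of `high-pod-priority`.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: sample-job
+  labels:
+    kueue.x-k8s.io/queue-name: user-queue
+    # WorkloadPriorityClass: determines the Workload's priority.
+    kueue.x-k8s.io/priority-class: sample-priority
+spec:
+  suspend: true
+  template:
+    spec:
+      # Hypothetical Kubernetes PriorityClass: determines only the pods' priority.
+      priorityClassName: high-pod-priority
+      containers:
+      - name: dummy-job
+        image: gcr.io/k8s-staging-perf-tests/sleep:latest
+      restartPolicy: Never
+```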
+
+In certain job frameworks, there are CRDs that:
+- define multiple pod specs, each of which can have its own pod priority, or
+- define the overall pod priority in a dedicated field.
+
+By default, Kueue takes the `PriorityClassName` of the first PodSet that has one set.
+However, the integration of the CRD with Kueue can implement the
+[`JobWithPriorityClass` interface](https://github.com/kubernetes-sigs/kueue/blob/e162f8508b503d20feb9b31fd0b27d91e58f2c2f/pkg/controller/jobframework/interface.go#L81-L84)
+to change this behavior. You can read the code for each job integration to learn how the priority class is obtained.
+
+## Where workload's priority is used
+
+The priority of workloads is used for:
+- Sorting the workloads in the ClusterQueues.
+- Determining whether a workload can preempt others.
+
+## Workload's priority values are always mutable
+
+The `Workload`'s `Priority` field is always mutable.
+If a `Workload` has been pending for a while, you can consider updating its priority to execute it earlier,
+based on your own policies.
+In contrast, the Workload's `PriorityClassSource` and `PriorityClassName` fields are immutable.
+
+## What's next
+
+- Learn how to [run jobs](/docs/tasks/run_jobs)
+- Learn how to [run jobs with workload priority](/docs/tasks/run_job_with_workload_priority)
diff --git a/site/content/en/docs/tasks/_index.md b/site/content/en/docs/tasks/_index.md
index 72f0adc34b..9d44846b90 100755
--- a/site/content/en/docs/tasks/_index.md
+++ b/site/content/en/docs/tasks/_index.md
@@ -25,6 +25,7 @@ As a batch administrator, you can learn how to:
 - Setup [Sequential Admission with Ready Pods](/docs/tasks/setup_sequential_admission).
 
 - As a batch administrator, you can learn how to [monitor pending workloads](/docs/tasks/monitor_pending_workloads).
+- As a batch administrator, you can learn how to [run Kueue managed Jobs with a custom WorkloadPriorityClass](/docs/tasks/run_job_with_workload_priority).
 
 ### Batch user
 
@@ -38,7 +39,7 @@ As a batch user, you can learn how to:
   Kueue supports MPIJob v2beta1, PyTorchJob, TFJob, XGBoostJob, and PaddleJob.
 - [Run a Kueue managed KubeRay RayJob](/docs/tasks/run_rayjobs).
 - [Submit Kueue jobs from Python](/docs/tasks/run_python_jobs).
-- [Run a Kueue managed plain Pod](/docs/tasks/run_plain_pods)
+- [Run a Kueue managed plain Pod](/docs/tasks/run_plain_pods).
 
 ### Platform developer
 
diff --git a/site/content/en/docs/tasks/run_job_with_workload_priority.md b/site/content/en/docs/tasks/run_job_with_workload_priority.md
new file mode 100644
index 0000000000..d30d955518
--- /dev/null
+++ b/site/content/en/docs/tasks/run_job_with_workload_priority.md
@@ -0,0 +1,85 @@
+---
+title: "Run a Job with WorkloadPriority"
+date: 2023-10-02
+weight: 8
+description: >
+  Run a Job with a WorkloadPriority, which is independent of the Pod's priority
+---
+
+Usually, Kueue calculates a workload's priority for queueing and preemption from its pods' priority.
+By using a [`WorkloadPriorityClass`](/docs/concepts/workload_priority_class),
+you can manage the priority of workloads for queueing and preemption independently of pod priority.
+
+This page contains instructions on how to run a job with workload priority.
+
+## Before you begin
+
+Make sure the following conditions are met:
+
+- A Kubernetes cluster is running.
+- The kubectl command-line tool has communication with your cluster.
+- [Kueue is installed](/docs/installation).
+
+## 0. Create a WorkloadPriorityClass
+
+The WorkloadPriorityClass should be created first.
+
+```yaml
+apiVersion: kueue.x-k8s.io/v1beta1
+kind: WorkloadPriorityClass
+metadata:
+  name: sample-priority
+value: 10000
+description: "Sample priority"
+```
+
+## 1. Create a Job with the `kueue.x-k8s.io/priority-class` label
+
+You can specify the `WorkloadPriorityClass` by setting the label `kueue.x-k8s.io/priority-class`.
+The same applies to other CRDs, such as `RayJob`.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: sample-job
+  labels:
+    kueue.x-k8s.io/queue-name: user-queue
+    kueue.x-k8s.io/priority-class: sample-priority
+spec:
+  parallelism: 3
+  completions: 3
+  suspend: true
+  template:
+    spec:
+      containers:
+      - name: dummy-job
+        image: gcr.io/k8s-staging-perf-tests/sleep:latest
+      restartPolicy: Never
+```
+
+Kueue generates the following `Workload` for the Job above.
+The workload's priority is used for queueing, preemption, and other scheduling decisions in Kueue.
+This priority doesn't affect the pods' priority.
+The Workload's `Priority` field is always mutable, because adjusting it can be useful for preemption.
+The Workload's `PriorityClassSource` and `PriorityClassName` fields are immutable.
+
+```yaml
+apiVersion: kueue.x-k8s.io/v1beta1
+kind: Workload
+metadata:
+  name: job-sample-job-7f173
+spec:
+  priorityClassSource: kueue.x-k8s.io/workloadpriorityclass
+  priorityClassName: sample-priority
+  priority: 10000
+  queueName: user-queue
+  podSets:
+  - count: 3
+    name: dummy-job
+    template:
+      spec:
+        containers:
+        - image: gcr.io/k8s-staging-perf-tests/sleep:latest
+          name: dummy-job
+```
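+
+Because the Workload's `Priority` field is mutable, you can raise the priority of a pending Workload.
+The following is a minimal sketch of a merge patch that does this. The file name `priority-patch.yaml`
+and the value `20000` are illustrative assumptions. The `kubectl` command in the comment assumes
+that the Workload is in your current namespace.
+
+```yaml
+# priority-patch.yaml (hypothetical file name)
+# Apply with:
+#   kubectl patch workloads.kueue.x-k8s.io job-sample-job-7f173 --type=merge --patch-file=priority-patch.yaml
+# Only .spec.priority is changed; priorityClassSource and priorityClassName
+# are immutable and are left untouched by this patch.
+spec:
+  priority: 20000  # illustrative value, higher than sample-priority's 10000
+```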