add docs for workloadPriority #1170

Merged 1 commit on Oct 10, 2023

13 changes: 6 additions & 7 deletions keps/973-workload-priority/README.md
@@ -143,7 +143,7 @@ This label is always mutable because it might be useful for the preemption.

```yaml
# sample-priority-class.yaml
apiVersion: kueue.x-k8s.io/v1alpha1
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: sample-priority
@@ -154,7 +154,7 @@ description: "Sample priority"
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  name: sample-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/priority-class: sample-priority
@@ -171,16 +171,15 @@ spec:
```

The following workload is generated by the yaml above.
The `PriorityClass` field can accept both k8s `PriorityClass` and `workloadPriorityClass` names as values.
To distinguish, when using `workloadPriorityClass`, a `priorityClassSource` field has the `kueue.x-k8s.io/workloadpriorityclass` value.
When using k8s `PriorityClass`, a `priorityClassSource` field has the `scheduling.k8s.io/priorityclass` value.
The `PriorityClassName` field can accept either a `PriorityClass` or a `WorkloadPriorityClass` name as a value.
To distinguish between them, the `priorityClassSource` field has the `kueue.x-k8s.io/workloadpriorityclass` value when a `WorkloadPriorityClass` is used.
When a `PriorityClass` is used, the `priorityClassSource` field has the `scheduling.k8s.io/priorityclass` value.

```yaml
# sample-workload.yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: job-sample-job-jf5fb-f5982
  name: job-sample-job-7f173
spec:
  priorityClassSource: kueue.x-k8s.io/workloadpriorityclass
  priorityClassName: sample-priority
6 changes: 6 additions & 0 deletions site/content/en/docs/concepts/_index.md
@@ -34,6 +34,12 @@ single tenant.
An application that will run to completion. It is the unit of _admission_ in
Kueue. Sometimes referred to as _job_.

### [Workload Priority Class](/docs/concepts/workload_priority_class)

`WorkloadPriorityClass` defines a priority class for a workload,
independently of [pod priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/).
The priority value from a `WorkloadPriorityClass` is used only for managing the queueing and preemption of [Workloads](#workload).

![Components](/images/queueing-components.svg)

## Glossary
11 changes: 8 additions & 3 deletions site/content/en/docs/concepts/workload.md
@@ -89,11 +89,15 @@ In addition to the usual resource naming restrictions, you cannot use the `pods`
## Priority

Workloads have a priority that influences the [order in which they are admitted by a ClusterQueue](/docs/concepts/cluster_queue#queueing-strategy).
You can see the priority of the Workload in the field `.spec.priority`.
There are two ways to set the Workload priority:

- **Pod Priority**: You can see the priority of the Workload in the field `.spec.priority`.
For a `batch/v1.Job`, Kueue sets the priority of the Workload based on the
[pod priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/)
of the Job's pod template.
[pod priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/) of the Job's pod template.

- **WorkloadPriority**: Sometimes developers want to control a workload's priority without affecting the pod's priority.
By using a [`WorkloadPriorityClass`](/docs/concepts/workload_priority_class),
you can manage the priority of workloads for queuing and preemption independently of the pod's priority.

## Custom Workloads

@@ -103,4 +107,5 @@ creating a corresponding Workload object for it.

Member

Can you update the Priority section in this file as well?

Member Author

updated

## What's next

- Learn about [workload priority class](/docs/concepts/workload_priority_class).
- Learn how to [run jobs](/docs/tasks/run_jobs)
114 changes: 114 additions & 0 deletions site/content/en/docs/concepts/workload_priority_class.md
@@ -0,0 +1,114 @@
---
title: "Workload Priority Class"
date: 2023-10-02
weight: 6
description: >
  A priority class whose value is used by the Kueue controller and is independent of the pod's priority.
---

A `WorkloadPriorityClass` allows you to control the [`Workload`'s](/docs/concepts/workload) priority without affecting the pod's priority.
This feature is useful when you want to:
- prioritize workloads that have remained inactive for a specific duration
- set a lower priority for development workloads and a higher priority for production workloads

A sample WorkloadPriorityClass looks like the following:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: sample-priority
value: 10000
description: "Sample priority"
```

`WorkloadPriorityClass` objects are cluster scoped, so they can be used by a job in any namespace.
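
As a minimal sketch of the development versus production use case mentioned earlier, two classes could be defined side by side. The names `dev-priority` and `prod-priority` and both values are hypothetical, chosen only to show that a larger `value` means a higher priority.

```yaml
# Hypothetical classes for illustration; pick names and values that fit your cluster.
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: dev-priority
value: 100
description: "Development workloads (lower priority)"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: prod-priority
value: 100000
description: "Production workloads (higher priority)"
```

Because the objects are cluster scoped, neither manifest carries a `namespace` field.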

## How to use WorkloadPriorityClass on Jobs

You can specify the `WorkloadPriorityClass` by setting the label `kueue.x-k8s.io/priority-class`.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/priority-class: sample-priority
spec:
  ...
```

Kueue generates the following `Workload` for the Job above.
The `PriorityClassName` field can accept either a `PriorityClass` or a
`WorkloadPriorityClass` name as a value. To distinguish between them,
the `priorityClassSource` field has the `kueue.x-k8s.io/workloadpriorityclass` value when a `WorkloadPriorityClass` is used.
When a `PriorityClass` is used, the `priorityClassSource` field has the `scheduling.k8s.io/priorityclass` value.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: job-sample-job-7f173
spec:
  priorityClassSource: kueue.x-k8s.io/workloadpriorityclass
  priorityClassName: sample-priority
  priority: 10000
  queueName: user-queue
  ...
```
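
For comparison, a Job that sets only a pod `PriorityClass` would be expected to produce a `Workload` along the lines of the sketch below. The class name `sample-pod-priority` and the value `5000` are hypothetical; only the `priorityClassSource` value comes from the description above.

```yaml
# Sketch only: Workload for a Job that uses a Kubernetes PriorityClass
# instead of a WorkloadPriorityClass.
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: job-sample-job-7f173
spec:
  priorityClassSource: scheduling.k8s.io/priorityclass
  priorityClassName: sample-pod-priority   # hypothetical PriorityClass name
  priority: 5000                           # hypothetical value of that PriorityClass
  queueName: user-queue
  # remaining fields omitted
```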

For other job frameworks, you can set the `WorkloadPriorityClass` using the same label.
The following is an example for `MPIJob`.

```yaml
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: pi
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/priority-class: sample-priority
spec:
  ...
```

## The relationship between pod's priority and workload's priority

When creating a `Workload` for a given job, Kueue considers the following scenarios:
1. A job specifies both `WorkloadPriorityClass` and `PriorityClass`
   - `WorkloadPriorityClass` is used for the workload's priority.
   - `PriorityClass` is used for the pod's priority.
2. A job specifies only `WorkloadPriorityClass`
   - `WorkloadPriorityClass` is used for the workload's priority.
   - `WorkloadPriorityClass` is not used for the pod's priority.
3. A job specifies only `PriorityClass`
   - `PriorityClass` is used for the workload's priority and the pod's priority.
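
The first scenario can be sketched as a Job that carries both settings. The `PriorityClass` name `sample-pod-priority` and its value are hypothetical; the label and the `sample-priority` class are the ones used elsewhere on this page.

```yaml
# Hypothetical pod-level class; name and value are illustrative only.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: sample-pod-priority
value: 5000
---
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    # Workload priority: used by Kueue for queueing and preemption.
    kueue.x-k8s.io/priority-class: sample-priority
spec:
  suspend: true
  template:
    spec:
      # Pod priority: used for the pods themselves, not for the Workload.
      priorityClassName: sample-pod-priority
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
      restartPolicy: Never
```

Dropping the label gives the third scenario, and dropping `priorityClassName` gives the second.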

Member

Also, it would be better to explain what happens if the job implement the JobWithPriorityClass interface.

Contributor

@tenzen-y this might be too developer focused and a user cannot do too much about it.

Member

I was also concerned about what you say. However, actually, which priorities are used depends on the interface.
For example, we can set three priorities (.spec.runPolicy.priorityClass, .spec.labels.kueue.x-k8s.io/priority-class, and .spec.mpiReplicaSpecs[*].template.spec.priorityClassName) on the MPIJob.

Member

Or, instead of mentioning the interface, raise MPIJob as a specific case.

@alculquicondor Any better ideas?

Member Author

Thanks for your comments! Added the explanation related to JobWithPriorityClass

Contributor

I like how Hiroyuki phrased it :)

In certain job frameworks, there are CRDs that:
- Define multiple pod specs, where each can have its own pod priority, or
- Define the overall pod priority in a dedicated field.

By default, Kueue takes the `PriorityClassName` of the first PodSet that has one set.
However, the integration of the CRD with Kueue can implement the
[`JobWithPriorityClass` interface](https://github.com/kubernetes-sigs/kueue/blob/e162f8508b503d20feb9b31fd0b27d91e58f2c2f/pkg/controller/jobframework/interface.go#L81-L84)
to change this behavior. You can read the code for each job integration
Contributor

I guess that's what we have today :(

If you have the time, please update the documentation for MPIJob and other kubeflow CRDs to explain how we obtain the priority class for each.

to learn how the priority class is obtained.
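
As a rough illustration of such a CRD, an `MPIJob` can carry a priority in several places at once. The sketch below assumes the MPIJob v2beta1 field paths `spec.runPolicy.schedulingPolicy.priorityClass` and `spec.mpiReplicaSpecs.<role>.template.spec.priorityClassName`. The class names `overall-pod-priority` and `launcher-pod-priority` are hypothetical, and the container specs are placeholders rather than a working MPI setup.

```yaml
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: pi
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    # Workload priority, read from the WorkloadPriorityClass.
    kueue.x-k8s.io/priority-class: sample-priority
spec:
  runPolicy:
    schedulingPolicy:
      # Overall pod priority in a dedicated field (hypothetical class name).
      priorityClass: overall-pod-priority
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          # Per-pod-spec priority (hypothetical class name).
          priorityClassName: launcher-pod-priority
          containers:
          - name: launcher
            image: gcr.io/k8s-staging-perf-tests/sleep:latest   # placeholder image
    Worker:
      replicas: 2
      template:
        spec:
          containers:
          - name: worker
            image: gcr.io/k8s-staging-perf-tests/sleep:latest   # placeholder image
```

Which of the pod-level fields Kueue reads is decided by the MPIJob integration code, so it is worth checking that code before relying on a particular order.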

## Where workload's priority is used

The priority of workloads is used for:
- Sorting the workloads in the ClusterQueues.
- Determining whether a workload can preempt others.

## Workload's priority values are always mutable

The `Workload`'s `Priority` field is always mutable.
If a `Workload` has been pending for a while, you can consider updating its priority to execute it earlier,
based on your own policies.
Workload's `PriorityClassSource` and `PriorityClassName` fields are immutable.
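
A sketch of what an edited `Workload` might look like after raising its priority is shown below. The new value `20000` is hypothetical; the other fields are the ones from the generated `Workload` shown earlier on this page.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: job-sample-job-7f173
spec:
  priorityClassSource: kueue.x-k8s.io/workloadpriorityclass   # immutable
  priorityClassName: sample-priority                          # immutable
  priority: 20000   # raised from 10000 (hypothetical new value)
  queueName: user-queue
  # remaining fields omitted
```

Only `priority` changes here; the class reference stays the same, so the `Workload` no longer matches the class's `value`.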

## What's next?

- Learn how to [run jobs](/docs/tasks/run_jobs)
- Learn how to [run jobs with workload priority](/docs/tasks/run_job_with_workload_priority)
3 changes: 2 additions & 1 deletion site/content/en/docs/tasks/_index.md
@@ -25,6 +25,7 @@ As a batch administrator, you can learn how to:
- Setup [Sequential Admission with Ready Pods](/docs/tasks/setup_sequential_admission).
- As a batch administrator, you can learn how to
[monitor pending workloads](/docs/tasks/monitor_pending_workloads).
- As a batch administrator, you can learn how to [run a Kueue managed Job with a custom WorkloadPriority](/docs/tasks/run_job_with_workload_priority).

### Batch user

@@ -38,7 +39,7 @@ As a batch user, you can learn how to:
Kueue supports MPIJob v2beta1, PyTorchJob, TFJob, XGBoostJob, and PaddleJob.
- [Run a Kueue managed KubeRay RayJob](/docs/tasks/run_rayjobs).
- [Submit Kueue jobs from Python](/docs/tasks/run_python_jobs).
- [Run a Kueue managed plain Pod](/docs/tasks/run_plain_pods)
- [Run a Kueue managed plain Pod](/docs/tasks/run_plain_pods).

### Platform developer

85 changes: 85 additions & 0 deletions site/content/en/docs/tasks/run_job_with_workload_priority.md
@@ -0,0 +1,85 @@
---
title: "Run a job with WorkloadPriority"
date: 2023-10-02
weight: 8
description: >
  Run a job with WorkloadPriority, which is independent of the pod's priority
---

Usually, Kueue derives a workload's priority from the pod's priority and uses it for queuing and preemption.
By using a [`WorkloadPriorityClass`](/docs/concepts/workload_priority_class),
you can manage the priority of workloads for queuing and preemption independently of the pod's priority.

This page contains instructions on how to run a job with workload priority.

## Before you begin

Make sure the following conditions are met:

- A Kubernetes cluster is running.
- The kubectl command-line tool is configured to communicate with your cluster.
- [Kueue is installed](/docs/installation).

## 0. Create WorkloadPriorityClass

The WorkloadPriorityClass should be created first.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: WorkloadPriorityClass
metadata:
  name: sample-priority
value: 10000
description: "Sample priority"
```

## 1. Create Job with `kueue.x-k8s.io/priority-class` label

You can specify the `WorkloadPriorityClass` by setting the label `kueue.x-k8s.io/priority-class`.
This is the same for other CRDs like `RayJob`.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
    kueue.x-k8s.io/priority-class: sample-priority
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
      restartPolicy: Never
```

Kueue generates the following `Workload` for the Job above.
Kueue uses the workload's priority for queuing, preemption, and other scheduling decisions.
This priority doesn't affect the pod's priority.
The `Workload`'s `Priority` field is always mutable because changing it can be useful for preemption.
The `Workload`'s `PriorityClassSource` and `PriorityClassName` fields are immutable.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: Workload
metadata:
  name: job-sample-job-7f173
spec:
  priorityClassSource: kueue.x-k8s.io/workloadpriorityclass
  priorityClassName: sample-priority
  priority: 10000
  queueName: user-queue
  podSets:
  - count: 3
    name: dummy-job
    template:
      spec:
        containers:
        - image: gcr.io/k8s-staging-perf-tests/sleep:latest
          name: dummy-job
```