Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workload priority #973

Closed
3 tasks done
Tracked by #974
alculquicondor opened this issue Jul 11, 2023 · 14 comments · Fixed by #1081
Closed
3 tasks done
Tracked by #974

Workload priority #973

alculquicondor opened this issue Jul 11, 2023 · 14 comments · Fixed by #1081
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@alculquicondor
Copy link
Contributor

alculquicondor commented Jul 11, 2023

What would you like to be added:

A priority that is independent from Pod priority.
The priority value should be part of the Workload spec and be mutable.

The jobs can express the priority via an annotation. A PriorityClass could be used to give pre-determined values.

#884 goes along these lines.

Why is this needed:

Tying a Job queueing priority to Pod priority could be problematic as the latter is also tied to Pod preemption.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@alculquicondor alculquicondor added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 11, 2023
@alculquicondor
Copy link
Contributor Author

@Gekko0114 do you want to take this?

@Gekko0114
Copy link
Member

@alculquicondor
Thank you!
Yes, I want to take this. Should I write KEP first?

@alculquicondor
Copy link
Contributor Author

One important question is whether we need our own PriorityClass object. The KEP could reflect the advantages and disadvantages.
An alternative would be to just reuse the existing k8s PriorityClass, but leave the door open to have our own in the future. We need to explore how to do that.

The easiest starting point would be just to have a numerical value as a job annotation that is reflected as a numerical field in the Workload object.

@Gekko0114
Copy link
Member

Thanks for your explanation!
Based on your comment, I will roughly write KEP.

@kerthcet
Copy link
Contributor

Have we considered adding a job priority to batchv1.Job, it sounds more reasonable. And the job priority could be inherited by the pod.

whether we need our own PriorityClass object

Prefer to reuse the PriorityClass, IMHO, we should not invent too many concepts since we're working with the upstream, and I think it's ok to change the priority value inside kueue based on the policies.

@alculquicondor
Copy link
Contributor Author

I think it's ok to change the priority value inside kueue based on the policies.

I initially thought this could be a good idea. But from my experience, different organizations have wildly different ideas of how priority policies should work.
Once there is a separate field for priority, organizations could write small controllers to change the value.

Another alternative would be to look at existing policies in other systems and see if they make sense for Kueue.

@tenzen-y
Copy link
Member

+1 on this feature.

Currently, some proposals are submitted for the kueue scheduling order. Once this feature is supported, I think we can simplify those features.

@kerthcet
Copy link
Contributor

Another question about tying job priority to pod priority is if we decoupled them, a high-priority job with low-priority pods may never run to completion because it maybe preempted always then make it admitted is somehow meaningless.

@alculquicondor
Copy link
Contributor Author

That is generally true, but less relevant in an environment with autoscaling.

@Gekko0114
Copy link
Member

I think it's better to create a new priority class for workload rather than reusing the k8s priorityClass.
If we reuse priorityClass for workload, we will have to consider all feature extensions of the k8s priorityClass.
The current k8s priorityClass has several fields, but some fields are not necessary for workload, like peremptionPolicy.
https://github.com/kubernetes/kubernetes/blob/99190634ab252604a4496882912ac328542d649d/pkg/apis/scheduling/types.go

@Gekko0114
Copy link
Member

Gekko0114 commented Aug 2, 2023

Once the directions regarding priorityClass and Preemption are determined, I can write the KEP.

Tying a Job queueing priority to Pod priority could be problematic as the latter is also tied to Pod preemption.

I think it's OK using workload priority for Preemption calculations in Kueue. In fact, when calculating preemption in Kueue, it seems we are already using the priority of running workloads.

func findCandidates(wl *kueue.Workload, cq *cache.ClusterQueue, resPerFlv resourcesPerFlavor) []*workload.Info {

Please correct me if I misunderstood

@alculquicondor
Copy link
Contributor Author

If we reuse priorityClass for workload, we will have to consider all feature extensions of the k8s priorityClass.

My thinking is that different organisations might want to have different "PriorityClass" CRDs for their own extensions. Then, it might be nice to have a Kueue PriorityClass, but it should be rather simple. The fields should be:

  • (initial) priority
  • Rereference to another object that could be a CRD defined by an organization.

Regarding the preemption discussion, I think it's ok to have a numeric priority for Kueue that is different from kube-scheduler's priority. We should document the risks of pod preemption to users. We can also point users to create PriorityClasses for their pods that are non-preempting https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#non-preempting-priority-class

@Gekko0114
Copy link
Member

I see. Thanks for your inputs!
I will write a KEP then.

@Gekko0114
Copy link
Member

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants