Simple Framework to support different queuing policies #10

Open
denkensk opened this issue Feb 18, 2022 · 17 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. kind/grand-feature lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@denkensk
Member

We need a simple framework to support different policies or algorithms for every phase of Job scheduling.

/kind feature
/cc @ahg-g @alculquicondor

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 18, 2022
@denkensk
Member Author

/assign @denkensk

@ahg-g ahg-g added priority/backlog Higher priority than priority/awaiting-more-evidence. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 18, 2022
@alculquicondor
Contributor

Can you begin by enumerating the locations where there could be multiple policies in play?

A few I have identified:

  • A policy for sorting items in a Queue. Currently it's FIFO on creationTimestamp.
  • A policy for re-queuing and backoff.
  • A policy for choosing flavors. Currently it goes for the first flavor, even if it borrows from other capacity. Some might prefer to first use flavors without borrowing and then try to borrow starting from the top of the list again.

The next question is how many of those policies require a framework (like kube-scheduler) as opposed to being simple fields in the APIs.
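
To make the first item concrete, here is a minimal sketch of a pluggable queue ordering policy, with FIFO on creation time as one implementation. The `Workload` struct, the `LessFunc` type, and the `Order` and `sample` helpers are hypothetical names invented for this illustration; they are not Kueue's actual types or API.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Workload is a hypothetical, simplified stand-in for a queued workload.
type Workload struct {
	Name              string
	CreationTimestamp time.Time
}

// LessFunc is a hypothetical ordering policy: it reports whether a should be
// considered for admission before b.
type LessFunc func(a, b Workload) bool

// FIFO orders workloads by creation time, oldest first.
func FIFO(a, b Workload) bool {
	return a.CreationTimestamp.Before(b.CreationTimestamp)
}

// Order returns the workload names in the order chosen by the policy.
// The input slice is copied, so the caller's slice is left untouched.
func Order(ws []Workload, less LessFunc) []string {
	sorted := append([]Workload(nil), ws...)
	sort.SliceStable(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	names := make([]string, len(sorted))
	for i, w := range sorted {
		names[i] = w.Name
	}
	return names
}

// sample builds a small, out-of-order set of workloads for the demo.
func sample() []Workload {
	t0 := time.Date(2022, 2, 18, 0, 0, 0, 0, time.UTC)
	return []Workload{
		{Name: "b", CreationTimestamp: t0.Add(2 * time.Minute)},
		{Name: "a", CreationTimestamp: t0.Add(1 * time.Minute)},
		{Name: "c", CreationTimestamp: t0.Add(3 * time.Minute)},
	}
}

func main() {
	fmt.Println(Order(sample(), FIFO)) // prints [a b c]
}
```

Any other ordering would be another function with the `LessFunc` signature, which is roughly the amount of machinery a simple API field could select between without a full framework.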

@denkensk
Member Author

> A few I have identified:
> • A policy for sorting items in a Queue. Currently it's FIFO on creationTimestamp.
> • A policy for re-queuing and backoff.
> • A policy for choosing flavors. Currently it goes for the first flavor, even if it borrows from other capacity. Some might prefer to first use flavors without borrowing and then try to borrow starting from the top of the list again.

In addition to the ones you mentioned above, I think there are a few other places that need support for extension.

  • A policy for ordering between different tenants.
  • If we treat the check for whether a workload has enough capacity to run as one of the filters, users may want to add other filters of their own, such as one based on cluster capacity.
  • If we treat reclaiming resources as a postfilter, there will be many more possible policies for deciding which job gets reclaimed.

> The next question is how many of those policies require a framework (like kube-scheduler) as opposed to being simple fields in the APIs.

It will be simple if we expose them as fields in the APIs. But that does not prevent us from also providing an extensible framework.
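
As a rough sketch of the filter idea, assuming a hypothetical `Filter` interface: the quota check becomes one filter, and a user-supplied cluster-capacity check becomes another. All names here (`Filter`, `Snapshot`, `QuotaFilter`, `ClusterCapacityFilter`, `RunFilters`) are made up for illustration and do not exist in Kueue.

```go
package main

import "fmt"

// Workload is a hypothetical, simplified workload with a single CPU request.
type Workload struct {
	Name string
	CPU  int64 // requested CPU in millicores
}

// Snapshot is a hypothetical view of free capacity at admission time.
type Snapshot struct {
	QueueFreeCPU   int64 // unused quota in the queue, in millicores
	ClusterFreeCPU int64 // unallocated CPU in the cluster, in millicores
}

// Filter is a hypothetical extension point. A nil error means the workload passes.
type Filter interface {
	Name() string
	Filter(w Workload, s Snapshot) error
}

// QuotaFilter rejects workloads that do not fit in the queue's free quota.
type QuotaFilter struct{}

func (QuotaFilter) Name() string { return "Quota" }
func (QuotaFilter) Filter(w Workload, s Snapshot) error {
	if w.CPU > s.QueueFreeCPU {
		return fmt.Errorf("needs %dm CPU, queue has %dm free", w.CPU, s.QueueFreeCPU)
	}
	return nil
}

// ClusterCapacityFilter rejects workloads that the cluster cannot currently hold.
type ClusterCapacityFilter struct{}

func (ClusterCapacityFilter) Name() string { return "ClusterCapacity" }
func (ClusterCapacityFilter) Filter(w Workload, s Snapshot) error {
	if w.CPU > s.ClusterFreeCPU {
		return fmt.Errorf("needs %dm CPU, cluster has %dm free", w.CPU, s.ClusterFreeCPU)
	}
	return nil
}

// RunFilters runs filters in order and returns the name of the first one that
// rejects the workload together with its error, or "" and nil if all pass.
func RunFilters(filters []Filter, w Workload, s Snapshot) (string, error) {
	for _, f := range filters {
		if err := f.Filter(w, s); err != nil {
			return f.Name(), err
		}
	}
	return "", nil
}

func main() {
	filters := []Filter{QuotaFilter{}, ClusterCapacityFilter{}}
	snap := Snapshot{QueueFreeCPU: 1000, ClusterFreeCPU: 200}
	for _, w := range []Workload{{Name: "small", CPU: 100}, {Name: "big", CPU: 500}} {
		if name, err := RunFilters(filters, w, snap); err != nil {
			fmt.Printf("%s rejected by %s: %v\n", w.Name, name, err)
		} else {
			fmt.Printf("%s admitted\n", w.Name)
		}
	}
}
```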

@alculquicondor
Contributor

I think it's too early to talk about a framework, but we can keep the discussion open for the future.

@denkensk
Member Author

We can clear the important-soon items first. But in any case, we need to allow users to customize their own policies, right? This is really needed!

@ahg-g ahg-g added kind/grand-feature and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 13, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2022
@alculquicondor
Contributor

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 12, 2022
@talcoh2x

Hi, I think adding an option to change or control the queue order is very important. We do a lot on the scheduler side because we have the option to extend it: queueSort, preFilter, filter, and more.
From my perspective, Kueue holds the quota request for me and gives me the chance to organize it before execution (before we send it to the scheduler). Once I have that, I can order and send the requests according to:

  • "priority", "labels", PodGroup size, ...
  • I can create a "label queue", e.g. label_queue: low and label_queue: high, so that for the same priority I can decide which one starts first.

We can't do that today because we have resource quotas; Kueue is the solution :)

Also, I don't think re-queuing and backoff are that important, since we have the scheduler for that. I believe just a postFilter would be a good start; it would be the user's problem to take care of situations like:

  • quota size and more ...
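
The ordering described here (priority first, then a `label_queue` label to break ties) could look roughly like the sketch below. The `Request` type, the rank table, and the choice to place "high" ahead of "low" and unlabeled requests last are assumptions made for this example, not anything Kueue provides.

```go
package main

import (
	"fmt"
	"sort"
)

// Request is a hypothetical, simplified quota request held in a queue.
type Request struct {
	Name     string
	Priority int32
	Labels   map[string]string
}

// labelQueueRank is an assumed mapping from label_queue values to a rank;
// lower rank starts first.
var labelQueueRank = map[string]int{"high": 0, "low": 1}

// rank returns the rank for a request's label_queue value. Requests with a
// missing or unknown value sort after all known values.
func rank(r Request) int {
	if v, ok := labelQueueRank[r.Labels["label_queue"]]; ok {
		return v
	}
	return len(labelQueueRank)
}

// Less puts higher priority first; for equal priority, the label_queue rank decides.
func Less(a, b Request) bool {
	if a.Priority != b.Priority {
		return a.Priority > b.Priority
	}
	return rank(a) < rank(b)
}

// Sorted returns the request names in start order without modifying reqs.
func Sorted(reqs []Request) []string {
	out := append([]Request(nil), reqs...)
	sort.SliceStable(out, func(i, j int) bool { return Less(out[i], out[j]) })
	names := make([]string, len(out))
	for i, r := range out {
		names[i] = r.Name
	}
	return names
}

func main() {
	reqs := []Request{
		{Name: "x", Priority: 5, Labels: map[string]string{"label_queue": "low"}},
		{Name: "y", Priority: 5, Labels: map[string]string{"label_queue": "high"}},
		{Name: "z", Priority: 9, Labels: map[string]string{"label_queue": "low"}},
		{Name: "u", Priority: 5},
	}
	fmt.Println(Sorted(reqs)) // prints [z y x u]
}
```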

@alculquicondor
Contributor

alculquicondor commented Jan 11, 2023

I wonder if the requirements you have can be general enough that they could become part of kueue, although at first glance they look very custom.

It might be useful for you to think about how you would configure these sorting criteria. If you are open to sharing, we can provide feedback, and if enough contributors find it useful we can just add it to kueue.

Otherwise, you can start a design that abstracts the different places where you could inject your own logic. I don't think the current contributors have bandwidth to work on this, but you are certainly welcome to do so. I know @denkensk and @kerthcet have been thinking along these lines.

@alculquicondor
Contributor

@KunWuLuan @trasc maybe we should revisit this in the context of adding more queuing policies

@talcoh2x

talcoh2x commented Jun 1, 2023

@alculquicondor hi, just want to say we have already implemented a new CRD that does the queuing the way we need and supports the Kubeflow and KubeVirt operators.
It is done in a way that works for us, as this solution is internal. It would be nice to share the requirements so that we can eventually work with kueue as well.

@KunWuLuan
Member

> @KunWuLuan @trasc maybe we should revisit this in the context of adding more queuing policies

I think a framework is good for future work on queuing policies.

@alculquicondor
Contributor

@talcoh2x would you be willing to present in a WG Batch meeting?

@KunWuLuan
Member

If we decide to add a framework, maybe we can start by designing some extension points.
I have summarized Kueue's current flow in a flowchart, which may help:
[image: flowchart of Kueue's current flow]

@alculquicondor
Contributor

The biggest recent change is that preemption calculation moved to flavorAssigment. This opens more possibilities for policies.

Perhaps something we can do is make FlavorAssigment an interface where each implementation is a policy. Then we strip down some of the existing code into library functions for building policies.

I don't currently see the possibility for a phased approach like in kube-scheduler Filters. But any ideas like this?
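
One way to picture the interface idea, as a sketch only: each flavor selection policy implements the same interface. The two policies below correspond to the behaviors described earlier in this thread (take the first flavor that fits even if it borrows, versus try all flavors without borrowing first and then retry from the top with borrowing). `FlavorAssigner`, `Flavor`, `Assignment`, `FirstFit`, and `NoBorrowFirst` are hypothetical names, and the single-resource quota model is a deliberate simplification rather than Kueue's real data model.

```go
package main

import "fmt"

// Flavor is a hypothetical, single-resource view of a flavor's quota.
type Flavor struct {
	Name       string
	Free       int64 // unused quota owned by the queue
	Borrowable int64 // extra quota that could be borrowed
}

// Assignment is the outcome of a policy decision.
type Assignment struct {
	Flavor  string
	Borrows bool
}

// FlavorAssigner is a hypothetical policy interface. The bool result is false
// when no flavor can satisfy the request.
type FlavorAssigner interface {
	Assign(request int64, flavors []Flavor) (Assignment, bool)
}

// FirstFit takes the first flavor in list order that fits, borrowing if needed.
type FirstFit struct{}

func (FirstFit) Assign(request int64, flavors []Flavor) (Assignment, bool) {
	for _, f := range flavors {
		if request <= f.Free {
			return Assignment{Flavor: f.Name}, true
		}
		if request <= f.Free+f.Borrowable {
			return Assignment{Flavor: f.Name, Borrows: true}, true
		}
	}
	return Assignment{}, false
}

// NoBorrowFirst tries every flavor without borrowing, then starts again from
// the top of the list allowing borrowing.
type NoBorrowFirst struct{}

func (NoBorrowFirst) Assign(request int64, flavors []Flavor) (Assignment, bool) {
	for _, f := range flavors {
		if request <= f.Free {
			return Assignment{Flavor: f.Name}, true
		}
	}
	for _, f := range flavors {
		if request <= f.Free+f.Borrowable {
			return Assignment{Flavor: f.Name, Borrows: true}, true
		}
	}
	return Assignment{}, false
}

func main() {
	flavors := []Flavor{
		{Name: "spot", Free: 2, Borrowable: 10},
		{Name: "on-demand", Free: 8, Borrowable: 0},
	}
	for _, p := range []FlavorAssigner{FirstFit{}, NoBorrowFirst{}} {
		a, ok := p.Assign(5, flavors)
		fmt.Printf("%T: flavor=%s borrows=%v ok=%v\n", p, a.Flavor, a.Borrows, ok)
	}
}
```

With a request of 5, `FirstFit` picks `spot` and borrows, while `NoBorrowFirst` picks `on-demand` without borrowing, which is the difference the two policies are meant to express.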

@KunWuLuan
Member

> I don't currently see the possibility for a phased approach like in kube-scheduler Filters. But any ideas like this?

@alculquicondor Maybe we can make the queueSort policy an interface? It could cover sorting across multiple cluster queues as well as sorting across multiple local queues within a single cluster queue.

@alculquicondor
Contributor

Right, I would see that as a separate extension point. I agree that it could be useful.

ChristianZaccaria pushed a commit to ChristianZaccaria/kueue that referenced this issue Apr 17, 2024
ChristianZaccaria pushed a commit to ChristianZaccaria/kueue that referenced this issue Aug 22, 2024
ChristianZaccaria pushed a commit to ChristianZaccaria/kueue that referenced this issue Sep 12, 2024