
Add enhancement for Workload preemption #410

Merged · 1 commit · Nov 28, 2022

Conversation

@alculquicondor (Contributor) commented Oct 3, 2022

What type of PR is this?

/kind documentation

What this PR does / why we need it:

So far, the changes describe the goals, non-goals, user stories and API.

Which issue(s) this PR fixes:

Part of #83

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/documentation Categorizes issue or PR as related to documentation. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 3, 2022
@k8s-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 3, 2022
@alculquicondor alculquicondor force-pushed the kep-preemption branch 3 times, most recently from 6e1c5c5 to 0b0c5f0 Compare October 5, 2022 13:26
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 7, 2022
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 14, 2022
be arbitrarily delayed.
- While Pods terminate, a ClusterQueue's quota could be oversubscribed.

Oversubscription could be mitigated by watching the running pods but this will
Contributor:

Will we consider adding a new condition type to the batch Job, like Reclaimed, which indicates that all the Pods owned by the Job have been deleted?

Contributor Author:

Yes, we could do something like that, but I would leave it as a follow-up. It's somewhat related to #78.

Contributor:

Sure.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 17, 2022
Sorting uses the following criteria:

1. Flavors that don't borrow first.
2. [New criterion] Highest priority first.
Member:

If some workloads have high priority but need to borrow from others, or to preempt lower-priority workloads in the same ClusterQueue, will they need to wait for all the workloads that don't need to borrow to be scheduled first? Could that lead to long-term starvation of high-priority workloads?

Contributor Author:

Yes, first we prefer not to borrow (it's the semantics of min quota).

If other ClusterQueues are not using all their quota, this high priority workload would be allowed to borrow.

Otherwise, it will start preempting lower priority workloads in its CQ, before allowing borrowing in the cohort.

So no, there wouldn't be starvation within the ClusterQueue.
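A minimal sketch of that decision order, with hypothetical names (admitDecision and its inputs are illustrative, not Kueue's actual code):

func admitDecision(fitsUnderMin, cohortHasSpareQuota, hasLowerPriorityVictims bool) string {
    switch {
    case fitsUnderMin:
        // Preferred: admit under the ClusterQueue's min quota.
        return "AdmitUnderMin"
    case cohortHasSpareQuota:
        // Other ClusterQueues are not using all their quota: borrow.
        return "AdmitBorrowing"
    case hasLowerPriorityVictims:
        // Preempt lower-priority workloads within the ClusterQueue.
        return "PreemptWithinClusterQueue"
    default:
        return "Requeue"
    }
}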

Contributor:

But it will lead to a performance loss. Low-priority workloads (which don't need to borrow) will be scheduled first; then the high-priority workload will preempt them because the ClusterQueue lacks resources. Finally, the low-priority workloads will be scheduled again, this time borrowing resources.

Contributor Author:

If the high priority workload fits with borrowing, it can still borrow, but in the next scheduling cycle.

If there is no capacity to borrow, it will preempt low priority workloads, and they wouldn't fit when requeued either.

Contributor:

> So no, there wouldn't be starvation within the ClusterQueue.

If we have several lower-priority workloads that don't borrow resources and one high-priority workload that needs to borrow resources, then in my understanding the high-priority workload is starving, no?

Contributor:

In scheduling, we prefer workloads that don't borrow resources, but we also support preemption via priority. Don't these conflict with each other? That is, lower-priority workloads that don't borrow resources might be scheduled first, but then they will be preempted by higher-priority ones.

How about only considering priority in preemption, at least for the initial version, and leaving the flavor assignment to #312 (e.g. MinimizeBorrowing)? I think this is clearer.

Contributor:

This is not exactly the same as your design, but it can somewhat mitigate this.

Member:

> Yes, first we prefer not to borrow (it's the semantics of min quota).

I assume that there are two workloads:

  1. workload_a has higher priority, but it doesn't have enough free quota in its ClusterQueue, so it needs to preempt some lower-priority workloads to get resources.
  2. workload_b has lower priority, but it has enough free quota in its ClusterQueue, so it won't need to borrow any quota.

@alculquicondor I think that in your design workload_b will be scheduled first, right? If there are many short-term workloads like workload_b, workload_a will have to wait for all of them to be scheduled, right? That may lead to starvation, where the higher-priority workload waits for a long time.

Contributor Author:

> If we have several lower-priority workloads that don't borrow resources and one high-priority workload that needs to borrow resources, then in my understanding the high-priority workload is starving, no?

This is somewhat true. What you might be missing is that we are talking about the heads of the ClusterQueues here.
So we have two heads of different ClusterQueues competing, but both have the highest priority within their ClusterQueue. Let's say A has lower priority than B. If B fits without borrowing, it should still be scheduled first, to satisfy the min quota of its ClusterQueue. That's the definition of min.

> We prefer workloads that don't borrow resources, but we also support preemption via priority. Don't these conflict with each other?

No. Preemption happens in two scenarios: (1) within a cohort, to recover the min quota that was lent to other ClusterQueues, or (2) within a ClusterQueue, purely due to priority.

> I think that in your design workload_b will be scheduled first, right? If there are many short-term workloads like workload_b, workload_a will have to wait for all of them to be scheduled, right? That may lead to starvation, where the higher-priority workload waits for a long time.

Yes, this is working as intended. We should respect the min quota of the ClusterQueue for B first.
But workload_a could still preempt lower priority workloads within its ClusterQueue.
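A minimal sketch of the two preemption scenarios, assuming hypothetical types that are not Kueue's actual API:

type Workload struct {
    ClusterQueue string
    Cohort       string
    Priority     int32
}

// usageAboveMin reports whether the candidate's ClusterQueue is using
// more than its min quota (stubbed here for illustration).
func usageAboveMin(w Workload) bool { return true }

func canPreempt(pending, candidate Workload) bool {
    if pending.ClusterQueue == candidate.ClusterQueue {
        // Scenario 2: within a ClusterQueue, purely due to priority.
        return candidate.Priority < pending.Priority
    }
    // Scenario 1: within the cohort, only to recover min quota that was
    // lent to a ClusterQueue now using more than its min.
    return pending.Cohort == candidate.Cohort && usageAboveMin(candidate)
}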

If the workload has `.spec.onPreemption=Requeue`, clear `.spec.admission`.
The Workload will be requeued by the Workload event handler.

The incoming Workload will be admitted once the changes in the victim Workloads
@denkensk (Member) commented Oct 18, 2022:

Between the time the Workload is changed and the cache is updated, where is the incoming Workload placed? Will it be requeued again?

Contributor Author:

Yes, it will be in the inadmissible stage. Every time a Workload deletion is observed, it will be moved to the active queue again.

I rewrote this paragraph for clarity.
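A rough sketch of that requeueing, with an illustrative Queues type reusing the Workload type from the earlier sketch (not the actual implementation):

type Queues struct {
    Active       []Workload
    Inadmissible []Workload
}

// onWorkloadDelete is called by the event handler when a Workload
// deletion is observed: inadmissible workloads get another chance.
func (q *Queues) onWorkloadDelete() {
    q.Active = append(q.Active, q.Inadmissible...)
    q.Inadmissible = nil
}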

// since the first time that the ClusterQueue has pending workloads that would
// fit under the min quota. This time is present in the timestamp for the
// PendingWorkloadsUnderMinQuota condition.
// If null, it means to wait indefinitely.
Member:

I'm not sure I understand correctly; shouldn't there be no wait here if null?

Contributor Author:

That can be achieved with an explicit 0.

null basically means "disable preemption" (I updated the comment). An additional nice thing about these semantics is that they are backwards compatible.
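A sketch of those semantics, assuming an illustrative helper (this is not the final API):

import "time"

// waitElapsed reports whether preemption may be triggered for a workload
// pending since pendingSince. A nil waitSeconds means wait indefinitely,
// i.e. preemption is disabled; an explicit 0 means preempt immediately.
func waitElapsed(waitSeconds *int64, pendingSince, now time.Time) bool {
    if waitSeconds == nil {
        return false
    }
    return now.Sub(pendingSince) >= time.Duration(*waitSeconds)*time.Second
}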


The algorithm is as follows:

For each resource (or set of resources with the same flavors), evaluate
flavors in the order established in the ClusterQueue:
Contributor:

Related to #312

Contributor Author:

Linked in the paragraph below for the reader's convenience.

Workload preemption:
- A field in the ClusterQueue to influence how long to wait before starting
preemption.
- A field in the Workload to determine whether to terminate or requeue upon preemption.

Reviewer:

What are the use cases in which you think termination will be useful?

Contributor:

+1. I don't think this is necessary, at least not initially; we can start with always requeueing the jobs.

Contributor Author:

That's my question to the community.

One case where I'm almost certain it would be useful is the interactive Jobs that @kannon92 is working on. But since this work is still ongoing, we can leave it as a follow-up and just requeue for now.

Moved to Alternatives.

Contributor:

I think I'd like a separation between workloads that went over their allocated time limit and workloads that are over their time limit but allowed to run due to preemption.

I don't think it needs to be addressed here, but that is why I created #405, to maybe discuss it separately from preemption.

@alculquicondor (Contributor, author) left a comment:

I also added some notes about reassigning flavors. And opened kubernetes/kubernetes#113221

workloads with higher priority and that Workloads are sent to ClusterQueues
where most Workloads have the same priority.

Additionally, the preemption is mostly a linear pass over the running workloads
Contributor Author:

I meant to say "the preemption algorithm". Updated.

workloads when:
- ClusterQueues under their minimum quota need the resources that are currently
lent to other ClusterQueues in the cohort.
- Within a Clusterqueue, there are running Workloads with lower priority than
Member:

Suggested change:
- Within a Clusterqueue, there are running Workloads with lower priority than
+ Within a ClusterQueue, there are running Workloads with lower priority than

@alculquicondor alculquicondor changed the title WIP Add enhancement for Workload preemption Add enhancement for Workload preemption Nov 17, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 17, 2022
@alculquicondor (Contributor, author):

/retest
(the flaky integration test was fixed)


If this is a ClusterQueue with BestEffortFIFO, the Workload is put in the
inadmissible staging area with a deadline equal to the remaining time to
fulfill `waitBeforePreemptionSeconds`.
Contributor:

If no existing workloads can be preempted to fit the incoming workload, then at the limit all workloads that belong to this CQ will always be in the active queue (pretty much converting it into a strict FIFO).

Contributor Author:

That is only while the deadline is not satisfied. Once it is satisfied, we proceed with preemption and requeue as usual, as explained at the end of this section. Still, I tried to rephrase a little.
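A sketch of the BestEffortFIFO deadline handling described above (an illustrative helper, not the actual code):

// remainingBeforePreemption returns how long the Workload should stay in
// the inadmissible staging area before it may trigger preemption.
func remainingBeforePreemption(pendingSince time.Time, waitSeconds int64, now time.Time) time.Duration {
    deadline := pendingSince.Add(time.Duration(waitSeconds) * time.Second)
    if remaining := deadline.Sub(now); remaining > 0 {
        return remaining // stay inadmissible until the deadline
    }
    return 0 // deadline satisfied: proceed with preemption as usual
}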

@alculquicondor (Contributor, author) commented Nov 22, 2022 via email

// ClusterQueues, can preempt Workloads with higher priority in this
// ClusterQueue if it's using more than its min quota.
// Possible values are Allow and Disallow (default).
ByLowerPriorityWhenBorrowing *PreemptionByLowerPriorityWhenBorrowingPolicy
Contributor:

Strange name. Is there any precedent in k8s for this naming format?

Contributor Author:

I don't think so.

Maybe something like this?

preemption:
  onLowPriorityReclaim: Preempt|Ignore

@liggitt commented Nov 23, 2022:

Aldo and I spent some time talking about the behavior and the name of this field.

Which of the following is this field controlling:

  1. the policy this queue will use to preempt workloads running in other queues to recover quota borrowed from this queue
  2. the policy other queues will use to preempt workloads running in this queue to recover quota this queue borrowed from other queues

Option 1 was what I expected, since different queues in a cohort could have different opinions about how important it is to recover quota borrowed from them, and option 2 can't express that.

If this is for option 1, it seems like there are at least three possible preemption policies for recovering/reclaiming quota borrowed from this queue (it's good if the behavior you get with a nil policy can also be represented as an explicit enum value):

  • Never
  • LowerPriority
  • Always

It's also unclear how this interacts with the WaitBeforePreemptionSeconds field. Can a queue be configured to reclaim borrowed quota by preemption (either immediately or by priority) without being forced to enable preemption among workloads in the queue?

@liggitt:

> Maybe something like this?
>
> preemption:
>   onLowPriorityReclaim: Preempt|Ignore

The "LowPriority" aspect seems like one of the possible policies, not something we should bake into the field name

Contributor:

It is indeed option 1. I like the enum, and we can perhaps name the field preemptionPolicy or just policy.

Contributor Author:

I changed it to:

preemption:
  withinCohort: Never|FromLowerPriority|Always
  withinClusterQueue: Never|FromLowerPriority
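For illustration, the YAML above could map to Go types roughly like the following; the names here are a sketch, and the enums were later renamed around "reclaiming quota" (see the discussion further down):

type PreemptionPolicy string

const (
    Never             PreemptionPolicy = "Never"
    FromLowerPriority PreemptionPolicy = "FromLowerPriority"
    Always            PreemptionPolicy = "Always"
)

type Preemption struct {
    WithinCohort       PreemptionPolicy // Never | FromLowerPriority | Always
    WithinClusterQueue PreemptionPolicy // Never | FromLowerPriority
}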

// in the transitionTimestamp of the Admitted condition.
// If null, it means to wait indefinitely, effectively disabling preemption
// for incoming Workloads in this ClusterQueue.
WaitBeforePreemptionSeconds *int64
Contributor Author:

Jordan and I spent some time thinking about this field, beyond just the name.

Here is one scenario where things get confusing:

Setup:

  1. low priority stuff is running and consuming all resources, mode=StrictFIFO, WaitBeforePreemptionSeconds=2 seconds
  2. add medium priority workload b at t=0
  3. add high priority workload a at t=1
  4. one second passes (t=2)
  5. queue looks like high(workload-a(waiting 1 second)), medium(workload-b(waiting 2 seconds))

What should happen?

In best-effort mode:

  • we would look at workload-a, skip preemption (because it hadn't been waiting long enough)
  • continue to workload-b, trigger preemption of things matching b's resource requirements
  • change in available resources (because of our preemption) would make us reevaluate the whole list
  • if workload-a fit within the space made for workload-b by preemption, we would run it, otherwise, we'd run workload-b

In strict mode (currently):

  • we would only look at workload-a, and not preempt anything because it had not been waiting long enough
  • this means even though workload-b has been waiting longer than WaitBeforePreemptionSeconds, nothing triggers the queue to preempt to make progress until workload-a reaches that threshold

Naively, a user would expect that once something in the queue has waited long enough to consider preemption, we would start making room for the queue to progress to running that workload, even if that transitively meant making room for the workload at HEAD in strict mode.

I think we can make that change relatively easily by not just looking at the wait time for the head of the queue, but the longest wait time we have in the queue.

As for the field name, triggerAfterWorkloadWaitingSeconds should do.
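A minimal sketch of that change, assuming an illustrative queue entry type:

type queuedWorkload struct {
    PendingSince time.Time
}

// shouldTriggerPreemption returns true if any workload in the queue, not
// just the head, has waited at least the configured threshold.
func shouldTriggerPreemption(queue []queuedWorkload, threshold time.Duration, now time.Time) bool {
    for _, w := range queue {
        if now.Sub(w.PendingSince) >= threshold {
            return true
        }
    }
    return false
}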

Contributor:

I am having second thoughts on this parameter. Without informing the victim job of the upcoming preemption, how is shifting the time we preempt going to be useful at the limit?

@alculquicondor (Contributor, author) commented Nov 23, 2022:

Going with the interpretation of "at the limit" as when all the ClusterQueues are full, so that we would require preemption in all scenarios:

The time is measured from the first time we attempt to admit a Workload.
In StrictFIFO (assuming no higher priority workloads jump to the head of the queue), this means that we still wait before preemption after each admission.
BestEffortFIFO is more complicated, because the head will keep changing as long as there are no other changes in the cluster (like workloads finishing). In an extreme case, we would go through the entire queue and start the timers for all the workloads. Once the timer is fulfilled, we will do preemptions for all the elements in the queue until they are all admitted. Arguably, this is less useful.

I guess we could park it and instead provide a knob to control whether to preempt within a ClusterQueue or not, alongside with the knob to preempt within the cohort.

Contributor:

I think we can park it for now. It feels like in most cases this will simply shift the start of the incoming job, which is equivalent to the incoming job being submitted WaitBeforePreemptionSeconds later.

To do "cooperative" preemption, it is more useful to find ways to improve victim selection, probably via an explicit signal from the workloads (similar to preemption cost). For example, in the case of v1.Job with completions > parallelism, we can calculate how much progress a job has made by comparing its success count against completions, and deprioritize as victims the ones that are almost done.
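For example, a hypothetical progress signal along those lines (purely illustrative; no such field exists in the proposal):

// progress estimates how far along a v1.Job is, based on its succeeded
// count and completions; victims with higher progress would be less
// preferred for preemption.
func progress(succeeded, completions int32) float64 {
    if completions <= 0 {
        return 0
    }
    return float64(succeeded) / float64(completions)
}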

Contributor Author:

Moved the field into the "Alternatives" section for now.

@alculquicondor alculquicondor force-pushed the kep-preemption branch 3 times, most recently from b2ec4c6 to e6801eb Compare November 23, 2022 21:02
Some highlights:
- A Workload could get flavor assignments at different steps for different
resources.
- Assignments that require preemption implicitly do not borrow quota.
Contributor:

I think it is reasonable to support the case where a higher priority workload can preempt lower priority ones that are coming from any CQ with usage above its min. We can leave that as a followup, but I think we should make sure that the initial implementation doesn't block this case.

Contributor Author:

> I think it is reasonable to support the case where a higher priority workload can preempt lower priority ones that are coming from any CQ with usage above its min.

Isn't that what we are already proposing? Or do you mean that the CQ of the pending workload would go above its min by preempting other workloads on CQs that aren't above their min?

Contributor:

The statement "Assignments that require preemption implicitly do not borrow quota." implies that preemption is allowed only if the pending workload can fit under its CQ's quota after preemption.

Consider the case where all admitted workloads in CQ1 have equal or higher priority than the pending workload, and there are lower-priority workloads from CQ2 that were admitted via quota borrowed from CQ3. Then I am suggesting that the pending workload from CQ1 should preempt the lower-priority ones from CQ2 to release the unused resources of CQ3.

Contributor Author:

Gotcha.

I think the implementation in this proposal can still be extended to support that: we query the minimum priority among the Workloads admitted in the flavor and check whether it's higher than or equal to the pending Workload's; if so, we choose any flavor but mark it as borrowing, then run the preemption algorithm considering all workloads in the cohort.

Contributor:

Sounds good, let's document this case then.

Contributor Author:

Actually, to accommodate that possible expansion, I renamed the enums a little to reflect the fact that we are limiting preemption to situations of "reclaiming quota". Please see the update.

@ahg-g (Contributor) commented Nov 28, 2022:

Looks good to me. I think this is ready to merge; please squash.

@alculquicondor (Contributor, author):

Squashed.

@alculquicondor (Contributor, author):

/retest

(2 similar /retest comments followed.)

@ahg-g (Contributor) commented Nov 28, 2022:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 28, 2022
@k8s-ci-robot k8s-ci-robot merged commit 347abbd into kubernetes-sigs:main Nov 28, 2022