KEP: Support for preemption based on flavor order #810
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Hi @KunWuLuan. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test` on its own line. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/release-note-none

cc @trasc for comment
We will first try `default-flavor1` for cpu and memory resources. If `default-flavor1` doesn't fit, we try to preempt in `default-flavor1`. If the preemptor reports a preemption failure, we try to allocate resources in `default-flavor2`.
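For illustration, a ClusterQueue along these lines would produce that flavor order for cpu and memory (field names follow the kueue v1beta1 API; the flavor names come from this example, and the quota values are made up):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor1   # tried first: fit, then preemption
      resources:
      - name: "cpu"
        nominalQuota: 8
      - name: "memory"
        nominalQuota: 16Gi
    - name: default-flavor2   # fallback when default-flavor1 fails
      resources:
      - name: "cpu"
        nominalQuota: 8
      - name: "memory"
        nominalQuota: 16Gi
```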
We will pass the preemptor to the flavorAssigner and let the flavorAssigner call a public interface of the preemptor to get candidates for a certain podSet in the current flavor if the podSet cannot find enough resources in that flavor.
How will the preemptor know which flavors are wanted for each call?
Just as a note: currently, the preemptor depends on the flavorAssigner, not the other way around.
Currently the preemptor finds the resources that need preemption in the `Do` func:

kueue/pkg/scheduler/preemption/preemption.go, lines 77 to 81 (e2cb8de):

```
func (p *Preemptor) Do(ctx context.Context, wl workload.Info, assignment flavorassigner.Assignment, snapshot *cache.Snapshot) (int, error) {
	log := ctrl.LoggerFrom(ctx)
	resPerFlv := resourcesRequiringPreemption(assignment)
	cq := snapshot.ClusterQueues[wl.ClusterQueue]
```
`resPerFlv` contains the flavor info and resource names. We can modify the preemptor to accept `resPerFlv` as an argument directly; then the preemptor will know the flavor. We can achieve this by moving `resourcesRequiringPreemption` and `totalRequestsForAssignment` outside the preemptor.
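A minimal sketch of that refactor, using simplified, hypothetical stand-ins for the real kueue types (the actual structs live in pkg/scheduler/flavorassigner and pkg/scheduler/preemption):

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for kueue's types; this is a sketch
// of the refactor described above, not the actual implementation.
type FlavorRef string
type ResourceName string

type PodSetAssignment struct {
	Flavors         map[ResourceName]FlavorRef // chosen flavor per resource
	NeedsPreemption map[ResourceName]bool      // resources that did not fit
}

type Assignment struct {
	PodSets []PodSetAssignment
}

// resourcesRequiringPreemption is hoisted out of the Preemptor so the
// flavor assigner can compute it for the flavor it is currently trying.
func resourcesRequiringPreemption(a Assignment) map[FlavorRef][]ResourceName {
	resPerFlv := map[FlavorRef][]ResourceName{}
	for _, ps := range a.PodSets {
		for res, flv := range ps.Flavors {
			if ps.NeedsPreemption[res] {
				resPerFlv[flv] = append(resPerFlv[flv], res)
			}
		}
	}
	return resPerFlv
}

type Preemptor struct{}

// Do now accepts resPerFlv directly, so the caller controls which
// flavors are being attempted. (The candidate search is elided.)
func (p *Preemptor) Do(resPerFlv map[FlavorRef][]ResourceName) int {
	n := 0
	for _, resources := range resPerFlv {
		n += len(resources)
	}
	return n
}

func main() {
	a := Assignment{PodSets: []PodSetAssignment{{
		Flavors:         map[ResourceName]FlavorRef{"cpu": "default-flavor1"},
		NeedsPreemption: map[ResourceName]bool{"cpu": true},
	}}}
	fmt.Println((&Preemptor{}).Do(resourcesRequiringPreemption(a)))
}
```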
Please update based on #806
### Test Plan
Once we have some confidence over the API and algorithm design, please fill this up.
Okay, I'll do that
gpu: 1
```
We will first try `default-flavor1` for cpu and memory resources. If `default-flavor1` doesn't fit, we try to preempt in `default-flavor1`. If the preemptor reports a preemption failure, we try to allocate resources in `default-flavor2`.
If `default-flavor1` doesn't fit, we try to preempt in `default-flavor1`.

In this situation, what happens if we can preempt in `default-flavor` but it only frees up resources for 2 pods of the higher-priority podSets? Do we preempt in `default-flavor` only if we can free up resources for all pods? Or do we preempt in `default-flavor` even if we can only free up resources for part of the pods?
Currently we only allow one kind of resource to use one flavor (because there is only one flavor assignment per kind of resource), so I think in this PR we only preempt in `default-flavor` if we can free up resources for all pods.
because now there is only one flavor assignment for one kind of resource
You're right. We probably should take this situation as another feature.
@alculquicondor @kerthcet Do you think creating an issue for this feature (AssignMultipleFlavorsToWorkload?) is worth it?
the preemption logic should not change.
With #806, we can check whether preemption could find enough candidates.
If it's not possible, then we go to the next flavor.
Do you think creating an issue for this feature (AssignMultipleFlavorsToWorkload?) is worth it?
I don't think we should do this. In most scenarios, this could cause pods to land in very different places, with potentially higher network latency. Also, we can't enforce it, as we wouldn't be able to set node selectors for a subset of pods.
I don't think we should do this. In most scenarios, this could cause pods to land in very different places, with potentially higher network latency. Also, we can't enforce it, as we wouldn't be able to set node selectors for a subset of pods.
That makes sense.
@KunWuLuan would you mind adding an extension interface proposal for this KEP?
}
```
The length of FlavorFungibility should be either 0 or 3; this means we don't allow users to omit any of these three actions. If FlavorFungibility is empty, we set it to `Borrow,TryNextFlavor,Preempt`. Actions after `TryNextFlavor` will be executed after all flavors are considered, and actions before `TryNextFlavor` will be executed before the next flavor is considered.
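For illustration, the list form under discussion might look like this on a ClusterQueue (a hypothetical field shape from this KEP, not a released API):

```yaml
spec:
  flavorFungibility: ["Borrow", "TryNextFlavor", "Preempt"]
```

Read left to right: `Borrow` is attempted before moving past the current flavor, while `Preempt` runs only after all flavors have been considered.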
I like the idea that all the actions should be listed. However, I'm not aware of any k8s API that looks like this.
I started a thread on slack https://kubernetes.slack.com/archives/CJUQN3E4T/p1686161408595329
Are you sure all permutations are valid? For example, if we try to Borrow before Preempting, there could be priority inversion, similar to this: #763 (comment)
Somehow `TryNextFlavor` is special. Should it be something like the following?

```
beforeTryingNextFlavor: ["Preempt", "Borrow"]
```
Currently we will borrow unused quota from other cluster queues before we try preempting.
Do you mean we will change this behavior?
If we can say that preemption always takes precedence over borrowing, I think plan A is better.
We were initially targeting the release of 0.4 by the end of June. I think we might be too late for a feature like this for which we don't have a clear API yet.
We could still make progress and target 0.5 for this feature. WDYT?
No problem, I think more discussion leads to a better design.
Force-pushed from 2ed8aa0 to 25c3c05.
cc @mwielgus
instances. If high priority jobs can preempt jobs in standard instances before trying spot instances,
stability can be achieved.

My use case can be supported by setting `flavorFungibility` to `BeforeNextFlavor` in the Kueue configuration.
This sentence looks outdated. The current proposal is doing it in the ClusterQueue spec, right?
Still outdated
}
```

If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.
looks outdated
Still outdated
```

If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.
Explain what's the default policy, which should match the current behavior.
IIUC, the current behavior is:

```
whenCanBorrow: Borrow
whenCanPreempt: TryNextFlavor
```
If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.

### Behavior Changes
What are the semantics with respect to Partial Admission?
}
```

We will store the scheduling context in the workload info so that we can resume from where we stopped in previous scheduling attempts. This is useful to avoid wasting time on the same flavor in every attempt when we tried to preempt in that flavor and failed. The scheduling context will contain the `LastScheduledFlavorIdx`, the `ResourceFlavors` attached to the CQ, the `ClusterQueueUsage` of the CQ, and the `CohortUsage`. Any change to these properties will lead to scheduling from the first flavor.
Some implementation detail: if the flavors in the CQ change, we should wipe the stored context. Maybe we can keep a hash to detect the changes.
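A sketch of the hash idea (`schedulingContext`, `flavorsHash`, and `resetIfFlavorsChanged` are hypothetical names for illustration, not the actual kueue implementation):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// schedulingContext stands in for the per-CQ state stored in the
// workload info between scheduling attempts.
type schedulingContext struct {
	LastScheduledFlavorIdx int
	FlavorsHash            uint64
}

// flavorsHash fingerprints the CQ's flavor list; the NUL separator keeps
// lists like ["ab","c"] and ["a","bc"] from colliding.
func flavorsHash(flavors []string) uint64 {
	h := fnv.New64a()
	for _, f := range flavors {
		h.Write([]byte(f))
		h.Write([]byte{0})
	}
	return h.Sum64()
}

// resetIfFlavorsChanged wipes the stored context when the flavors no
// longer match what was recorded at the previous scheduling attempt.
func resetIfFlavorsChanged(ctx *schedulingContext, cqFlavors []string) {
	if h := flavorsHash(cqFlavors); h != ctx.FlavorsHash {
		ctx.LastScheduledFlavorIdx = 0 // start over from the first flavor
		ctx.FlavorsHash = h
	}
}

func main() {
	flavors := []string{"default-flavor1", "default-flavor2"}
	ctx := &schedulingContext{LastScheduledFlavorIdx: 1, FlavorsHash: flavorsHash(flavors)}

	resetIfFlavorsChanged(ctx, flavors) // flavors unchanged: keep the context
	fmt.Println(ctx.LastScheduledFlavorIdx)

	resetIfFlavorsChanged(ctx, []string{"default-flavor1"}) // a flavor was removed
	fmt.Println(ctx.LastScheduledFlavorIdx)
}
```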
### Implementation

```
func assignFlavors(log logr.Logger, requests []workload.PodSetResources, podSets []kueue.PodSet, resourceFlavors map[kueue.ResourceFlavorReference]*kueue.ResourceFlavor, cq *cache.ClusterQueue, lastSchedule *workload.LastScheduleClusterQueueState) Assignment {
```
Remove the code here.
work in #849
We should still have the accompanying KEP merged. Could you bring it up to date?
Sorry, that was a mistake on my part. I have updated the KEP.
Can you run
instances. If high priority jobs can preempt jobs in standard instances before trying spot instances,
stability can be achieved.

My use case can be supported by setting `flavorFungibility` to `BeforeNextFlavor` in the Kueue configuration.
Still outdated
}
```

If flavorFungibility is nil in the configuration, we will set `WhenCanBorrow` to `true` and `WhenCanPreempt` to `false` to maintain consistency with the current behavior.
Still outdated
Force-pushed from 32a23b4 to a45ffdf.
The README is already updated. All comments have been resolved.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor, KunWuLuan. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/label tide/merge-method-squash
Have run
/lgtm |
/retest |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #582
Special notes for your reviewer:
Does this PR introduce a user-facing change?