Refactor Preemption to interact with Quota/Usage through ClusterQueueSnapshot interface #2595
```go
}
cqResUsage := cq.Usage[fName]
for rName := range flvReq {
	if cqResUsage[rName] >= cq.QuotaFor(resources.FlavorResource{Flavor: fName, Resource: rName}).Nominal {
```
Note to reviewers: this `>=` turns into a `>`. I think this is correct (as we want to make sure we're within nominal quota), but please scrutinize it @alculquicondor
Actually, a more accurate check would be whether the usage plus the resources for the incoming workload would be borrowing.
If it's not borrowing, then this CQ is preempting to reclaim quota, so it is allowed to preempt other workloads in the cohort. Otherwise, it should only be allowed to preempt workloads within its own CQ or those below the threshold.
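A minimal sketch of the check suggested above, assuming quotas and usage are kept as plain per-flavor-resource maps. `FlavorResource` is a local stand-in for `resources.FlavorResource`, and `wouldBorrow` is a hypothetical helper, not Kueue's code: it asks whether usage *plus* the incoming request would exceed nominal quota, rather than looking at current usage alone.

```go
package main

import "fmt"

// FlavorResource is a local stand-in for Kueue's resources.FlavorResource,
// defined here only so the sketch compiles on its own.
type FlavorResource struct {
	Flavor   string
	Resource string
}

// wouldBorrow (hypothetical) reports whether admitting a workload with the
// given request would push the ClusterQueue above its nominal quota for any
// requested flavor-resource. Missing map entries count as 0.
func wouldBorrow(usage, nominal, request map[FlavorResource]int64) bool {
	for fr, req := range request {
		if usage[fr]+req > nominal[fr] {
			return true
		}
	}
	return false
}

func main() {
	cpu := FlavorResource{Flavor: "default", Resource: "cpu"}
	nominal := map[FlavorResource]int64{cpu: 6}
	usage := map[FlavorResource]int64{cpu: 4}

	// 4 + 2 = 6 stays within nominal quota: the CQ is reclaiming quota.
	fmt.Println(wouldBorrow(usage, nominal, map[FlavorResource]int64{cpu: 2})) // false
	// 4 + 3 = 7 exceeds nominal quota: the CQ would be borrowing.
	fmt.Println(wouldBorrow(usage, nominal, map[FlavorResource]int64{cpu: 3})) // true
}
```

Under this reading, only the `false` case would allow preempting other workloads in the cohort.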
If there's a bug in the accounting, let's fix it in a follow-up PR, as this PR is intended to clean up existing code without any change in behavior.
> Note to reviewers: this `>=` turns into a `>`. I think this is correct (as we want to make sure we're within nominal quota), but please scrutinize it @alculquicondor
I think `!queueUnderNominalInResourcesNeedingPreemption` is not equivalent to `cqIsBorrowing`. For example, if you have one resource and `cqResUsage[rName] == Nominal`, then `queueUnderNominalInResourcesNeedingPreemption` returns `false`, so `!queueUnderNominalInResourcesNeedingPreemption` is `true`, while `cqIsBorrowing` is `false`.
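The counterexample above can be reproduced with a single-resource simplification. In this sketch, `underNominal` and `isBorrowing` are hypothetical stand-ins for `queueUnderNominalInResourcesNeedingPreemption` (strict `<`) and `cqIsBorrowing` (strict `>`); they ignore multiple resources and flavors. The two negations disagree only when usage equals nominal quota.

```go
package main

import "fmt"

// underNominal is a simplified, single-resource stand-in for
// queueUnderNominalInResourcesNeedingPreemption: usage must be strictly
// below nominal quota.
func underNominal(usage, nominal int64) bool {
	return usage < nominal
}

// isBorrowing is a simplified stand-in for cqIsBorrowing: usage must be
// strictly above nominal quota.
func isBorrowing(usage, nominal int64) bool {
	return usage > nominal
}

func main() {
	const nominal int64 = 6
	for _, usage := range []int64{5, 6, 7} {
		fmt.Printf("usage=%d: !underNominal=%t isBorrowing=%t\n",
			usage, !underNominal(usage, nominal), isBorrowing(usage, nominal))
	}
	// usage=5: !underNominal=false isBorrowing=false
	// usage=6: !underNominal=true isBorrowing=false
	// usage=7: !underNominal=true isBorrowing=true
}
```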
Ack. I remember this was quite complex, so I would prefer to keep the logic unchanged in this PR and do a dedicated one for the fix if needed. I will still look into this a bit.
Folded the two functions into one #2595 (comment)
I've met with @gabesaba and we agreed that he will try to adjust this refactoring PR so that it does not change any logic. We will have a dedicated follow-up to adjust the logic if needed. We will then consider changing `<` to `<=`, or the idea from #2595 (comment).
Done. Unfolded the functions, and added back the `cohort == nil` check.
```go
if cq.Cohort == nil {
	return false
}
```
Can we get rid of this check? If so, we can fold `queueUnderNominalInResourcesNeedingPreemption` into this function. Unit tests still pass after its removal.
Yeah, I think the check can be removed, because this logic is only called for CQs that belong to the same cohort as the preempting one.
Replaced usages of `queueUnderNominalInResourcesNeedingPreemption` with `!cqIsBorrowing`.
pkg/cache/clusterqueue_snapshot.go
```go
	return 0
}

func (c *ClusterQueueSnapshot) borrowing(fr resources.FlavorResource) *int64 {
```
Suggested change:

```diff
-func (c *ClusterQueueSnapshot) borrowing(fr resources.FlavorResource) *int64 {
+func (c *ClusterQueueSnapshot) borrowingLimit(fr resources.FlavorResource) *int64 {
```
pkg/cache/clusterqueue_snapshot.go
```go
// if the borrowing limit exists, we cap our available capacity by the borrowing limit.
if borrowingLimit := c.borrowing(fr); borrowingLimit != nil {
	borrowingRemaining := c.nominal(fr) + *borrowingLimit - c.usageFor(fr)
```
Suggested change:

```diff
-	borrowingRemaining := c.nominal(fr) + *borrowingLimit - c.usageFor(fr)
+	withBorrowingRemaining := c.nominal(fr) + *borrowingLimit - c.usageFor(fr)
```
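A minimal sketch of how a borrowing limit caps available capacity, under stated assumptions: `cohortRemaining` is taken to be the cohort's unused capacity for this flavor-resource, a `nil` limit means borrowing is unlimited, and negative results are clamped to 0. The `available` function and its parameters are hypothetical and only mirror the `nominal + borrowingLimit - usage` expression in the snippet above; the real snapshot may combine these terms differently.

```go
package main

import "fmt"

// available (hypothetical) returns how much more of a flavor-resource a
// ClusterQueue could use. Assumptions: cohortRemaining is the unused
// capacity in the cohort, and a nil borrowingLimit means no limit.
func available(nominal, usage, cohortRemaining int64, borrowingLimit *int64) int64 {
	capacity := cohortRemaining
	// If a borrowing limit exists, cap the available capacity by it.
	if borrowingLimit != nil {
		withBorrowingRemaining := nominal + *borrowingLimit - usage
		if withBorrowingRemaining < capacity {
			capacity = withBorrowingRemaining
		}
	}
	// Never report negative capacity.
	if capacity < 0 {
		return 0
	}
	return capacity
}

func main() {
	limit := int64(2)
	// Nominal 6, usage 5, borrowing limit 2: at most 6 + 2 - 5 = 3 more,
	// even though the cohort has 10 unused.
	fmt.Println(available(6, 5, 10, &limit)) // 3
	// Without a borrowing limit, only the cohort's unused capacity applies.
	fmt.Println(available(6, 5, 10, nil)) // 10
}
```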
```diff
@@ -91,6 +91,23 @@ func TestPreemption(t *testing.T) {
		Resource(corev1.ResourceCPU, "6", "6").
		Resource(corev1.ResourceMemory, "3Gi", "3Gi").
		Obj(),
	*utiltesting.MakeFlavorQuotas("alpha").
```
Instead, change the test to use `default` for both resources. They are part of the same resource group, so they shouldn't get different flavors.
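To illustrate why resources in one resource group end up on the same flavor, here is a simplified model, not Kueue's FlavorAssigner: the `ResourceGroup` type and `assignFlavor` function are hypothetical. Flavors are tried in order, and a flavor is chosen only if it fits *every* requested resource of the group, so CPU and memory can never be split across flavors.

```go
package main

import "fmt"

// ResourceGroup is a hypothetical simplification: flavors are listed in
// priority order, and each flavor has a remaining quota per resource.
type ResourceGroup struct {
	Flavors   []string
	Remaining map[string]map[string]int64 // flavor -> resource -> remaining
}

// assignFlavor picks the first flavor that can fit every requested resource
// of the group, so all resources in the group share one flavor.
func assignFlavor(rg ResourceGroup, request map[string]int64) (string, bool) {
	for _, f := range rg.Flavors {
		fits := true
		for res, qty := range request {
			if rg.Remaining[f][res] < qty {
				fits = false
				break
			}
		}
		if fits {
			return f, true
		}
	}
	return "", false
}

func main() {
	rg := ResourceGroup{
		Flavors: []string{"default", "alpha"},
		Remaining: map[string]map[string]int64{
			"default": {"cpu": 6, "memory": 3},
			"alpha":   {"cpu": 2, "memory": 8},
		},
	}
	f, ok := assignFlavor(rg, map[string]int64{"cpu": 4, "memory": 2})
	fmt.Println(f, ok) // default true
}
```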
Updated from d4ed6f5 to a89149b, then from a89149b to 3541f14.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: gabesaba, mimowo
LGTM label has been added. Git tree hash: 5d12594ef18c940ed7cbc15cda6969d616343427
…Snapshot interface (kubernetes-sigs#2595)
* Fix preemption test
* Refactor preemption.go
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Follow up to #2592. In preparation for #79.
Copied from #2592: we will change the Scheduler, FlavorAssigner, and Preemption logic to only interact with the ClusterQueueSnapshot's capacity through high-level queries, e.g. "am I borrowing" or "how much capacity do I have left".
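A sketch of what "high-level queries" can look like, assuming nothing about Kueue's real method names: `capacityView`, its `Borrowing`/`Available` methods, and the map-backed `fakeSnapshot` are all hypothetical. The point is that callers ask the snapshot questions instead of reading quota and usage maps directly; the toy `Available` ignores the cohort entirely.

```go
package main

import "fmt"

// FlavorResource is a local stand-in for Kueue's resources.FlavorResource.
type FlavorResource struct {
	Flavor   string
	Resource string
}

// capacityView is a hypothetical interface illustrating the kind of
// high-level queries described above; the method names are assumptions.
type capacityView interface {
	// Borrowing answers "am I borrowing?" for one flavor-resource.
	Borrowing(fr FlavorResource) bool
	// Available answers "how much capacity do I have left?".
	Available(fr FlavorResource) int64
}

// fakeSnapshot is a toy implementation backed by plain maps; it only
// considers the queue's own nominal quota, not the cohort.
type fakeSnapshot struct {
	nominal map[FlavorResource]int64
	usage   map[FlavorResource]int64
}

func (s fakeSnapshot) Borrowing(fr FlavorResource) bool {
	return s.usage[fr] > s.nominal[fr]
}

func (s fakeSnapshot) Available(fr FlavorResource) int64 {
	if rem := s.nominal[fr] - s.usage[fr]; rem > 0 {
		return rem
	}
	return 0
}

func main() {
	cpu := FlavorResource{Flavor: "default", Resource: "cpu"}
	var cq capacityView = fakeSnapshot{
		nominal: map[FlavorResource]int64{cpu: 6},
		usage:   map[FlavorResource]int64{cpu: 4},
	}
	fmt.Println(cq.Borrowing(cpu)) // false
	fmt.Println(cq.Available(cpu)) // 2
}
```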
Additionally, we fix the test "preempting locally and borrowing other resources in cohort, without cohort candidates". Since we were only looping over the CQ's resource groups, we didn't notice that the CQ had no capacity for this flavor. We fix this by adding the alpha flavor to that CQ, and adding a CQ which lends resources to those FlavorResources.
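A hypothetical illustration of the loop-direction issue described above (not the actual test or preemption code): iterating over the queue's own configured quotas can never visit a flavor the queue lacks, whereas iterating over the requested flavor-resources surfaces it. `missingQuota` is an invented helper for this sketch.

```go
package main

import "fmt"

// FlavorResource is a local stand-in for Kueue's resources.FlavorResource.
type FlavorResource struct {
	Flavor   string
	Resource string
}

// missingQuota returns the requested flavor-resources for which the
// ClusterQueue defines no quota at all. Iterating over the request, rather
// than over the queue's own resource groups, is what surfaces a flavor the
// queue never configured.
func missingQuota(quotas map[FlavorResource]int64, request []FlavorResource) []FlavorResource {
	var missing []FlavorResource
	for _, fr := range request {
		if _, ok := quotas[fr]; !ok {
			missing = append(missing, fr)
		}
	}
	return missing
}

func main() {
	quotas := map[FlavorResource]int64{
		{Flavor: "default", Resource: "cpu"}:    6,
		{Flavor: "default", Resource: "memory"}: 3,
	}
	request := []FlavorResource{
		{Flavor: "default", Resource: "cpu"},
		{Flavor: "alpha", Resource: "cpu"},
	}
	fmt.Println(missingQuota(quotas, request)) // [{alpha cpu}]
}
```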
Does this PR introduce a user-facing change?