
Make DominantResourceShare Compatible with Cohorts #4097

Merged


gabesaba
Contributor

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Prepare for #3759 by making DominantResourceShare compatible with Cohorts. This is done by refactoring dominantResourceShare to use the hierarchicalResourceNode interface, which Cohort and CohortSnapshot implement.

Additionally, we do some simplifications, deletions, moves:

  • Remove multiplication by a magic number in dominantResourceNode (it was used to model subtracting resources)
  • Delete the resourceGroupNode interface (used only in tests)
  • Delete the netQuotaNode interface (not needed after the refactor)
  • Move FairSharing code to its own file
  • Move ClusterQueueSnapshot's DominantResourceShare methods to the clusterqueue_snapshot file

Special notes for your reviewer:

Please review the commits independently for a more readable diff. The 2nd commit should be a pure move.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jan 29, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 29, 2025

netlify bot commented Jan 29, 2025

Deploy Preview for kubernetes-sigs-kueue ready!

🔨 Latest commit: 91c2f48
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/679cb24933d21300088aa70a
😎 Deploy Preview: https://deploy-preview-4097--kubernetes-sigs-kueue.netlify.app

@gabesaba gabesaba force-pushed the dominant_resource_for_cohorts branch from 50fec32 to daf9f5f January 29, 2025 16:12
@gabesaba gabesaba force-pushed the dominant_resource_for_cohorts branch from daf9f5f to f8cdd8e January 30, 2025 10:22
@gabesaba
Contributor Author

/assign @PBundyra

"sigs.k8s.io/kueue/pkg/resources"
)

type dominantResourceShareNode interface {
Contributor

@PBundyra PBundyra Jan 31, 2025


Could you please add a description of the interface and its methods?

Contributor Author


Done.
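For context on the interface being documented: the refactor lets the fair-sharing logic accept an interface that both ClusterQueues and Cohorts satisfy, so the same code serves every node in the hierarchy. A minimal sketch of that idea follows. The interface `fairShareNode`, its methods, and the `clusterQueue`/`cohort` types are hypothetical simplifications, not Kueue's actual types; resources are plain strings, and in this sketch a cohort's usage and quota are simply the sums over its children.

```go
package main

import "fmt"

// fairShareNode is a HYPOTHETICAL, simplified interface: any node in the
// quota hierarchy exposes the data needed to compute its share.
type fairShareNode interface {
	// usageFor returns the node's current usage of resource r.
	usageFor(r string) int64
	// nominalFor returns the node's nominal quota of resource r.
	nominalFor(r string) int64
}

// clusterQueue is a leaf node that stores usage and quota directly.
type clusterQueue struct {
	usage, nominal map[string]int64
}

func (c *clusterQueue) usageFor(r string) int64 { return c.usage[r] }
func (c *clusterQueue) nominalFor(r string) int64 { return c.nominal[r] }

// cohort is an inner node; in this sketch it sums its children's values.
type cohort struct {
	children []fairShareNode
}

func (c *cohort) usageFor(r string) int64 {
	var total int64
	for _, ch := range c.children {
		total += ch.usageFor(r)
	}
	return total
}

func (c *cohort) nominalFor(r string) int64 {
	var total int64
	for _, ch := range c.children {
		total += ch.nominalFor(r)
	}
	return total
}

// borrowed is written once against the interface, so it works for
// ClusterQueues and Cohorts alike.
func borrowed(n fairShareNode, r string) int64 {
	if b := n.usageFor(r) - n.nominalFor(r); b > 0 {
		return b
	}
	return 0
}

func main() {
	a := &clusterQueue{usage: map[string]int64{"cpu": 6}, nominal: map[string]int64{"cpu": 4}}
	b := &clusterQueue{usage: map[string]int64{"cpu": 1}, nominal: map[string]int64{"cpu": 2}}
	root := &cohort{children: []fairShareNode{a, b}}
	fmt.Println(borrowed(a, "cpu"), borrowed(b, "cpu"), borrowed(root, "cpu")) // 2 0 1
}
```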

@@ -646,84 +638,8 @@ func workloadBelongsToLocalQueue(wl *kueue.Workload, q *kueue.LocalQueue) bool {
return wl.Namespace == q.Namespace && wl.Spec.QueueName == q.Name
}

// The methods below implement several interfaces. See
// dominantResourceShareNode, resourceGroupNode, and netQuotaNode.
// Implements dominantResourceShareNode interface.
Contributor

Contributor Author


Is there a clear advantage to this pattern? Even without this check, the code will fail to compile if we forget one of the methods, since we use clusterQueue as a dominantResourceShareNode.

Contributor


I see, I had the impression that the code wouldn't fail during compilation, but rather at runtime. Thanks for clarifying.

Contributor


I think indeed it does not matter for compilation, but can you quickly test whether adding these improves integration with VS Code?

IIRC, adding these quickly told me which functions were missing (not implemented), but I don't recall the details, so this might not be accurate.

Contributor Author


I just checked: there is no difference in the error message between adding these checks and simply using the type as the interface.

Contributor


Makes sense. Being curious, I also checked what happens in VS Code. With the check, the error is reported at the check itself:
[screenshot]
Without it, the error is still reported at compile time, but at the usage site:
[screenshot]

I think adding or not adding the checks makes a marginal difference, and it is not something I would worry about either way.
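For reference, the pattern discussed in this thread is Go's compile-time interface assertion: assigning a typed nil pointer to the blank identifier makes the compiler verify that the type implements the interface, so a missing method is reported on that line instead of at the first usage site. As the thread concludes, both forms fail at compile time; the assertion only changes where the error appears. The names `shareNode` and `snapshot` below are hypothetical.

```go
package main

import "fmt"

// shareNode is a hypothetical interface used only to illustrate the pattern.
type shareNode interface {
	GetName() string
	fairWeight() int64
}

// snapshot is a hypothetical type that is meant to implement shareNode.
type snapshot struct {
	name   string
	weight int64
}

func (s *snapshot) GetName() string { return s.name }
func (s *snapshot) fairWeight() int64 { return s.weight }

// Compile-time check: if *snapshot stops satisfying shareNode (for example,
// a method is renamed or removed), compilation fails on this line.
// The typed nil pointer has no runtime cost.
var _ shareNode = (*snapshot)(nil)

// describe uses the interface; without the assertion above, a missing
// method would instead be reported where *snapshot is passed here.
func describe(n shareNode) string {
	return fmt.Sprintf("%s:%d", n.GetName(), n.fairWeight())
}

func main() {
	fmt.Println(describe(&snapshot{name: "cq-a", weight: 1})) // cq-a:1
}
```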

return c.Name
}

// Implements dominantResourceShareNode interface.
Contributor


ditto

Comment on lines 137 to 142
// DominantResourceShare returns a value from 0 to 1,000,000 representing the maximum of the ratios
// of usage above nominal quota to the lendable resources in the cohort, among all the resources
// provided by the ClusterQueue, and divided by the weight.
// If zero, it means that the usage of the ClusterQueue is below the nominal quota.
// The function also returns the resource name that yielded this value.
// Also for a weight of zero, this will return 9223372036854775807.
Contributor


Should this description be moved to dominantResourceShare?

Contributor Author


Done.

@PBundyra
Contributor

/retest

func (c *ClusterQueueSnapshot) usageFor(fr resources.FlavorResource) int64 {
return c.ResourceNode.Usage[fr]
}
// The methods below implement hierarchicalResourceNode interface.
Contributor


ditto

@PBundyra
Contributor

Since this is a cleanup PR, could you add a description to hierarchicalResourceNode and ResourceNode?

Contributor


Could we please follow the standard test pattern here, where we create a separate type for test scenarios and then run them in a loop, so it's easily extendable?

Contributor Author


We can add this later if we decide to extend the test. I don't think it makes sense to make it extensible now, when there may only ever be one test (YAGNI).

Contributor


I see, do you anticipate we won't need more tests despite implementing FairSharing for Hierarchical Cohorts?
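The test pattern requested in this thread is Go's table-driven style: each scenario is a named entry in a map of a small struct type, and a single loop runs them all, so extending coverage means adding one entry. A runnable sketch follows. The function under test (`borrowedUnits`) and the case names are hypothetical; in a real `_test.go` file the loop body would typically call `t.Run(name, func(t *testing.T) { ... })` and report mismatches with `t.Errorf` instead of collecting failure names.

```go
package main

import (
	"fmt"
	"sort"
)

// borrowedUnits is a hypothetical function under test: usage above nominal quota.
func borrowedUnits(usage, nominal int64) int64 {
	if usage > nominal {
		return usage - nominal
	}
	return 0
}

// testCase holds the inputs and the expected output for one scenario.
type testCase struct {
	usage   int64
	nominal int64
	want    int64
}

// cases is the table; adding a scenario means adding one entry.
var cases = map[string]testCase{
	"below nominal": {usage: 3, nominal: 4, want: 0},
	"at nominal":    {usage: 4, nominal: 4, want: 0},
	"borrowing":     {usage: 6, nominal: 4, want: 2},
}

// runCases executes every scenario and returns the names of failing ones,
// sorted because map iteration order is randomized.
func runCases() []string {
	var failed []string
	for name, tc := range cases {
		if got := borrowedUnits(tc.usage, tc.nominal); got != tc.want {
			failed = append(failed, name)
		}
	}
	sort.Strings(failed)
	return failed
}

func main() {
	fmt.Println("failed:", runCases()) // failed: []
}
```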

@gabesaba gabesaba force-pushed the dominant_resource_for_cohorts branch from f8cdd8e to 62edcbc January 31, 2025 11:20
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 31, 2025
@gabesaba gabesaba force-pushed the dominant_resource_for_cohorts branch from 62edcbc to 91c2f48 January 31, 2025 11:21
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 31, 2025
@gabesaba
Contributor Author

// provided by the ClusterQueue, and divided by the weight.
// If zero, it means that the usage of the ClusterQueue is below the nominal quota.
// The function also returns the resource name that yielded this value.
// Also for a weight of zero, this will return 9223372036854775807.
Contributor


Suggested change
// Also for a weight of zero, this will return 9223372036854775807.
// Also for a weight of zero, this will return maxInt.

@PBundyra
Contributor

/lgtm
cc @mimowo

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 31, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 9ffdbe429220e20fb0c5abad717939a3936a5716

@mimowo
Contributor

mimowo commented Jan 31, 2025

/approve
Looks like a great cleanup PR! Feel free to address the remaining comments (#4097 and #4097) in a follow-up. None of them look blocking.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gabesaba, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 31, 2025
@k8s-ci-robot k8s-ci-robot merged commit 0236a85 into kubernetes-sigs:main Jan 31, 2025
17 of 18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.11 milestone Jan 31, 2025
@gabesaba gabesaba deleted the dominant_resource_for_cohorts branch January 31, 2025 14:44
FillZpp pushed a commit to leptonai/kueue that referenced this pull request Feb 5, 2025

* Update DominantResourceShare to work with Cohorts

* Move DominantResourceShare methods to more appropriate files

* Cleanup resourceGroupNode, used only in tests

* Update documentation of several types
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants