[Feature] Update MultiKueue to support Kubeflow Jobs ManagedBy #4116

mszadkow
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Support the Kubeflow Training-Operator Jobs `managedBy` feature in MultiKueue.
Allow installing the full training-operator in the management cluster instead of only its CRDs.
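
For readers unfamiliar with `managedBy`, the following is a minimal sketch of why the field makes it possible to run the full training-operator in the management cluster: a controller can skip any job whose `managedBy` names a different controller. The type `runPolicy`, the function `shouldReconcile`, and the two controller-name strings are assumptions made for illustration; this is not the training-operator's or Kueue's actual code.

```go
package main

import "fmt"

// Assumed controller names, used here only for illustration.
// Check the values your training-operator and Kueue versions define.
const (
	trainingOperatorName = "kubeflow.org/training-operator"
	multiKueueName       = "kueue.x-k8s.io/multikueue"
)

// runPolicy is a hypothetical stand-in for a Kubeflow job's spec.runPolicy.
type runPolicy struct {
	ManagedBy *string
}

// shouldReconcile reports whether the controller named self should act on
// a job. In this sketch an unset or empty field is treated as "managed by
// the training-operator" (an assumption); otherwise only the named
// controller acts on the job.
func shouldReconcile(rp runPolicy, self string) bool {
	if rp.ManagedBy == nil || *rp.ManagedBy == "" {
		return self == trainingOperatorName
	}
	return *rp.ManagedBy == self
}

func main() {
	mk := multiKueueName
	delegated := runPolicy{ManagedBy: &mk}

	// The training-operator in the management cluster would skip this job,
	// leaving it to MultiKueue to dispatch to a worker cluster.
	fmt.Println(shouldReconcile(delegated, trainingOperatorName))
	fmt.Println(shouldReconcile(delegated, multiKueueName))
	fmt.Println(shouldReconcile(runPolicy{}, trainingOperatorName))
}
```

Under these assumptions, the operator and MultiKueue can coexist in one cluster because each job has exactly one controller that acts on it.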

Which issue(s) this PR fixes:

Relates to #2552

Special notes for your reviewer:

Does this PR introduce a user-facing change?

MultiKueue: Add support for the Kubeflow Training-Operator Jobs `spec.runPolicy.managedBy` field

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the labels do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.), release-note (Denotes a PR that will be considered when it comes time to generate release notes.), kind/feature (Categorizes issue or PR as related to a new feature.), and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) Jan 31, 2025
@mszadkow
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jan 31, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 31, 2025
netlify bot commented Jan 31, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 70e4321
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67a09aff2d4d3b000802321d

@mszadkow mszadkow force-pushed the feature/update-kubeflow-jobs-to-managedby branch 2 times, most recently from 1497ed2 to 066d9b1 Compare January 31, 2025 16:11
@mszadkow
Contributor Author

/retest

@mszadkow mszadkow force-pushed the feature/update-kubeflow-jobs-to-managedby branch from 066d9b1 to 9ac33d8 Compare February 3, 2025 08:29
@mszadkow
Contributor Author

mszadkow commented Feb 3, 2025

/retest

@mszadkow mszadkow force-pushed the feature/update-kubeflow-jobs-to-managedby branch from 9ac33d8 to 3f54d20 Compare February 3, 2025 09:29
@mszadkow
Contributor Author

mszadkow commented Feb 3, 2025

/retest

@mszadkow mszadkow force-pushed the feature/update-kubeflow-jobs-to-managedby branch from 3f54d20 to 70e4321 Compare February 3, 2025 10:31
@mszadkow mszadkow marked this pull request as ready for review February 3, 2025 10:43
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 3, 2025
@mbobrovskyi
Contributor

/lgtm
Thanks!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: f90c413c312ee765ea444ba1947c7364ac07935f

Contributor

@mimowo mimowo left a comment

/lgtm
/approve
Great work! Looking forward to defaulting of the managedBy field for Kubeflow!

Comment on lines -91 to -92
# Modify the `newTag` for the `kubeflow/training-operator` to use the one training-operator version
$YQ eval '(.images[] | select(.name == "kubeflow/training-operator").newTag) = env(KUBEFLOW_IMAGE_VERSION)' -i "$KUBEFLOW_MANIFEST_MANAGER/kustomization.yaml"
Contributor

This cleanup is great to see. Our users will benefit from that simplification of the setup too!

@@ -454,6 +454,7 @@ var _ = ginkgo.Describe("MultiKueue", func() {
	ginkgo.It("Should run a kubeflow PyTorchJob on worker if admitted", func() {
		// Since it requires 1600M of memory, this job can only be admitted in worker 2.
		pyTorchJob := testingpytorchjob.MakePyTorchJob("pytorchjob1", managerNs.Name).
			ManagedBy(kueue.MultiKueueControllerName).
Contributor

This should not be needed when we have the webhook to add the ManagedBy by default for Kubeflow, right?

Contributor Author

Actually yes, you are right: the behaviour should be the same with an empty field as with this value.
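
The equivalence discussed in this thread (an empty field behaving like an explicit MultiKueue value) could be obtained with a defaulting step along these lines. This is a hedged sketch only: `defaultManagedBy` is a hypothetical helper, not Kueue's webhook, and the string assigned to `multiKueueControllerName` is an assumed value for `kueue.MultiKueueControllerName`.

```go
package main

import "fmt"

// Assumed value of kueue.MultiKueueControllerName; verify against your
// Kueue version before relying on it.
const multiKueueControllerName = "kueue.x-k8s.io/multikueue"

// defaultManagedBy is a hypothetical defaulting helper. It keeps any
// explicit, non-empty managedBy value and fills in the MultiKueue
// controller name when the field is unset or empty.
func defaultManagedBy(current *string) *string {
	if current != nil && *current != "" {
		return current
	}
	name := multiKueueControllerName
	return &name
}

func main() {
	explicit := "kubeflow.org/training-operator"

	// An unset field is defaulted, so a test no longer has to set it.
	fmt.Println(*defaultManagedBy(nil))
	// An explicit value chosen by the user is preserved.
	fmt.Println(*defaultManagedBy(&explicit))
}
```

With defaulting like this in place, the explicit `ManagedBy(...)` call in the test above would become optional, which is the point the reviewer raises.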

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 10, 2025
Member

@tenzen-y tenzen-y left a comment

Awesome. Thank you for your cross-project contribution!
/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, mszadkow, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 0b09d56 into kubernetes-sigs:main Feb 10, 2025
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.11 milestone Feb 10, 2025
@mszadkow mszadkow deleted the feature/update-kubeflow-jobs-to-managedby branch February 10, 2025 08:29