Support TFJob(kubeflow) in Multikueue #2626
Skipping CI for Draft Pull Request.
✅ Deploy Preview for kubernetes-sigs-kueue ready!
51ef408 to 7845aa5
/ok-to-test
fed964d to c8ce51a
c8ce51a to 4c252f9
/retest
03e2736 to 7848c26
/retest
7848c26 to 3e314a8
/retest
3e314a8 to 74aeec5
/retest
74aeec5 to 702e214
/retest
@mszadkow please ping us in a comment when the PR is ready for the second pass after the updates.
83ea5c9 to 3b81bb6
/retest
3b81bb6 to 63305cf
some nits, otherwise LGTM
    @@ -74,6 +74,16 @@ kubectl apply --server-side -f https://raw.githubusercontent.com/kubernetes-sigs
    ```
    {{% /alert %}}

    ### Kubeflow Installation
Add a link to the Kubeflow installation guide, and note that the installation is done in the worker clusters.
63305cf to bd4e440
Some minor change suggestions.
/approve
    @@ -68,6 +68,12 @@ The `managedBy` field is available as an Alpha feature staring Kubernetes 1.30.0

    We recommend using JobSet v0.5.1 or newer.

    ### Kubeflow

    The supported version of the Kubeflow Training Operator is v1.7.0.
Suggested change:

    - The supported version of the Kubeflow Training Operator is v1.7.0.
    + The supported version of the Kubeflow Training Operator is v1.7.0, or a newer version.
    ### Kubeflow Installation

    Install Kubeflow Training-operator in the Worker cluster (see [Kubeflow Training-operator Installation](https://www.kubeflow.org/docs/components/training/installation/)
    for more details). Please use version v1.7.0 for MultiKueue.
Suggested change:

    - for more details). Please use version v1.7.0 for MultiKueue.
    + for more details). Please use version v1.7.0 or a newer version for MultiKueue.
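
To illustrate the "v1.7.0 or newer" constraint from the suggestions above, here is a minimal shell sketch that compares two release tags. The function name `version_at_least` is hypothetical and not part of Kueue or the Training Operator. It assumes plain `vMAJOR.MINOR.PATCH` tags without pre-release suffixes.

```bash
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
# Assumes tags of the form vMAJOR.MINOR.PATCH (the leading "v" is optional).
version_at_least() {
  have="${1#v}"
  want="${2#v}"
  old_ifs="$IFS"
  IFS=.
  # Split both versions on "." into six positional parameters.
  set -- $have $want
  IFS="$old_ifs"
  # Anything other than two three-part versions is treated as an error.
  [ "$#" -eq 6 ] || return 2
  if [ "$1" -ne "$4" ]; then [ "$1" -gt "$4" ]; return; fi
  if [ "$2" -ne "$5" ]; then [ "$2" -gt "$5" ]; return; fi
  [ "$3" -ge "$6" ]
}

# Example: check a candidate tag against the minimum named in this thread.
if version_at_least "v1.8.0" "v1.7.0"; then
  echo "supported"
else
  echo "too old"
fi
```

How the installed Training Operator version is discovered on a given cluster is left out here, since the thread does not specify it.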
    ### Kubeflow Installation

    {{% alert title="Warning" color="warning" %}}
    Make sure to only install the Kubeflow TFJobs CRD of version v1.7.0 on the management cluster.
Suggested change:

    - Make sure to only install the Kubeflow TFJobs CRD of version v1.7.0 on the management cluster.
    + Make sure to install only the Kubeflow TFJobs CRD of version v1.7.0 or newer on the management cluster.
    Make sure to only install the Kubeflow TFJobs CRD of version v1.7.0 on the management cluster.

    ```bash
    kubectl apply --server-side -f https://github.com/kubeflow/training-operator/blob/v1.7.0/manifests/base/crds/kubeflow.org_tfjobs.yaml
Suggested change:

    - kubectl apply --server-side -f https://github.com/kubeflow/training-operator/blob/v1.7.0/manifests/base/crds/kubeflow.org_tfjobs.yaml
    + kubectl apply --server-side -f https://github.com/kubeflow/training-operator/blob/v1.8.0/manifests/base/crds/kubeflow.org_tfjobs.yaml
Let's use the latest Kubeflow Training Operator version.
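
One thing to check with the command discussed above: a `github.com/.../blob/...` URL normally serves GitHub's HTML file view, so `kubectl apply -f` may not receive a YAML manifest from it. The sketch below builds the equivalent raw-content URL instead. It assumes the standard `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout and that the manifest path quoted in this thread exists at the chosen tag. The helper name `tfjob_crd_url` is hypothetical.

```bash
# Hypothetical helper: print the raw-content URL of the TFJob CRD for a given
# Training Operator tag. The repository path is taken from the command in this thread.
tfjob_crd_url() {
  version="$1"
  printf 'https://raw.githubusercontent.com/kubeflow/training-operator/%s/manifests/base/crds/kubeflow.org_tfjobs.yaml\n' "$version"
}

# Print the install command for review rather than running it against a cluster.
echo "kubectl apply --server-side -f $(tfjob_crd_url v1.8.0)"
```

Printing the command first makes it easy to confirm the URL resolves to YAML before applying it to the management cluster.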
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: mszadkow, tenzen-y
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/lgtm
LGTM label has been added. Git tree hash: c69f1a8dde6f7c068ec57fb0f2838318a4fb45a7
/release-note-edit
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces a new MultiKueue adapter to handle Kubeflow TFJobs.
It extends MultiKueue capabilities to satisfy the needs of early adopters.
Which issue(s) this PR fixes:
Relates #2552
Special notes for your reviewer:
Does this PR introduce a user-facing change?