Add MPIJob controller #578
Conversation
As mentioned here: #65 (comment). For now, I'm doing a rough prototype by copy-pasting to see what the obstacles are. Once it is working, we could also abstract out the library based on the two implementations. This could also give us better clues about what the final interface should look like. I guess we need to sync up on this.
I added the first feedback.
@@ -0,0 +1,27 @@
# permissions for end users to edit jobs.
For the long term, we may want to prepare a separate kustomization.yaml
for each integration, along these lines:
config/
├── components
├── default
├── default_with_mpijob
├── default_with_training-operator
├── default_with_rayjob
├── default_with_all_integration
...
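To make the idea concrete, a hypothetical config/default_with_mpijob/kustomization.yaml could compose the default config with the MPIJob-specific pieces. All paths and component names below are assumptions for illustration, not taken from this PR:

```yaml
# Hypothetical sketch; the referenced paths are assumptions, not from the PR.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../default             # base kueue installation
- ../components/mpijob   # assumed component carrying MPIJob RBAC and webhook config
```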
// SetupWithManager sets up the controller with the Manager. It indexes workloads
// based on the owning jobs.
func (r *MPIJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kubeflow.MPIJob{}).
		Owns(&kueue.Workload{}).
		Complete(r)
}
If there are no CRDs for MPIJob in the cluster, the kueue controller repeatedly logs the following, although kueue otherwise works fine:
{"level":"error","ts":"2023-03-01T16:01:40.4642038Z","logger":"controller-runtime.source","caller":"source/source.go:143","msg":"if kind is a CRD, it should be installed before calling Start","kind":"MPIJob.kubeflow.org","error":"no matches for kind \"MPIJob\" in version \"kubeflow.org/v2beta1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.2/pkg/source/source.go:143\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.1/pkg/util/wait/wait.go:235\nk8s.io/apimachinery/pkg/util/wait.WaitForWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.1/pkg/util/wait/wait.go:662\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.1/pkg/util/wait/wait.go:596\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext\n\t/go/pkg/mod/k8s.io/apimachinery@v0.26.1/pkg/util/wait/wait.go:547\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.2/pkg/source/source.go:136"}
Maybe we can skip watching MPIJob if its CRD is not installed, using the following check:
// If err is not nil, there are no CRDs for MPIJob in the cluster.
_, err := mgr.GetRESTMapper().RESTMapping(schema.GroupKind{Group: kubeflow.SchemeGroupVersion.Group, Kind: "MPIJob"})
WDYT?
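A minimal sketch of how SetupWithManager could use that check, assuming controller-runtime's RESTMapper API and the imports already used in this PR (ctrl, schema, kubeflow, kueue); this is not code from the PR itself:

```go
// Sketch: only register the MPIJob controller when the CRD is discoverable.
func (r *MPIJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
	gk := schema.GroupKind{Group: kubeflow.SchemeGroupVersion.Group, Kind: "MPIJob"}
	if _, err := mgr.GetRESTMapper().RESTMapping(gk); err != nil {
		// No MPIJob CRD in the cluster: skip the watch instead of
		// letting the source retry and log errors indefinitely.
		return nil
	}
	return ctrl.NewControllerManagedBy(mgr).
		For(&kubeflow.MPIJob{}).
		Owns(&kueue.Workload{}).
		Complete(r)
}
```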
I guess it's overkill for now, as we plan to move this code to a separate binary
I guess it's overkill for now, as we plan to move this code to a separate binary
Does that mean we will run controllers for CustomJobs as sidecar containers?
Either sidecars or separate deployments.
sgtm
I suggest this as a future improvement that may not be needed if we move the code out to a separate binary (probably next release though).
Can we use `MPIJob` or `mpijob` in messages and comments instead of `Job` or `job`?
I have applied the suggestions. I think this will be easier to maintain once we commonize the code, which is planned, based on the interface: https://github.com/kubernetes-sigs/kueue/tree/main/keps/369-job-interface.
Ah, I see. sgtm
/assign @kerthcet
I'll take a look, but maybe later (tomorrow, I think); stuck with other things. Hope you don't mind. ^_^
@@ -519,7 +520,7 @@ func ConstructWorkloadFor(ctx context.Context, client client.Client,
	job *batchv1.Job, scheme *runtime.Scheme) (*kueue.Workload, error) {
	w := &kueue.Workload{
		ObjectMeta: metav1.ObjectMeta{
			Name: job.Name,
			Name: GetWorkloadNameForJob(job.Name),
This is a user-facing change we should highlight. It seems we can't mark it the way Kubernetes does currently, so just a note that this may belong in the changelog.
Kind of, workloads are generally API objects that should be "technical details" to the users. But, it might be worth mentioning, I will ask to add the note when we prepare release notes for 0.3.0.
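For context on what the renamed helper does, here is a hypothetical, self-contained sketch of a GetWorkloadNameForJob-style function. The "job-" prefixing scheme is an assumption for illustration and may not match kueue's actual implementation:

```go
package main

import "fmt"

// getWorkloadNameForJob derives a Workload object name from the owning
// job's name by adding a job-type prefix, so workloads created for
// different job kinds (batch Job, MPIJob, ...) cannot collide.
// The "job-" prefix is an assumption, not kueue's confirmed scheme.
func getWorkloadNameForJob(jobName string) string {
	return fmt.Sprintf("job-%s", jobName)
}

func main() {
	fmt.Println(getWorkloadNameForJob("sample")) // prints "job-sample"
}
```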
limitations under the License.
*/

package jobframework
Maybe we should rename the folder workload to job, since Workload is a specific object in Kueue; then the dir tree would look like below, but that can be a separate commit.
job/
├── batchjob
├── mpijob
├── framework
Potentially, but this would make the diff under this PR unnecessarily big. If we decide on the folder rename this should be a dedicated PR.
Personally, I don't have a strong opinion on the layout. Maybe it can be discussed further when we commonize the code.
I'm ok with the proposal by @kerthcet, for another PR
Let me finish the job library first, or I need to resolve the conflicts again.
// podsReady checks if all pods are ready or succeeded
func podsReady(job *kubeflow.MPIJob) bool {
	for _, c := range job.Status.Conditions {
		if c.Type == kubeflow.JobRunning && c.Status == corev1.ConditionTrue {
			return true
		}
	}
	return false
}
It seems that when the MPIJob is running, it's ready. I'm not quite familiar with the MPIJob operator; what happens before the job is running?
Yes, the Running condition is set when all workers are running and the launcher: https://github.com/kubeflow/mpi-operator/blob/5946ef4157599a474ab82ff80e780d5c2546c9ee/pkg/controller/mpi_job_controller.go#L1041-L1045.
Before that, MPIJobCreated is set, and MPIJobSuspended (if it is suspended).
@@ -0,0 +1,27 @@
# permissions for end users to edit jobs.
BatchJob is an in-tree object, so I think it's ok to maintain the yaml here. And we can better organize it like we do in test/e2e/config
(I haven't tried it yet). But I don't want to block this significant feature.
@kerthcet thank you for the review! I think I have addressed the remarks. Let me know if something more is needed.
/lgtm
/lgtm
Rebased and squashed. @alculquicondor please renew lgtm :)
This is great! We should add more docs and examples around this.
Thanks @mimowo that's a great start!
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #65
Special notes for your reviewer:
The follow-ups: