-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt mpi-operator v2 to Kueue #505
Comments
I suppose you mean creating and syncing a Workload object for each MPIJob. Do you think we should have this code in mpi-operator or in kueue? |
Yes, I meant we implement a controller for Workload resources like the job-controller.
IIUC, kueue is designed not to hold third-party dependencies. |
Yes, we could have this controller in any of the repos. Wherever it goes, we should have them for kueue+mpi-operator. Both repos have the setup for E2E tests. Having it in kueue might be better for the time being as things are still changing. But having it in mpi-operator serves as proof that custom jobs don't have to be in tree (and we can add references from the kueue README). I personally prefer to have it in mpi-operator if the OWNERS don't mind (cc @terrytangyuan) @ahg-g, wdyt? |
I am okay with either approach as long as there are E2E tests. |
If we put it in mpi-operator repo, I suppose it will run as a reconciler within the same binary, not yet another operator, correct? |
Yes, same binary, possibly guarded by a command line flag to enable |
I'm also fine either way. If we select the latter, my concern is when kueue will stop serving that controller and donate that controller to the mpi-operator repo. If kueue keeps serving that controller, we might face why kueue does not provide other controllers (e.g., Ray, Argo, Spark, and more). However, if we select the former, the mpi-operator may face API changes of kueue since kueue has alpha status as you say. |
We have maintainers in both sides, so I think we can manage :) |
Great! |
Kueue would like to have consistent integration model. We are grateful that MPI-operator is willing to add some Kueue specific code, but other frameworks may not be that welcoming and prefer to keep outside the code that doesn't have to be in their repo. From that perspective, doing the non-critical integration (other than suspend logic) outside of frameworks sounds like a better option. |
It sounds reasonable. I agree, @mwielgus. |
We can close this issue and work on the kueue side. However, for a while, I would like to keep opening this issue to know what others think. |
/close |
@tenzen-y: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Many users hope mpi-operator v2 adapts Kueue.
Blocked by #504 kubernetes-sigs/kueue#369
Kueue side issue: kubernetes-sigs/kueue#65
/kind feature
The text was updated successfully, but these errors were encountered: