[Request] MXNet operator support #136
Comments
TuSimple has just written a draft version of an MXNet operator. We'd like to contribute it to the Kubeflow community as part of the ecosystem and continue improving it in the future. In addition to better deep learning lifecycle management, we plan to integrate the operator with kube-arbitrator for better job scheduling. Is there a process I need to follow to make this happen? @jlewi BTW, TuSimple is one of the major contributors in the MXNet community. |
@suleisl2000 that's a fantastic offer. Would you and TuSimple be willing to continue supporting the operator and making the changes (see below) to better integrate it into Kubeflow? Assuming we go forward integrating this into Kubeflow there's a couple things we should do
@suleisl2000 @johnugeorge is currently working on refactoring the PyTorchJob and TFJob operators so that they can share implementation; see for example kubeflow/training-operator#773. The current thinking is that we should have a different CRD for each type of job, but the underlying implementation should be shared. As part of this there is ongoing discussion about whether we should use separate repos or move all the @gaocegege @johnugeorge what do you think? Should we put the code in its own repository or should we add it to tf-operator to make it easier to start refactoring to use a shared implementation? |
@suleisl2000 Great to see this effort. Tf-operator is currently being refactored to share common code with future operators. Operators will have their own CRDs (at least for now) while sharing all common code. For now, we have decided to use tf-operator as the central repo to hold all operators, and we plan to rename the repo in the future. I will raise an initial PR for the PyTorch operator (v1alpha2) in a week. You can use that as a reference. @jlewi |
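As a rough illustration of the "separate CRD per job type, shared implementation" idea discussed above, here is a minimal Python sketch. It is not taken from tf-operator or any Kubeflow code: `JobKind`, `ReplicaSpec`, `plan_pods`, the role names, and the pod naming scheme are all hypothetical, chosen only to show how per-framework definitions could stay small while the common logic lives in one place.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaSpec:
    """Hypothetical replica description shared by every job kind."""
    replicas: int
    image: str


@dataclass
class JobKind:
    """Hypothetical per-framework definition, one per CRD kind."""
    kind: str
    allowed_roles: List[str]


# Illustrative kinds; the role names are assumptions, not the operators' real schemas.
TF_JOB = JobKind("TFJob", ["Chief", "PS", "Worker"])
MX_JOB = JobKind("MXJob", ["Scheduler", "Server", "Worker"])


def plan_pods(kind: JobKind, job_name: str, specs: Dict[str, ReplicaSpec]) -> List[str]:
    """Shared logic: validate roles against the kind, then derive pod names."""
    pods: List[str] = []
    for role, spec in specs.items():
        if role not in kind.allowed_roles:
            raise ValueError(f"{kind.kind} does not support role {role!r}")
        for index in range(spec.replicas):
            pods.append(f"{job_name}-{role.lower()}-{index}")
    return pods
```

In this sketch, adding a new framework only means declaring another `JobKind`; the validation and naming code is reused unchanged.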
@jlewi @johnugeorge Thanks for your reply. We are glad to integrate it into Kubeflow and we will take the following actions:
|
SGTM. I think if we do items 1 (ksonnet package) and 2 (docs), people can start using it and giving feedback. As long as we mark it "experimental" to indicate that it is in flux and subject to change, I don't see a strong need to block on resolving if/how best to integrate it into the shared implementation. |
@jlewi That's great. We are cleaning up the code and will create a PR once it is done. |
I have created PRs for the ksonnet package and docs. However, I am not sure how to create a PR for the operator. Would you mind giving some guidance? @jlewi @johnugeorge |
What's the question about the operator; is it just about where to put the code? Should we create a new repository for this? |
Yes, that's my question. Currently, we have the repository under https://github.com/TuSimple/mxnet-operator. I think it is better to move the code into https://github.com/kubeflow/mxnet-operator, just like the other operators. If so, who has permission to create the repository for it? @jlewi |
Yes. Sounds good @jlewi
|
I think we could create a new repo. A repo transfer requires ownership in both the Kubeflow and TuSimple orgs. |
I think a new repo is better; just so there is a PR indicating CLA's been signed and code is contributed. |
Create a new repository for the mxnet operator and add @suleisl2000 as an owner since he will be working on it. Related to kubeflow/community#136
@suleisl2000 I have created the repo mxnet-operator. You'll need to do the following
Let's leave this issue open until the above items are completed and the repo is fully set up. |
Really eager to try MXNet Operator in Kubeflow. Is it ready in master? @suleisl2000 |
There is still a little merge work to be done. I will try to finish it early next week. Thank you for your attention. @idibidiart |
* add prow setup config (kubeflow/community#136) * copy all files from tusimple/mxnet-operator to here * change to mxnet-operator
@suleisl2000 I already added the ci-bots; sorry for not making that clear. I sent @jzp1025 a GitHub invite. |
Could we close the issue? I think the mxnet operator repository has been set up. |
Hi, Has it been merged with master? Or is it a separate repo? Also, is there a Readme for using MXNet with Kubeflow? |
Fantastic. Thank you to all who helped with this. |
I think we could close the issue since we have the repo for mxnet operator. |
We now support TensorFlow and PyTorch well. MXNet is another popular ML framework, and I think we should implement an operator for it to attract more DL practitioners.
We need to create a proposal like https://github.com/kubeflow/community/blob/master/proposals/pytorch-operator-proposal.md and create the repository for the operator.
References
/cc @brucechin