[Request] MXNet operator support #136

gaocegege · 2018-05-22T10:42:30Z

We now support TensorFlow and PyTorch well. MXNet is another popular ML framework, and I think we should implement an operator for it to attract more DL practitioners.

We need to create a proposal like https://github.com/kubeflow/community/blob/master/proposals/pytorch-operator-proposal.md and create the repository for the operator.

References

/cc @brucechin

suleisl2000 · 2018-08-13T08:07:36Z

TuSimple has just written a draft version of an MXNet operator. We'd like to contribute it to the Kubeflow community as part of the ecosystem and keep improving it in the future. In addition to better deep learning lifecycle management, we plan to integrate the operator with kube-arbitrator for better job scheduling. Is there any process I need to follow to make this happen? @jlewi

BTW, TuSimple is one of the major contributors to the MXNet community.

/cc @k82cn @jzp1025 @jjmtraveller

jlewi · 2018-08-13T12:59:03Z

@suleisl2000 that's a fantastic offer.

Would you and TuSimple be willing to continue supporting the operator and making the changes (see below) to better integrate it into Kubeflow?

Assuming we go forward with integrating this into Kubeflow, there are a couple of things we should do:

1. Add a ksonnet package (see here)
2. Add an MXNet guide to our website (here: https://github.com/kubeflow/website/tree/master/content/docs/guides/components)
3. Figure out where the code should live
4. Figure out what changes are needed to provide an API consistent with the other operators; see "v1alpha2 pytorch API should try to be consistent with TFJob" (pytorch-operator#49)

@suleisl2000 @johnugeorge is currently working on refactoring the PyTorchJob and TFJob operators so that they can share implementation; see for example kubeflow/training-operator#773. The current thinking is that we should have a different CRD for each type of job, but the underlying implementation should be shared.

As part of this, there is ongoing discussion about whether we should use separate repos or move all the code into a single operator.

@gaocegege @johnugeorge what do you think? Should we put the code in its own repository or should we add it to tf-operator to make it easier to start refactoring to use a shared implementation?

johnugeorge · 2018-08-13T16:57:38Z

@suleisl2000 Great to see this effort. tf-operator is currently being refactored to share common code with future operators. Operators will have their own CRDs (at least for now) while sharing all common code. Currently, we have decided to use tf-operator as the central repo to hold all operators, and we plan to rename the repo in the future. I will raise the initial PR for the PyTorch operator (v1alpha2) within a week. You can use that as a reference.

@jlewi Since we already have a shared implementation, I think we should use it; otherwise effort will be duplicated. We can plan a v1alpha2 version of https://github.com/TuSimple/mxnet-operator that uses the shared implementation in the tf-operator repo.

suleisl2000 · 2018-08-14T05:07:51Z

@jlewi @johnugeorge Thanks for your reply. We are glad to integrate it into Kubeflow, and we will take the following actions:

1. Investigate a v1alpha2 version that uses the shared implementation mentioned by @johnugeorge
2. Finish the TODO items listed by @jlewi

/cc @jzp1025 @jjmtraveller

jlewi · 2018-08-16T06:33:27Z

SGTM. I think if we do items 1 (ksonnet package) and 2 (docs), people can start using it and giving feedback. As long as we mark it "experimental" to indicate that it is in flux and subject to change, I don't see a strong need to block on resolving if/how best to integrate it into the shared implementation.

suleisl2000 · 2018-08-20T09:32:34Z

@jlewi That's great. We are cleaning up the code and will create a PR once it is done.

suleisl2000 · 2018-08-21T11:12:54Z

I have created PRs for the ksonnet package and the docs. However, I am not sure how to create a PR for the operator; would you mind giving some guidance? @jlewi @johnugeorge

jlewi · 2018-08-22T12:01:02Z

What's the question about the operator; is it just about where to put the code? Should we create a new repository for this?

suleisl2000 · 2018-08-22T15:22:33Z

Yes, that's my question. Currently, the repository lives at https://github.com/TuSimple/mxnet-operator. I think it is better to move the code into https://github.com/kubeflow/mxnet-operator, just like the other operators. If so, who has permission to create the repository? @jlewi

johnugeorge · 2018-08-22T18:48:09Z

Yes. Sounds good @jlewi


gaocegege · 2018-08-23T04:55:28Z

I think we could create a new repo. A repo transfer would require ownership in both the Kubeflow and TuSimple orgs.

jlewi · 2018-08-23T14:00:53Z

I think a new repo is better, just so there is a PR indicating that the CLA has been signed and the code has been contributed.

@suleisl2000

Create a new repository for the mxnet operator and add @suleisl2000 as an owner since he will be working on it. Related to kubeflow/community#136

jlewi · 2018-08-23T14:12:50Z

@suleisl2000 I have created the repo mxnet-operator.
I also sent you an invite to the org; please accept it.

You'll need to do the following:

1. Follow the instructions here to set up Prow:
   - https://github.com/kubeflow/community/blob/master/repository-setup.md#setting-up-prow-for-your-repository
2. Add other reviewers/approvers to the OWNERS file as necessary (they will need to request invites to the Kubeflow org for that to actually work).
3. Please submit a PR adding yourself to members.yaml (https://github.com/kubeflow/community/blob/master/members.yaml) so that we know how to reach you.
   - All the repo approvers should also be listed there; everyone should submit a PR for themselves so it's clear we have their permission to include them in the repo.
4. You should then be able to submit and approve PRs to the repo.
   - For more info: https://github.com/kubeflow/community/blob/master/CONTRIBUTING.md

Let's leave this issue open until the above items are completed and the repo is fully set up.

idibidiart · 2018-08-24T20:05:06Z

Really eager to try MXNet Operator in Kubeflow. Is it ready in master? @suleisl2000

suleisl2000 · 2018-08-25T04:41:36Z

There is still a bit of merge work to be done. I will try to finish it early next week. Thank you for your interest. @idibidiart

suleisl2000 · 2018-08-27T10:25:25Z

@jlewi Please invite @jzp1025 to the org as an mxnet-operator reviewer. BTW, it looks like I don't have access to "Add the ci-bots team to the repository with write access" from item 1.

* add prow setup config (kubeflow/community#136)
* copy all files from tusimple/mxnet-operator to here
* change to mxnet-operator

jlewi · 2018-09-03T21:59:43Z

@suleisl2000 I already added the ci-bots; sorry for not making that clear.

Sent @jzp1025 a GitHub invite.

gaocegege · 2018-09-04T02:34:59Z

Could we close the issue? I think the mxnet operator repository has been set up.

idibidiart · 2018-09-04T03:15:11Z

Hi,

Has it been merged with master? Or is it a separate repo?

Also, is there a Readme for using MXNet with Kubeflow?

gaocegege · 2018-09-04T03:20:32Z

https://github.com/kubeflow/mxnet-operator

@idibidiart

idibidiart · 2018-09-04T03:21:54Z

Fantastic. Thank you to all who helped with this.

gaocegege · 2018-12-02T03:00:03Z

I think we could close the issue since we have the repo for mxnet operator.


gaocegege added area/operator addition/feature labels May 22, 2018

gaocegege mentioned this issue May 22, 2018

propopsal: Add MXNet #137

Closed

jzp1025 mentioned this issue Aug 21, 2018

This is greate program. Do you consider to when support GPU? jzp1025/mxnet-operator.v2#1

Open

suleisl2000 added a commit to suleisl2000/kubeflow that referenced this issue Aug 21, 2018

Add mxnet operator (#kubeflow/community#136)

a9f6d77

suleisl2000 added a commit to suleisl2000/website that referenced this issue Aug 21, 2018

Add mxnet operator (kubeflow/community#136)

f4732d5

suleisl2000 added a commit to suleisl2000/kubeflow that referenced this issue Aug 21, 2018

Add mxnet operator (kubeflow/community#136)

b34a429

suleisl2000 added a commit to suleisl2000/kubeflow that referenced this issue Aug 22, 2018

Add mxnet operator (kubeflow/community#136)

4e0a0ce

suleisl2000 added a commit to suleisl2000/kubeflow that referenced this issue Aug 22, 2018

correct format (kubeflow/community#136)

40d3a42

suleisl2000 added a commit to suleisl2000/kubeflow that referenced this issue Aug 23, 2018

Add mxnet operator (kubeflow/community#136)

f43392e

jlewi added a commit to kubeflow/mxnet-operator that referenced this issue Aug 23, 2018

Create an owners file

32c0790


suleisl2000 added a commit to suleisl2000/community that referenced this issue Aug 24, 2018

add member and organization for mxnet operator (kubeflow#136)

8131d3b

suleisl2000 added a commit to suleisl2000/test-infra that referenced this issue Aug 24, 2018

prow: add kubeflow/mxnet-operator repository (kubeflow/community#136)

8db3a30

suleisl2000 added a commit to suleisl2000/test-infra that referenced this issue Aug 24, 2018

prow: Add kubeflow/mxnet-operator repository (kubeflow/community#136)

29caf60

suleisl2000 mentioned this issue Aug 24, 2018

prow: Add kubeflow/mxnet-operator repository kubernetes/test-infra#9149

Merged

suleisl2000 added a commit to suleisl2000/test-infra that referenced this issue Aug 24, 2018

prow: Add kubeflow/mxnet-operator repository (kubeflow/community#136)

ad645c7

suleisl2000 added a commit to suleisl2000/test-infra that referenced this issue Aug 24, 2018

prow: Add kubeflow/mxnet-operator repository (kubeflow/community#136)

0ae421f

suleisl2000 added a commit to suleisl2000/website that referenced this issue Aug 24, 2018

Add mxnet operator (kubeflow/community#136)

d5358c4

suleisl2000 mentioned this issue Aug 24, 2018

add member and organization for mxnet operator #170

Merged

k8s-ci-robot pushed a commit to kubeflow/kubeflow that referenced this issue Aug 24, 2018

Add mxnet operator (kubeflow/community#136) (#1392)

d63cc2b

k8s-ci-robot pushed a commit that referenced this issue Aug 24, 2018

add member and organization for mxnet operator (#136) (#170)

b0ebedb

k8s-ci-robot pushed a commit to kubeflow/website that referenced this issue Aug 27, 2018

Add mxnet operator (kubeflow/community#136) (#172)

1df170a

k8s-ci-robot pushed a commit to kubeflow/mxnet-operator that referenced this issue Aug 29, 2018

MXNet operator initial version (#1)

8d73e16


This was referenced Aug 29, 2018

Merge changes from test-infra upstream jetstack/test-infra#246

Merged

Move all prow job configs into config/ jetstack/test-infra#247

Merged

yutongz pushed a commit to yutongz/k8s-test-infra that referenced this issue Sep 12, 2018

prow: Add kubeflow/mxnet-operator repository (kubeflow/community#136)

60511c0

yutongz pushed a commit to yutongz/k8s-test-infra that referenced this issue Sep 12, 2018

prow: Add kubeflow/mxnet-operator repository (kubeflow/community#136)

a7de80d

gaocegege closed this as completed Dec 2, 2018

michelle192837 pushed a commit to michelle192837/testgrid that referenced this issue Aug 27, 2019

prow: Add kubeflow/mxnet-operator repository (kubeflow/community#136)

cf3734b

saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 11, 2021

Add mxnet operator (kubeflow/community#136) (kubeflow#1392)

4df230c
