
[Discussion] 1.5.0 Roadmap #14619

Closed
szha opened this issue Apr 4, 2019 · 32 comments

Comments

@szha
Member

szha commented Apr 4, 2019

Let's start a discussion here about the roadmap towards 1.5.0. We are looking for:

  • New features that are useful to your research and development.
  • Improvements and patches to existing features.

If you have any item that you'd like to propose to have in the roadmap, please do:

  • Create (or locate an existing) issue/pull request for the item and note the issue/pull request number.
  • Comment in this issue with: 1) the issue number above, and 2) one sentence on what the item is about and why it's useful to you.
  • Indicate whether you'd be willing to help out on the item.
  • Share the ETA if you're driving the item and have a guesstimate of when it will be done.

cc @apache/mxnet-committers

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Feature

@szha
Member Author

szha commented Apr 4, 2019

The changes since the 1.4.0 release that are already merged into the master branch will be included in the 1.5.0 release. The list can be found at: https://github.com/apache/incubator-mxnet/compare/v1.4.x...master?expand=1

@eric-haibin-lin
Member

Hi everyone, I've created the v1.5.x branch here: https://github.com/apache/incubator-mxnet/tree/v1.5.x
Until we have an agreement on the timeline and features, I will synchronize this branch with the master branch periodically. Once we have decided on the code freeze date, we will only cherry-pick required changes/features into the branch.

@anirudh2290
Member

Thanks for starting this! I would like to include the exception handling fixes: #14397 (@anirudh2290), #14433 (@anirudh2290), #14575 (@arcadiaphy). These three should hopefully be merged by the end of next week. Conversion of FP32 models to mixed precision models (#14584) should be in by the first week of May, tentatively. In addition, I have some changes to the profiler to visualize GPU memory pooling and help make better decisions on the environment variable choice. It is currently in a branch (https://github.com/anirudh2290/mxnet/tree/memory_profiler_poc2) and I intend to open a PR soon (next week).

@pengzhao-intel
Contributor

pengzhao-intel commented Apr 5, 2019

MKLDNN Quantization PRs

| Name | PR# | Status |
| --- | --- | --- |
| sum | #14614 | DONE |
| relu | #14604 | DONE |
| refactor requantize | #14608 | DONE |
| improve quantize | #14641 | DONE |
| conv + activation | #14819 | DONE |
| cache op | #14785, #14931 | DONE |
| quantization flow to support 0 shape (RNN, concat) | #15031 | DONE |
| New models (SSD COCO/RN18/MobileNet v2) | #14646, #14823 | DONE |

FP32 optimization

| Name | PR# | Status |
| --- | --- | --- |
| data loader for CPU | #14824 | DONE |
| transpose | #14545 | DONE |
| RNN refactor with NNVM | #14476 | DONE |
| reshape enhance | #14903 | DONE |
| sum 1d | #14914 | DONE |
| softmax 1d | #14818 | DONE |
| MKL Math (ERF, mean, etc.) | #14893 | DONE |
| MKLDNN RNN (vRNN, LSTM) | #14713 | DONE |
| Build (Windows/Linux) | #14740, #14743, dmlc/mshadow#374, #14829, #14877 | DONE |
| Update MKLDNN to 0.19 | #14783 | DONE |

Documentation

| Name | PR# | Status |
| --- | --- | --- |
| Windows build instructions | #14952 | DONE |
| MKLDNN OP | #14891 | DONE |

@zboldyga
Contributor

zboldyga commented Apr 5, 2019

Some users pointed out useful features around matrix inversions, determinants, and log determinants. I propose adding some small features to make these calculations easier: https://issues.apache.org/jira/projects/MXNET/issues/MXNET-1350?filter=allissues

#14360

These are relevant calculations, and some adjustments to the existing tools would help newcomers more easily leverage the existing work.

I'm interested and willing to implement this feature.

I'm quite busy at the moment, but can likely finish this over a few days before mid-May.
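
To make the motivation concrete, here is a minimal sketch of how a log determinant of a symmetric positive-definite matrix can be computed today through the existing Cholesky routines (linalg.potrf and linalg.sumlogdiag); the proposed helpers would just make calculations like this more direct. The matrix values are only illustrative.

```python
from mxnet import nd

# a small symmetric positive-definite matrix (illustrative values)
a = nd.array([[4.0, 1.0],
              [1.0, 3.0]])

# for an SPD matrix A with Cholesky factor L: log|A| = 2 * sum(log(diag(L)))
chol = nd.linalg.potrf(a)
log_det = 2.0 * nd.linalg.sumlogdiag(chol)
print(log_det.asscalar())
```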

Thoughts?

@jmacglashan

jmacglashan commented Apr 20, 2019

Easily the biggest feature MXNet is lacking is higher order gradient support. There appears to be some work to get this going, but it's been a bit stagnant. The lack of strong support for this feature prevents implementing a number of DL algorithms. Everything beyond this seems like a quality-of-life feature. I would offer to help on this front, but I won't have the time necessary to work it out. I list it here in hopes that others will answer the call.

Beyond that, I think having dynamic shape in symbols would be a nice feature.

On a smaller scale, I think it would be nice if Gluon had support for blocks that operate on keyword arguments. It's pretty easy to add support for that in a non-breaking way (and I've done it in my own projects), but ideally this feature would be supported in other code like the data loader, which is currently fairly structured around tuples rather than dicts (which would pair with keyword args).
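
A rough sketch of one non-breaking way to do this (the class names here are made up for illustration, not an existing Gluon API; Block.__call__ currently forwards only positional arguments, so the wrapper routes keyword arguments through to forward):

```python
from mxnet import nd
from mxnet.gluon import nn

class KwargsBlock(nn.Block):
    """Hypothetical wrapper: routes keyword arguments through to forward().
    Note this skips Block's forward hooks, which is fine for a sketch."""
    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

class TwoInput(KwargsBlock):
    def __init__(self, **kwargs):
        super(TwoInput, self).__init__(**kwargs)
        self.dense = nn.Dense(4)

    def forward(self, query, context=None):
        x = query if context is None else nd.concat(query, context, dim=1)
        return self.dense(x)

net = TwoInput()
net.initialize()
out = net(nd.ones((2, 3)), context=nd.zeros((2, 3)))
```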

A nitpick I have is that when it comes to serialization, MXNet (Python) seems to assume you always want to write to a file, in that it requests a file path to serialize the data. This often isn't appropriate in production systems. It would be much nicer if MXNet simply took a file-like object or just returned bytes so you can do what you want with them.
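
Until then, one workaround sketch is to round-trip through a temporary file to get raw bytes (assuming a Gluon net and the save_parameters/load_parameters API):

```python
import os
import tempfile
from mxnet import gluon, nd

net = gluon.nn.Dense(2)
net.initialize()
net(nd.ones((1, 3)))  # run once so the parameters are created

# current APIs take file paths, so round-trip through a temporary file
# to obtain raw bytes for an arbitrary storage backend (S3, a database, ...)
with tempfile.NamedTemporaryFile(suffix='.params', delete=False) as f:
    path = f.name
net.save_parameters(path)
with open(path, 'rb') as f:
    blob = f.read()
os.remove(path)

# loading also has to go through a temporary file
with tempfile.NamedTemporaryFile(suffix='.params', delete=False) as f:
    f.write(blob)
    path = f.name
net.load_parameters(path)
os.remove(path)
```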

@KellenSunderland
Contributor

KellenSunderland commented Apr 21, 2019

Features I'd like to see for 1.5 include:

  • AMP, if ready
  • New TensorRT integration with subgraph API support and FP16
  • NVTX ranges for easier GPU profiling

+1 to @pengzhao-intel on the MKLDNN work; I'd love to make use of these optimizations. +1 to @anirudh2290's three very useful improvements.

@stereomatchingkiss

Any plan to simplify the compilation process on Windows?
Is there any document showing how to compile MXNet with MKLDNN support on Windows?

@pengzhao-intel
Contributor

> Any plan to simplify the compilation process on Windows?
> Is there any document showing how to compile MXNet with MKLDNN support on Windows?

Yes, we have a plan for MKLDNN on Windows and will fix it in 1.5. I will add it to my table.
@yinghu5 @NeoZhangJianyu

KellenSunderland unpinned this issue Apr 28, 2019
szha pinned this issue Apr 29, 2019
@shuokay
Contributor

shuokay commented May 6, 2019

Update parameters manually in the training loop.
#14735
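
For context, a minimal sketch of what a manual update currently looks like on Gluon parameters, bypassing gluon.Trainer (the model, data, and learning rate are placeholders):

```python
from mxnet import autograd, gluon, nd

net = gluon.nn.Dense(1)
net.initialize()
x = nd.random.uniform(shape=(4, 2))
y = nd.random.uniform(shape=(4, 1))

with autograd.record():
    loss = ((net(x) - y) ** 2).mean()
loss.backward()

# plain SGD step applied directly to the parameters
lr = 0.01
for param in net.collect_params().values():
    if param.grad_req != 'null':
        param.data()[:] -= lr * param.grad()
```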

@roywei
Member

roywei commented May 8, 2019

I'd like #14869 to go in; estimated time to complete is 05/10.

@stu1130
Contributor

stu1130 commented May 15, 2019

Dependency update PRs:

  • #14950 Update CI to use the latest cuDNN and fix the ARCH_MISMATCH error on M60 GPUs
  • #14887 CUDA 10.1 PyPI script
  • #14588 Update the NumPy version

@mouryarishik

mouryarishik commented May 15, 2019

I desperately need higher order differentiation. Please make it possible. Thanks to everyone for all your contributions so far.

@roywei
Member

roywei commented May 16, 2019

@mouryarishik @jmacglashan Hi, about higher order gradients: @apeforest and @larroy are actively working on this, and it will first be available in the master branch and the nightly pip packages. Unfortunately, it won't make it into 1.5.0 as we plan to release soon. Stay tuned, thanks!

@aaronmarkham
Contributor

Should we formally deprecate amalgamation as all it does is lead people down a dead end?

@szha
Member Author

szha commented May 17, 2019

@aaronmarkham is it broken?

@kohillyang

kohillyang commented May 19, 2019

@aaronmarkham so is there a tutorial that illustrates how to build libmxnet.so for mobile devices?

@aaronmarkham
Contributor

@szha My understanding is that it doesn't work. There are several open issues about it, but I haven't tried it out yet myself.
@kohillyang I'd love to see a guide for this using a recent build of MXNet. The closest we have is the amalgamation guide:
https://mxnet.incubator.apache.org/versions/master/faq/smart_device.html
If you try it out, please keep me posted - I'd be happy to get the guide updated with tips on getting it to work.

@szha
Member Author

szha commented May 20, 2019

@aaronmarkham that sounds like something that needs fixing. Not sure if it's enough reason to kill it, though.

@larroy
Contributor

larroy commented Jun 19, 2019

Wouldn't it be better to have a preprocessor flag to achieve the same result? Cross-compilation is solved.

@larroy
Contributor

larroy commented Jun 19, 2019

@mouryarishik could you give details about your use case? Thanks.

@mouryarishik

@larroy A lot of GAN models require second-order gradients for stabilised training.

@vafl
Contributor

vafl commented Jun 19, 2019

Would it be possible to fix this Gluon serialization/deserialization bug, #12795, in the 1.5 release?

It has been open for a long time (still not fixed in 1.4.1) and makes it hard to serialize Gluon graphs for some applications, e.g. in gluon-ts.

@apeforest
Contributor

@mouryarishik We already have a few operators that support higher order gradients:

elemwise_mul, log, log10, relu, FullyConnected, sin, cos, exp

However, due to the current design of NNVM (the graph data structure that models the computation graph), higher order gradient support has to be implemented operator by operator. The good news is that once we move to NNVM 2.0 in the near future, higher order gradients will be supported automatically by NNVM.

In the meantime, before NNVM is upgraded to 2.0, we plan to support higher order gradients in a limited set of operators. It would be great if you could identify the operators used in your model that require higher order gradient support; we will prioritize implementing those.
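
For reference, a minimal sketch of taking a second-order gradient with autograd for one of the operators listed above (log); the input values are only illustrative:

```python
from mxnet import autograd, nd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

with autograd.record():
    y = nd.log(x)  # log is one of the operators with higher order support
    # keep the first-order gradient as part of the graph so it can be
    # differentiated again
    dy_dx = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
dy_dx.backward()

print(x.grad)  # second derivative of log: -1/x^2
```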

Thanks for your continuous support and passion for MXNet.

@larroy
Contributor

larroy commented Jun 20, 2019

I guess it depends on the GAN, since you could have any layer; if you want to use a GAN with convolutions, you need higher order gradients for conv...

@szha
Member Author

szha commented Jun 20, 2019

@vafl the duplicate name issue should have been fixed already.

@vafl
Contributor

vafl commented Jun 20, 2019

@szha In 1.4.1 the issue is still there (see the reproducible example in #12795). When you reuse any layer in a Gluon graph, the graph cannot be serialized and loaded anymore. You have to explicitly create a new layer and share the parameters.

I think what was pushed was a workaround for this issue for RNNs.
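
A minimal sketch of the kind of reuse I mean (simplified, not the exact repro from #12795): the same child block is applied twice, and the exported graph could then not be loaded back.

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

class Reuse(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(Reuse, self).__init__(**kwargs)
        self.dense = nn.Dense(4)

    def hybrid_forward(self, F, x):
        # the same child block is called twice
        return self.dense(self.dense(x))

net = Reuse()
net.initialize()
net.hybridize()
net(nd.ones((1, 4)))
net.export('reuse')  # writes reuse-symbol.json / reuse-0000.params

# loading the exported graph back is where the problem showed up
loaded = mx.gluon.SymbolBlock.imports('reuse-symbol.json', ['data'],
                                      'reuse-0000.params')
```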

@szha
Member Author

szha commented Jun 21, 2019

@vafl yes, what I meant is that 1.5.0 will include the fix. If you use the nightly package of MXNet, you will see that the code example included in the issue now passes correctly.

@vanewu

vanewu commented Jul 16, 2019

Adding a shape property to the MXNet Symbol would be great.
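
For context, shapes currently have to be queried through infer_shape rather than read off a property; a minimal sketch of the existing workflow (the network below is just an example):

```python
import mxnet as mx

data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc1')

# today: query shapes explicitly, given the input shape
arg_shapes, out_shapes, aux_shapes = net.infer_shape(data=(32, 100))
print(out_shapes)  # [(32, 10)]
```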

@pengzhao-intel
Contributor

@szha there are already lots of great proposals from the community.
I think we need to create a new topic for the 1.6 roadmap :)

@szha
Member Author

szha commented Jul 18, 2019

See #15589 for the new roadmap discussion.
