This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

v1.0 Stable Release TODO List #2944

Closed
7 of 30 tasks
piiswrong opened this issue Aug 5, 2016 · 32 comments

@piiswrong (Contributor) commented Aug 5, 2016

It's about time for a feature-complete stable release.

We are in the process of a major refactor. While most changes are on the backend side and therefore should not significantly affect users, we do expect to break a few small things, and possibly compatibility with other language bindings.
Authors of the Julia, R, Scala, etc. packages: please stay tuned and adopt the new API. It should be a quick fix, and we will have a guide for the transition.
@thirdwing @pluskid @vchuravy @Ldpe2G

Transition Guide/List of Breaking Changes:

Developer

  1. TBlob and TShape have moved from the mshadow namespace to the mxnet namespace. Fix: change mshadow::TBlob and mshadow::TShape to TBlob and TShape in your code.
  2. Please do not use cudaMalloc and cudaFree directly anywhere in MXNet. Use Storage::Get()->Alloc(size, Context::GPU()) to allocate memory on the current GPU instead.

User

If you trained networks containing a BatchNorm layer on CPU, or on GPU with cuDNN v4 or below, before Jul 5th, you may find your model producing totally wrong results after loading it back for testing. The simplest fix is to load your .param files with ndarray.load, set all arrays whose keys end with '_gamma' to 1.0, and save them back.
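A minimal sketch of the gamma fix described above. It assumes the .param file loads into a dict keyed like 'arg:bn1_gamma'; plain numpy arrays stand in for NDArrays here, and the file names in the comments are hypothetical — in practice you would use mx.nd.load and mx.nd.save around this logic.

```python
import numpy as np

def fix_gamma_params(params):
    """Set every array whose key ends in '_gamma' to all ones, in place."""
    for key, arr in params.items():
        if key.endswith('_gamma'):
            arr[:] = 1.0  # reset gamma to 1.0, as the workaround suggests
    return params

# In practice: params = mx.nd.load('model-0010.params')
params = {
    'arg:bn1_gamma': np.array([0.5, 2.0], dtype=np.float32),
    'arg:fc1_weight': np.array([3.0], dtype=np.float32),
}
fixed = fix_gamma_params(params)
# In practice: mx.nd.save('model-0010-fixed.params', fixed)
```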

  1. If you load a model trained before Dec 2015 for prediction and the model uses BatchNorm, it may output totally wrong results. This can be fixed by adding fix_gamma=True to all BatchNorm layers in your symbol construction script, or adding 'fix_gamma': 'True' to all BatchNorm layers in your .json model file.
  2. sum_axis, max_axis, and min_axis are removed. Please use mx.nd.sum(src, axis=n), mx.nd.max(src, axis=n), and mx.nd.min(src, axis=n) to do the same thing.
  3. element_mask is removed. Please use src * mask.reshape((mask.size, 1, 1, ..., 1)) directly, as binary ops now support broadcasting.
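The element_mask replacement in item 3 can be illustrated with plain numpy broadcasting (a sketch with made-up shapes; the same expression applies to mx.nd arrays now that binary ops broadcast):

```python
import numpy as np

# src: a batch of 2 feature maps, shape (batch, channel, height, width)
src = np.arange(24, dtype=np.float32).reshape((2, 3, 2, 2))
# mask: one 0/1 flag per batch element
mask = np.array([1.0, 0.0], dtype=np.float32)

# Reshape the mask to (batch, 1, 1, 1) so it broadcasts over the
# remaining axes, zeroing out the masked-off batch elements.
masked = src * mask.reshape((mask.size, 1, 1, 1))
```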

TODOs

  1. Refactor Symbolic graph to use NNVM. @tqchen
    1. Finish NNVM and Passes.
    2. Refactor NDArray interface to use nnvm::op and add a Cython version. @piiswrong
    3. Set ndarray function naming convention straight.
    4. Refactor Executor to use NNVM
  2. Bring in NCCL @mli
    1. Use NCCL reduce and broadcast and fix deadlock bug. NCCL is problematic with our engine
    2. Or, write our own ring based P2P reduce
  3. Better Tests @mli @piiswrong
    1. Setup EC2 test server.
    2. Setup GPU+CPU consistency and gradient check.
    3. Run performance regression test
    4. Test & debug c++ prediction interface
  4. Sparse @mli @antinucleon
  5. Better Doc @leopd
    1. Improve doc formatting and readability
    2. Fix confusing language and description.
    3. More tutorials.
    4. Reorganize docs. Put pages where they belong
    5. Improve installation guide. Consider adding a script similar to torch installation script
  6. Misc
    1. Refactor ccoptimizer interface to make writing new ccoptimizers easier. Add ccadam.
    2. Fix memory allocation policy. Explain in the docs that you shouldn't use cudaMalloc in operators; use the temp space request for temporary memory or a pooled allocator for holding states.
    3. ...
  7. Fix known bugs
    1. Fix the CustomOp bug that causes a cyclic dependency: doing multiple "batch_dot" in a loop can leave the layer unable to be linked with a Convolution layer #2945
  8. IO doc and refactor
    1. Move opencv plugin into main repo and use new ndarray interface.
    2. Update IIterator to support multiple data/labels.
    3. Front end based IO with more features like indexing and shuffling
@piiswrong piiswrong added this to the v1.0 milestone Aug 5, 2016
@vchuravy (Contributor) commented Aug 5, 2016

I would propose Float16 support as an additional target.

@antinucleon (Contributor) commented Aug 5, 2016

  1. High-level flexible RNN interface
    1. one2one, one2many, seq2seq
    2. speech example
    3. LM example
    4. distributed data/model-parallel benchmark
    5. attention
    6. memory/NTM
    7. better CTC support

@antinucleon (Contributor)

For the optimizer part, @tqchen and I are thinking about supporting throwing the optimizer into the computation graph, so less C++ code will be needed.

@piiswrong (Contributor, Author)

Until we have RTC, that doesn't help much. You still need at least a 2x buffer.

@antinucleon (Contributor)

We may consider building the docs on EC2, then syncing back to Read the Docs, because the doc build keeps failing with a timeout during compilation.

@piiswrong (Contributor, Author)

Yes, or maybe just host them from EC2.

@tornadomeet (Contributor)

Great!
@piiswrong, what does NNVM mean?

@antinucleon (Contributor)

@vchuravy, we may need to put more effort into int8 rather than fp16. From current info, int8 will be mainstream in the future.

@vchuravy (Contributor) commented Aug 6, 2016

@antinucleon Great to hear. The work @Godricly and I have been doing focuses purely on making our operators support arbitrary DTypes. That should help the Int8 work as well?

(This is off topic, but I would expect fixed-point with Int8 rather than truly Int8?)

@antinucleon (Contributor)

@vchuravy It is still being investigated by @winstywang. If you use int8 directly, there is no performance gain. But the official documentation mentions that for the new Titan X, int8 performance is 44T, almost 4 times that of fp32.

@winstywang (Contributor)

@vchuravy NV should have specific instructions for int8; currently, using int8 directly only brings a 25% performance gain according to our tests.

@Godricly (Contributor) commented Aug 7, 2016

My suggestions are as follows:

  • Documentation (most important)

  • Some kind of graph-creation debugging tool

    It would be nice if we could have a GUI for this; it's painful to debug the graph.

  • Dynamic execution capability for Operators (for example, stochastic depth and fractal networks)

  • CustomOp is not DType compatible yet

  • A simple debugging Operator (just printing output and gradient, so you can insert it anywhere; a switch could decide what to print)

  • Check whether ps-lite is compatible with DType

@piiswrong (Contributor, Author)

Stochastic depth can be done with bucketing.
We have Monitor for debugging.

@antinucleon (Contributor)

With NNVM we may enable fully dynamic execution.

@antinucleon (Contributor)

@piiswrong @leopd We need to move the doc building system to EC2. The Read the Docs build keeps failing because it runs out of build time.

@Godricly (Contributor) commented Aug 8, 2016

@antinucleon Is there any paper available right now on uint8 NNs? And what does NNVM stand for? I'm having a hard time searching for it.

@winstywang (Contributor) commented Aug 8, 2016

Here are some thoughts about the docs:

  • A summary page of all the examples.
  • A summary page of recently added features. Each time a new feature is added, a simple explanation and sample code must be provided.
  • WE CANNOT SAY "YOU CAN JUST USE XXX" to users when there is no doc or simple example for XXX. Each time we mention something, a doc or example must be provided.
  • A step-by-step tutorial teaching beginners how to implement some basic NN operations, such as fine-tuning and feature extraction. These could cover more than 80% of usage.
  • Finish the CS231n homework and projects with Minpy and MXNet.

@piiswrong @antinucleon

@Godricly (Contributor) commented Aug 8, 2016

Another thing I'd like to ask for is a refactor of LSTM, if possible.
Can we hide provide_data and provide_label in an elegant way? I understand that the current approach works pretty well, but exposing the internal stuff may bring some trouble (like the extra provided_data_type for me in fp16 lstm #2564).

@winstywang (Contributor)

I would vote for another issue which is very important for users:

  • Make sure the speed and accuracy in all test cases is the same as or better than Caffe's.
  • Currently we have all kinds of performance issues: CPU slower than Caffe, small batches slower than Caffe, and ResNet on ImageNet worse than Caffe.

@antinucleon (Contributor)

The ResNet issue is caused by IO. Min has reproduced the exact result by using the Torch IO.
The problem is who will do that work.

@tqchen (Member) commented Aug 8, 2016

I hope that for each of the issues raised, people can show up and assign, or self-assign, each issue, so we are moving forward effectively.

@mli (Contributor) commented Aug 8, 2016

It's good to have a single page containing everything, but I totally agree that we can open an issue for each point and cite the links here.

@piiswrong (Contributor, Author)

@mli Yes. If someone wants to talk more about, or start working on, a task, feel free to open a new issue and link it here. Also assign it to the v1.0 milestone.

@antinucleon (Contributor)

Also, we may consider treating warnings as errors in the future.

@yzhliu (Member) commented Aug 18, 2016

I'll list a roadmap for the Scala package this weekend.

@taoari (Contributor) commented Aug 19, 2016

@antinucleon Can I ask what's wrong with IO that causes the performance drop?

@pluskid (Contributor) commented Aug 19, 2016

For docs, I think querying our GitHub issues for the keyword "how to" is a good source for a list of topics to potentially cover.

@windywinter (Contributor)

@piiswrong What does NNVM stand for?

@tornadomeet (Contributor)

@windywinter about NNVM: dmlc/MXNet.jl#115

@sxjscience (Member)

@antinucleon, @jennyzhang0215 and I have implemented MemN2N and NTM and replicated the results in the papers; we may release the code after AAAI or WWW. I can send you the code now if you need it.

@dianyancao

Is it OK to do some code optimization in NNVM? #3105

@RogerBorras

Thanks to all of DMLC for this great effort.
