Roadmap #22

pluskid · 2014-12-19T03:51:58Z

Discussions and/or suggestions are welcome!

Interface
- Network architecture visualization
- Recurrent Neural Networks
Infrastructure
- CUDA Stream
- Multi-GPU support
- 4D tensor -> ND tensor (#21: 4D tensor -> ND tensor (for layers); #25: Customizable "channel" dimension for ND-tensor)
- Unsupervised Learning ((deep) autoencoders and variants) (#29: Stacked Denoising Auto-encoders)
Document
- Developer's Guide

jfsantos · 2014-12-20T15:47:28Z

Even though restricted Boltzmann machines (and DBMs/DBNs) and autoencoders (DAE, CAE, stacked autoencoders) work on a different principle, as they are unsupervised, having an implementation that follows the Mocha architecture could be useful. We started discussing this for DBNs here, as we have a simple implementation for RBMs and DBNs and would like to make it compatible with Mocha.

pluskid · 2014-12-21T01:48:29Z

@jfsantos Thanks! I think autoencoders, although unsupervised, are still trained with SGD: we just specify the label to be the same as the input data, so in principle we could already do this in Mocha. We might need to add some special layers to support variants of autoencoders. But I might be wrong, as I haven't worked on autoencoders at all. Do you know the details?

As for Bayesian networks, yes, I agree they are a very different paradigm. Especially since we already have a package for that (dfdx/Boltzmann.jl#3), I think it is better to keep them in two different packages. But making them compatible should definitely be a goal, and maybe some collaboration.

For using DBNs/DBMs to initialize the weights of DNNs, I think this might already be quite easy. If you could export the weights to an HDF5 file with compatible naming, then Mocha should be able to load them, just like loading Caffe's exported models, and then start supervised training from there. We could make Mocha's loading interface richer by, for example, allowing the user to control in fine detail which layer loads from which file and from a dataset with which name. We could also discuss a common data format that suits both needs.
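
To make the idea of a richer loading interface a bit more concrete, here is a minimal Python sketch. It is not Mocha's actual API: the helper name `resolve_sources`, the layout of the spec, and all file, layer, and dataset names are hypothetical. It only shows how a per-layer override could fall back to a default file and to a dataset named after the layer.

```python
def resolve_sources(layer_names, load_spec, default_file):
    """Map each layer to the (file, dataset) pair its weights would be read from.

    Hypothetical helper, not part of Mocha: layers without an override
    fall back to default_file and to a dataset named after the layer.
    """
    resolved = {}
    for layer in layer_names:
        override = load_spec.get(layer, {})
        resolved[layer] = (
            override.get("file", default_file),
            override.get("dataset", layer),
        )
    return resolved


# Hypothetical example: the first layer comes from a pre-trained DBN export,
# every other layer from the default model file.
spec = {"ip1": {"file": "dbn_pretrained.hdf5", "dataset": "rbm1_weights"}}
sources = resolve_sources(["ip1", "ip2"], spec, default_file="model.hdf5")
```

The actual reading of the datasets is left out on purpose; the point is only that the mapping from layers to files and dataset names can be plain data supplied by the user.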

jfsantos · 2014-12-21T05:04:34Z

You are right about autoencoders being trained with SGD, like MLPs. There are some "special" things, though:

- Specific regularizers/cost functions (e.g., for contractive and sparse autoencoders).
- Tied weights: the case where the decoder's weight matrix is simply the transpose of the encoder's weight matrix, so you only update the encoder weight matrix (however, each layer has its own biases).
- A "corruption layer" is needed for adding noise to, or zeroing elements of, the input data in the case of denoising autoencoders.
- In case we want to support stacked autoencoders, they're a bit of a different animal (more like DBNs, in the sense that you have to train them iteratively, layer by layer).
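
The tied-weight and corruption points above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Mocha code: the function names (`corrupt`, `encode`, `decode_tied`, `reconstruction_loss`) are made up for the example, the encoder is kept linear for brevity, and masking noise is used as the corruption.

```python
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input element with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def encode(W, b_enc, x):
    """Hidden code h = W x + b_enc (kept linear for brevity)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, b_enc)]


def decode_tied(W, b_dec, h):
    """Tied weights: reconstruct with W transposed, plus the decoder's own bias."""
    n_in = len(W[0])
    return [sum(W[j][i] * h[j] for j in range(len(W))) + b_dec[i]
            for i in range(n_in)]


def reconstruction_loss(x_clean, x_hat):
    """Squared error against the clean input: the 'label' is the input itself."""
    return sum((a - b) ** 2 for a, b in zip(x_clean, x_hat))


rng = random.Random(0)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 hidden units, 3 inputs
b_enc = [0.0, 0.0]
b_dec = [0.0, 0.0, 0.0]
x = [0.5, -1.0, 2.0]

x_noisy = corrupt(x, drop_prob=0.3, rng=rng)
x_hat = decode_tied(W, b_dec, encode(W, b_enc, x_noisy))
loss = reconstruction_loss(x, x_hat)  # compared with the clean x, not x_noisy
```

Only `W` and the two bias vectors are parameters here, which is what makes the weights "tied": the decoder has no weight matrix of its own.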

I'll work on a draft implementation for initializing a DNN with a DBN from Boltzmann.jl and let you know as soon as I have something (hopefully, by submitting a pull request!).

pluskid · 2014-12-22T00:59:10Z

@jfsantos Thanks for the details! I see, it is kind of do-able but not trivial. I need to think about this further.

philtomson · 2014-12-23T01:37:39Z

Just wondering what the ETA for recurrence support might be?

pluskid · 2014-12-23T01:57:04Z

@philtomson That is definitely a plan/goal, but maybe after the auto-encoders. The reason is that I do not know RNNs well enough to start implementing them right away. But I think many of the building blocks are already there. In particular, if you want to do a simple explicit unfolding of a fixed-length history, I think one could already build a model like that by making use of the shared-parameter mechanism in Mocha. For variable-length RNN support, I need to think more, especially about how the interface should be organized.
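
A minimal sketch of what explicit fixed-length unfolding means, in plain Python rather than Mocha: the same parameters are reused at every time step, so an unrolled RNN is a feed-forward chain of layers that share weights. The names `rnn_step` and `unrolled_forward`, the scalar state, and the tanh nonlinearity are simplifying assumptions made for the example.

```python
import math


def rnn_step(w_in, w_rec, bias, h_prev, x_t):
    """One recurrent step for a scalar state: tanh(w_in*x_t + w_rec*h_prev + bias)."""
    return math.tanh(w_in * x_t + w_rec * h_prev + bias)


def unrolled_forward(params, xs, h0=0.0):
    """Apply the same step once per input; every unrolled 'layer' shares params."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(params["w_in"], params["w_rec"], params["bias"], h, x_t)
        states.append(h)
    return states


params = {"w_in": 0.5, "w_rec": 0.9, "bias": 0.0}
states = unrolled_forward(params, [1.0, 0.0, -1.0])  # history of length 3
```

The length of the history is fixed by the length of `xs`, which is why variable-length sequences need a different interface.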

That being said, suggestions are very welcome from people who already know RNNs. For example, what is the simplest representative and reproducible example for RNNs (like MNIST for CNNs)? Are there any nice existing libraries for RNNs (whose way of organizing the user interface we could possibly learn from)?

zhongwen · 2014-12-23T04:04:51Z

@pluskid Maybe the following are helpful:
- Andrej Karpathy's Neuraltalk: https://github.com/karpathy/neuraltalk
- Alex Graves's RNNLIB: http://sourceforge.net/projects/rnnl/

pluskid · 2014-12-23T06:00:12Z

@zhongwen Thanks for the links!

the-moliver · 2015-01-22T22:34:48Z

I'm planning to add time-delay neural networks. I have a working implementation ( https://github.com/the-moliver/NeuralNets.jl ) that I want to port to Mocha.

philtomson · 2015-07-06T19:38:49Z

It would be nice to have a Caffe file -> Mocha converter. Maybe I'll work on something like that. Should be doable, right? Or are there Caffe features that are not yet in Mocha?

pluskid · 2015-07-06T19:58:36Z

We already have the ability to load Caffe models, but you still need to translate the model definition manually. Automatic translation of the architecture is theoretically possible, but I guess it might be quite tedious to implement. (I'm thinking some universal DNN architecture specification language might come out soon.) Most of the core functionality in Caffe has a counterpart in Mocha. But Caffe also has many unofficial forks that implement specific layers; those are more difficult to convert.

nikolaypavlov · 2015-08-17T20:21:01Z

It would be nice to have a maxout layer in addition to dropout.
Max-norm regularization can help when tuning dropout nets.

pluskid · 2015-08-18T17:19:32Z

@nikolaypavlov Thanks for the suggestions

Based on my understanding, maxout is simply a max pooling over some units. We can achieve this by using the existing PoolingLayer or ChannelPoolingLayer. Let me know if you are talking about something else.
Max-norm regularization is actually implemented, see for example filter_cons for ConvolutionLayer.
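
Both ideas are small enough to sketch in plain Python. This is an illustration only, not how PoolingLayer, ChannelPoolingLayer, or filter_cons are implemented in Mocha; the function names and the choice of consecutive, non-overlapping groups are assumptions made for the example.

```python
import math


def maxout(units, group_size):
    """Max over consecutive, non-overlapping groups of units."""
    if len(units) % group_size != 0:
        raise ValueError("number of units must be divisible by group_size")
    return [max(units[i:i + group_size])
            for i in range(0, len(units), group_size)]


def max_norm(weights, limit):
    """Rescale a weight vector so that its L2 norm does not exceed limit."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= limit:
        return list(weights)  # already inside the ball, leave unchanged
    return [w * limit / norm for w in weights]


pooled = maxout([1.0, 3.0, -2.0, 0.5], group_size=2)
clipped = max_norm([3.0, 4.0], limit=1.0)
```

A max-norm constraint of this kind is typically applied to each unit's (or filter's) weight vector after a gradient update.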

nikolaypavlov · 2015-08-19T20:18:10Z

@pluskid Great, I'll try to play with PoolingLayer.

outlace · 2015-08-31T18:16:37Z

Is this project meant to be the Theano/Torch of Julia?

Is there ever going to be OpenCL support?

pluskid · 2015-09-06T07:34:25Z

@outlace, this is more like Torch than Theano in that sense. There is no planned OpenCL support unless Julia gets better native support for GPU targets.

nstiurca · 2015-10-09T15:22:20Z

I would be very interested in OpenCL support as well. In fact, I have half a mind to take a stab at it myself. If I can leverage an OpenCL BLAS library (say, CLBLAS.jl), then I basically just have to write im2col.cl and a couple of pooling and neuron kernels, and structure everything else similarly to the CUDA backend.
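
As a rough illustration of what an im2col kernel computes, here is a plain-Python sketch (written for readability, not as OpenCL and not taken from the CUDA backend). It assumes a single-channel image, stride 1, and no padding; the function names are illustrative.

```python
def im2col(image, kh, kw):
    """Unfold every kh-by-kw patch (stride 1, no padding) into one column."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    patches = []
    for i in range(out_h):
        for j in range(out_w):
            patches.append([image[i + di][j + dj]
                            for di in range(kh) for dj in range(kw)])
    # Transpose so each patch becomes a column: shape (kh*kw, out_h*out_w).
    return [list(row) for row in zip(*patches)]


def conv2d_via_im2col(image, kernel):
    """Cross-correlation expressed as a row vector times the im2col matrix."""
    kh, kw = len(kernel), len(kernel[0])
    flat_k = [v for row in kernel for v in row]
    cols = im2col(image, kh, kw)
    n_patches = len(cols[0])
    flat = [sum(flat_k[r] * cols[r][p] for r in range(len(flat_k)))
            for p in range(n_patches)]
    out_w = len(image[0]) - kw + 1
    return [flat[r:r + out_w] for r in range(0, n_patches, out_w)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
result = conv2d_via_im2col(image, kernel)
```

Once the patches are laid out as columns, the convolution reduces to a matrix product, which is why a BLAS library covers most of the heavy lifting.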

If I did this, in the interest of clarity would you be OK with renaming GPUBackend -> CUDABackend (adding @deprecated typealias GPUBackend CUDABackend or similar for compatibility), and naming the new backend OCLBackend?

pluskid · 2015-10-09T15:29:50Z

@nstiurca Thanks! This could be cool! Yes, I'm OK with the renaming if we have a working OpenCL backend!

nstiurca · 2015-10-09T15:43:48Z

OK, I will get started this weekend. Should we open an issue for the sake of tracking? Development-wise, it will be simplest for me to create an opencl branch on the fork of your project that I already have. Do you prefer to have such a branch in your repo as well until OpenCL support is stable (assuming we get there...)? It might be good to do that for the sake of anyone else that wants to help develop OpenCL support.

pluskid · 2015-10-09T15:47:13Z

I would suggest doing it in your branch, but opening a pull request here, with "[WIP]" in the title and a description of the goal and current progress in the text (which you could update periodically). I will not merge the PR until you have something reasonably stable, but people will see the PR and could jump in to help.

nstiurca · 2015-10-09T15:53:26Z

That works for me. Look for it later today.

outlace · 2015-10-09T16:08:28Z

I think this is great. I currently have to use Torch because it's the only mature package that has an OpenCL backend. Being able to run models on my Macbook is fantastic. Really looking forward to this getting OpenCL support.

nstiurca · 2015-10-10T02:41:33Z

@outlace Caffe also has a fork with OpenCL support, but unfortunately for me I haven't been able to get either Torch or Caffe to work on a 32-bit ARM processor, even though it has a fully compliant OpenCL 1.1 implementation.

Thus, I am going to start on rolling my own. See PR #155.

lqh20 · 2015-10-24T17:04:31Z

Any plans to implement batch normalization (http://jmlr.org/proceedings/papers/v37/ioffe15.pdf)? Looks like it's a great step forward in terms of training time!
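
For reference, the training-time forward pass of batch normalization for a single feature can be sketched as follows. This is a plain-Python illustration of the transform described in the linked paper, not code from any library; the function name and the default `eps` are assumptions, and the running statistics used at inference time are left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n  # biased (divide by n)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


normalized = batch_norm([1.0, 2.0, 3.0])
```

Here `gamma` and `beta` are the learned scale and shift; with the defaults the output simply has approximately zero mean and unit variance over the batch.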

pluskid · 2015-10-25T00:32:41Z

@lqh20 I recently joined a new project, MXNet. We are building a Julia interface called MXNet.jl (https://github.com/dmlc/MXNet.jl). It is still at a relatively early stage, but some features are already working. For example, batch normalization and multi-GPU training in the cifar-10 example (https://github.com/dmlc/MXNet.jl/blob/master/examples/cifar10/cifar10.jl#L9) are already working quite nicely.

philtomson · 2015-10-25T00:43:27Z

Is MXNet.jl complementary to Mocha.jl or meant to replace it?

pluskid · 2015-10-25T00:50:17Z

@philtomson It depends. Mocha.jl still has the advantage of simplicity and portability. But in terms of computational efficiency and feature richness, I think MXNet.jl should replace Mocha.jl, because it is built on top of libmxnet, a language-agnostic, general deep learning library that is designed to have, for example, multi-GPU support. Moreover, the core of libmxnet is being actively developed by a team, so in terms of features it is much better than Mocha.jl, which is currently developed primarily by me in my very limited free time. libmxnet itself is actually a joint effort by authors of several different deep learning libraries.

philtomson · 2015-10-25T17:17:10Z

I wonder if mxnet could be an alternate backend for Mocha.jl? It seems like that would preserve the advantages of Mocha.jl - simplicity, portability, good documentation - while also allowing users to drop directly to your MXNet.jl bindings if needed.

pluskid · 2015-10-26T03:31:57Z

@philtomson That could be one possible option. I will wait and see whether it is feasible, as using MXNet.jl introduces an external dependency on libmxnet. If that dependency itself is not a problem, then using MXNet.jl directly might be a more viable option. Though something still needs to be improved, esp. documents.

philtomson · 2015-10-26T23:01:16Z

Right. The scenario I was thinking of was being able to keep the simple, declarative style of Mocha.jl while also taking advantage of the performance of MXNet.jl. Sure, libmxnet has the advantage of being "language independent"; however, that can also be a weakness. It could mean that you can't readily take advantage of powerful language features specific to Julia, like macros (or at least it might be more difficult to do so). I suspect there's a lot of boilerplate code required when you use libmxnet that could be eliminated at a higher level of abstraction.

BTW: is the GPU backend of Mocha.jl not as performant as libmxnet? From what I understand of the docs, mxnet allows for multiple GPUs whereas Mocha.jl only allows for one? If you compare performance between Mocha.jl and libmxnet with only one GPU each, are they pretty close?

I can see where training with multiple GPUs can be an advantage, but some users might be running pre-trained models on a laptop with only a single GPU (and some of us don't even have that, as we only have an Intel integrated GPU which doesn't do CUDA), and the current setup of Mocha.jl is actually quite sufficient for this (people with this kind of setup perhaps wouldn't notice any appreciable difference from using libmxnet).

Also: does the mxnet project have any plans for supporting OpenCL?

> Though something still needs to be improved, esp. documents.

Mocha.jl's documents are actually pretty good at this point, so this is a problem for someone who tries moving from Mocha.jl to MXNet.jl. Using Mocha.jl as a sort of wrapper around MXNet.jl would mean you could probably keep most of the documentation as is.

I suppose another idea would be to translate the CPP backend for Mocha.jl to produce C++ code that makes calls directly to libmxnet (or at least parameterize it so that you could use OpenMP (as now) or libmxnet in the CPP backend).

philtomson · 2015-10-27T05:44:42Z

I just got around to installing MXNet.jl and playing with it some. So far it doesn't seem too much more difficult to use than Mocha.

pluskid · 2015-10-27T16:23:17Z

@philtomson Glad to hear that it works out nicely for you.

The single-GPU performance of Mocha.jl might be similar to that of MXNet.jl. MXNet.jl has a more flexible symbolic API for defining network architectures, while internally optimizations are used to avoid unnecessary memory allocation and computation. But multi-GPU support is definitely a win on the MXNet.jl side.

I agree that many users with small-scale applications do not use GPUs. In this case, the default CPU-only libmxnet.so should still be quite straightforward to compile (at least on Linux and OS X). And since libmxnet is actually a relatively low-level backend, much of the logic will still be built in Julia, and the interface is flexible and convenient enough to use.

One of the main goals of joining forces under dmlc/libmxnet is to avoid duplicated labor, especially in the computationally heavy backend. A layer implemented once will automatically be available in the Python, Julia, and R frontends.

Currently I will be maintaining both Mocha.jl and MXNet.jl. In the future, when MXNet.jl becomes more mature, I will try to advocate MXNet.jl as the successor to Mocha.jl.

pluskid · 2015-11-13T15:03:56Z

For those who are interested in RNN/LSTM in Julia: there is now a char-rnn LSTM implementation in MXNet.jl. It uses explicit unrolling so everything fits in the current FeedForward model, and therefore multi-GPU training can be used directly. For more general-purpose variable-length RNNs without unrolling, we will still need to develop the modeling interface. I will add a tutorial document soon.

pluskid mentioned this issue Dec 19, 2014

Add Roadmap #20

Closed

pluskid added the discussion label Dec 19, 2014

pluskid mentioned this issue Dec 20, 2014

Mocha.jl v0.0.5 JuliaLang/METADATA.jl#1884

Merged
