
add CapsNet example #8787

Merged · 5 commits into apache:master · Dec 14, 2017

Conversation

@Soonhwan-Kwon (Contributor) commented Nov 23, 2017:

Description

This example is an MXNet implementation of CapsNet:
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton. Dynamic Routing Between Capsules. NIPS 2017

We achieved a best test error rate of 0.29% and an average test error rate of 0.303%. This is the best accuracy and the fastest training time among the other implementations (Keras, TensorFlow) as of 2017-11-23.
The result reported in the paper is 0.25% (average test error rate).

| Implementation | Test err (%) | Train time/epoch | GPU used |
| --- | --- | --- | --- |
| MXNet | 0.29 | 36 sec | 2× GTX 1080 |
| TensorFlow | 0.49 | 10 min ※ | unknown (4 GB memory) |
| Keras | 0.30 | 55 sec | 2× GTX 1080 Ti |

※ The TensorFlow implementation's batch size is 128, while the MXNet and Keras implementations use a batch size of 100.

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

@Godricly (Contributor):

Do you actually update the bias in the capsule layer in each training iteration?

@Soonhwan-Kwon (Contributor, Author) commented Nov 27, 2017:

[Image: the routing algorithm from the CapsNet paper]

As stated in line 2 of the routing algorithm of CapsNet,
the bias is not updated across batches; it is re-initialized to 0 for each batch (i.e., each training iteration),
and it is implemented that way here.
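
For reference, here is a minimal NumPy sketch of that routing procedure (not this PR's MXNet code; `dynamic_routing`, `squash`, the tensor shapes, and `num_iters` are illustrative assumptions), showing the logits `b` being zeroed at the start of every call rather than being learned parameters:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # squash non-linearity: v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    sq_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: prediction vectors, shape (batch, in_caps, out_caps, out_dim)
    batch, in_caps, out_caps, _ = u_hat.shape
    # line 2 of the routing algorithm: the logits b are (re)initialized
    # to zero on every call, i.e. for every batch -- they are not
    # parameters touched by the optimizer
    b = np.zeros((batch, in_caps, out_caps))
    for _ in range(num_iters):
        # coupling coefficients: softmax over output capsules
        c = np.exp(b) / np.sum(np.exp(b), axis=2, keepdims=True)
        s = np.sum(c[..., None] * u_hat, axis=1)  # weighted sum over inputs
        v = squash(s)                             # output capsule vectors
        # agreement update: b_ij += u_hat_ij . v_j
        b = b + np.sum(u_hat * v[:, None, :, :], axis=3)
    return v
```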

* * *
## **Prerequisites**

MXNet version 0.11.0 or above
Member:

For the version here, have you tested on some more recent versions like the current master or 1.0.0rc?

@Soonhwan-Kwon (Contributor, Author) commented Nov 28, 2017:

I tested it on MXNet 0.12.1 and it works well.

```python
data_flatten = mx.sym.flatten(data=data)
squared_error = mx.sym.square(x_recon - data_flatten)
recon_error = mx.sym.mean(squared_error)
loss = mx.symbol.MakeLoss((1 - 0.392) * margin_loss(y_onehot, out_caps) + 0.392 * recon_error)
```
Member:

I think it's better to make 0.392 part of the function arguments.

@Soonhwan-Kwon (Contributor, Author):

Thank you for the review; I added 0.392 as an option named recon_loss_weight.
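
For illustration, here is a sketch of how the weight might be exposed as a `recon_loss_weight` argument (not the merged code; `total_loss` is a hypothetical name, the margin-loss definition follows the paper, and the symbol names and shapes are assumed from the snippet above):

```python
import mxnet as mx

def margin_loss(y_onehot, out_caps, m_pos=0.9, m_neg=0.1, lam=0.5):
    # out_caps: output capsule vectors, assumed shape (batch, num_classes, dim)
    caps_len = mx.sym.sqrt(mx.sym.sum(mx.sym.square(out_caps), axis=2))
    # paper's margin loss: T_c * max(0, m+ - ||v_c||)^2
    #   + lambda * (1 - T_c) * max(0, ||v_c|| - m-)^2
    loss = y_onehot * mx.sym.square(mx.sym.maximum(m_pos - caps_len, 0)) \
        + lam * (1 - y_onehot) * mx.sym.square(mx.sym.maximum(caps_len - m_neg, 0))
    return mx.sym.mean(mx.sym.sum(loss, axis=1))

def total_loss(data, x_recon, y_onehot, out_caps, recon_loss_weight=0.392):
    # reconstruction error: mean squared error between the decoder output
    # and the flattened input image
    data_flatten = mx.sym.flatten(data=data)
    recon_error = mx.sym.mean(mx.sym.square(x_recon - data_flatten))
    # weighted combination; 0.392 = 0.0005 * 784, the paper's per-pixel
    # reconstruction weight scaled up for a 28x28 MNIST image
    return mx.symbol.MakeLoss(
        (1 - recon_loss_weight) * margin_loss(y_onehot, out_caps)
        + recon_loss_weight * recon_error)
```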

@sxjscience (Member):

@piiswrong Do you think this example can be merged in? It looks good.

@Soonhwan-Kwon (Contributor, Author):

@sxjscience Thank you for the review.
First, we tested on MXNet 0.12.1 and it works well,
and we added 0.392 as an option named recon_loss_weight.
In addition, we added tensorboard support for plotting.

@Soonhwan-Kwon (Contributor, Author):

@piiswrong Can this example be merged?

@piiswrong (Contributor):

@Soonhwan-Kwon Yes. Could you rebase to master? It needs CI to pass.

@Soonhwan-Kwon (Contributor, Author):

@piiswrong Thank you for the review. I rebased onto master.

piiswrong merged commit 8623bab into apache:master on Dec 14, 2017
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* add capsnet example's layer

* add capsnet example

* add recon_loss_weight option and tensorboard for plot

* update readme to install tensorboard

* fix print of loss scaled to 1/batchsize
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* add capsnet example's layer

* add capsnet example

* add recon_loss_weight option and tensorboard for plot

* update readme to install tensorboard

* fix print of loss scaled to 1/batchsize