RNN + LSTM Layers #3948
Merged
3 commits merged into BVLC:master on Jun 2, 2016

Conversation

jeffdonahue
Contributor

This PR includes the core functionality (with minor changes) of #2033 -- the RNNLayer and LSTMLayer implementations (as well as the parent RecurrentLayer class) -- without the COCO data downloading/processing tools or the LRCN example.

Breaking off this chunk for merge should make users who are already using these layer types on their own happy, without adding a large review/maintenance burden for the examples (which have already broken multiple times due to changes in the COCO data distribution format...). On the other hand, without any example on how to format the input data for these layers, it will be fairly difficult to get started, so I'd still like to follow up with at least a simple sequence example for official inclusion in Caffe (maybe memorizing a random integer sequence -- I think I have some code for that somewhere) soon after the core functionality is merged.

There's still at least one documentation TODO: I added expose_hidden to allow direct access (via bottoms/tops) to the recurrent model's 0th timestep and Tth timestep hidden states, but didn't add anything to the list of bottoms/tops -- still need to do that. Otherwise, this should be ready for review.
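
For a sense of what a net using these layers might look like, here is a rough net_spec sketch; it is not part of this PR, the shapes and num_output are illustrative, and the time-major T x N data layout plus the cont indicator follow the discussion later in this thread.

import caffe
from caffe import layers as L

T, N, D = 20, 16, 128  # timesteps, independent streams per batch, feature dimension (illustrative)

n = caffe.NetSpec()
# Time-major inputs: data is T x N x D, cont is T x N (cont = 0 marks the start of a sequence).
n.data, n.cont = L.Input(shape=[dict(dim=[T, N, D]), dict(dim=[T, N])], ntop=2)
n.lstm = L.LSTM(n.data, n.cont, recurrent_param=dict(num_output=256))

with open('lstm_example.prototxt', 'w') as f:
    f.write(str(n.to_proto()))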

weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 7, 2016

/**
* @brief An abstract class for implementing recurrent behavior inside of an
* unrolled network. This Layer type cannot be instantiated -- instaed,
Contributor

typo: "instaed"

@shelhamer added the focus label Apr 8, 2016
weiliu89 added a commit to weiliu89/caffe that referenced this pull request Apr 9, 2016
@weiliu89

It doesn't work with the current net_spec.py. Specifically: 1) it fails when using L.LSTM() or L.RNN(), since only RecurrentParameter is defined in caffe.proto; 2) it fails when using L.Recurrent(), since RecurrentLayer is not registered (it is an abstract class).

I worked around it with a simple hack, adding the following to the param_name_dict() function in net_spec.py:

# map both the LSTM and RNN layer types onto the shared 'recurrent' parameter name
param_names += ['recurrent', 'recurrent']
param_type_names += ['LSTM', 'RNN']

@shelhamer
Member

@weiliu89 the recurrent parameter for these layers, like the convolution parameter for DeconvolutionLayer, is defined in net spec by naming it directly:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
...
n.lstm = L.LSTM(n.data, recurrent_param=dict(num_output=10))
...

Whether to map these shared parameter types as you suggest here, or as suggested for DeconvolutionLayer in #3954, could be handled by a separate PR, since recurrent layers are not the only instance of this.

const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

/// @brief A helper function, useful for stringifying timestep indices.
virtual string int_to_str(const int t) const;
Member

It's a little surprising to see a helper like this show up in the recurrent layer, but if there weren't any use for it elsewhere then I suppose it could live here. That said, there is already format.hpp and its format_int() function that was added for cross-platform compatibility in b72b031. How about making use of that instead?

@shelhamer
Member

LGTM overall—my only comments were about comments and naming (and that one int -> string function). @longjon are you done with your review?

@ajtulloch
Contributor

Looks great. Thanks for this @jeffdonahue. We've been using a variant of this for a while and it has performed great.

One thing we can additionally PR/gist (if it's useful) is a wrapper around the LSTM layer that allows arbitrary-length (batched) forward propagation, which came in handy when doing inference on arbitrary-length sequences (relaxing the constraint around T_ while preserving memory efficiency for the forward pass by reusing activations across timesteps).

@jeffdonahue force-pushed the recurrent-layer branch 3 times, most recently from bbd33d2 to 4b6c835 on April 26, 2016 23:52
@jeffdonahue
Contributor Author

@shelhamer @longjon thanks for the review! Fixed as suggested.

@ajtulloch glad to hear it's been working for you guys, thanks for looking it over! I'm not sure I understand the idea of the wrapper though. I think this implementation should be able to do what you're saying -- memory efficient forward propagation over arbitrarily long sequences -- by feeding in T_=1 (1xNx...) data to the RecurrentLayer and setting cont=0 at the first timestep of the sequence, then cont=1 through the end (then starting over with cont=0 at the start of the next sequence). This should reuse the activation memory as you mentioned (using just O(N) memory rather than O(NT)). (In fact, this capability is the point of having the cont input in the first place.) Maybe your wrapper is a friendly interface that handles all the bookkeeping for this? In that case it definitely sounds like it would be helpful. Or maybe I'm totally misunderstanding?
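
To make the bookkeeping concrete, here is a rough pycaffe sketch of that single-timestep streaming pattern; the prototxt/caffemodel paths and the 'data', 'cont', and 'lstm' blob names are placeholders, not part of this PR.

import numpy as np
import caffe

# Hypothetical single-timestep net: "data" is 1 x N x D, "cont" is 1 x N.
net = caffe.Net('lstm_stream.prototxt', 'lstm.caffemodel', caffe.TEST)

def stream_forward(net, sequence):
    """Feed one timestep at a time; the layer carries its hidden state across calls.

    sequence is an iterable of (N x D) arrays for a single batch of N streams.
    """
    outputs = []
    for t, x in enumerate(sequence):
        net.blobs['data'].data[0] = x
        # cont = 0 resets the hidden state at the sequence start; cont = 1 carries it over.
        net.blobs['cont'].data[0] = 0 if t == 0 else 1
        net.forward()
        # 'lstm' is the assumed name of the LSTM layer's output blob.
        outputs.append(net.blobs['lstm'].data[0].copy())
    return np.stack(outputs)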

@ajtulloch
Contributor

ajtulloch commented May 3, 2016

@jeffdonahue yeah, the only contribution was allowing variable-T_ inputs while still batching the i2h transform -- this was substantially faster than the approach you describe (T_ = 1 and looping, which I initially did), IIRC ~3x for some of our models. It costs a bit more memory (NxT_xD for the i2h activations, but only NxD for the h/c states for arbitrary T_), while saving the NxT_xD for the h/c states. https://gist.github.com/ajtulloch/2b7a98de642df934456001de238ed5c7 is the CPU impl -- it's a bit niche so I wouldn't advocate pulling it at all, but it might be handy for someone who hits this issue in the future.

@jeffdonahue
Contributor Author

Ah -- batching the input transformation regardless of sequence length indeed makes sense. Thanks in advance for posting the code!

niketanpansare pushed a commit to niketanpansare/systemml that referenced this pull request May 10, 2016
@MinaRe

MinaRe commented May 13, 2016

Dear all,

I have a very big matrix (rows are IDs and columns are labels), and I was wondering how I can do the training in Caffe with just fully connected layers?

Thanks a lot.

niketanpansare pushed a commit to niketanpansare/systemml that referenced this pull request May 13, 2016
@dangweili

When will this be merged?

@yshean

yshean commented May 21, 2016

Has anyone successfully merged @jeffdonahue's caffe:recurrent-layer with BVLC's caffe:master? Why does the assertion CHECK_EQ(2 + num_recur_blobs + static_input_, unrolled_net_->num_inputs()); fail during make runtest?

[----------] 9 tests from LSTMLayerTest/0, where TypeParam = caffe::CPUDevice<float>
[ RUN      ] LSTMLayerTest/0.TestForward
F0521 03:29:55.683001  5650 recurrent_layer.cpp:142] Check failed: 2 + num_recur_blobs + static_input_ == unrolled_net_->num_inputs() (4 vs. 0) 

myfavouritekk added a commit to myfavouritekk/caffe that referenced this pull request May 24, 2016
RNN + LSTM Layers

* jeffdonahue/recurrent-layer:
  Add LSTMLayer and LSTMUnitLayer, with tests
  Add RNNLayer, with tests
  Add RecurrentLayer: an abstract superclass for other recurrent layer types
aralph added a commit to aralph/caffe that referenced this pull request Jun 1, 2016
…LSTM Layers BVLC#3948' by jeffdonahue for BVLC/caffe master.
@jeffdonahue
Contributor Author

Thanks again for the reviews everyone. Sorry for the delays -- wanted to do some additional testing, but I'm now comfortable enough with this to merge.

@jeffdonahue merged commit 58b10b4 into BVLC:master on Jun 2, 2016
@ajtulloch
Contributor

Very nice work @jeffdonahue.

@naibaf7
Member

naibaf7 commented Jun 2, 2016

@jeffdonahue
Now also available on the OpenCL branch.

@jakirkham

Any plans for a release?

@antran89
Contributor

antran89 commented Jun 7, 2016

Could you add a link to a working tutorial/example on using these layers? It would make things easier for new learners. I know you have one somewhere.

yjxiong pushed a commit to yjxiong/caffe that referenced this pull request Jun 15, 2016
@wenwei202

Great work!!! @jeffdonahue I used https://github.com/jeffdonahue/caffe/tree/recurrent-rebase-cleanup/ as the example and ran ./examples/coco_caption/train_language_model.sh. The code I used is BVLC master. It converges well at the beginning but diverges after iteration 2399, as follows:

I0630 15:15:16.417166 23801 solver.cpp:228] Iteration 2397, loss = 61.5563
I0630 15:15:16.417196 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 3.13294 (* 20 = 62.6589 loss)
I0630 15:15:16.417207 23801 sgd_solver.cpp:106] Iteration 2397, lr = 0.1
I0630 15:15:16.533344 23801 solver.cpp:228] Iteration 2398, loss = 61.561
I0630 15:15:16.533375 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 3.13485 (* 20 = 62.6971 loss)
I0630 15:15:16.533386 23801 sgd_solver.cpp:106] Iteration 2398, lr = 0.1
I0630 15:15:16.655758 23801 solver.cpp:228] Iteration 2399, loss = 61.5369
I0630 15:15:16.655824 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 2.98118 (* 20 = 59.6236 loss)
I0630 15:15:16.655838 23801 sgd_solver.cpp:106] Iteration 2399, lr = 0.1
I0630 15:15:16.776641 23801 solver.cpp:228] Iteration 2400, loss = 78.3731
I0630 15:15:16.776676 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 87.3366 (* 20 = 1746.73 loss)
I0630 15:15:16.776690 23801 sgd_solver.cpp:106] Iteration 2400, lr = 0.1
I0630 15:15:16.892026 23801 solver.cpp:228] Iteration 2401, loss = 95.2123
I0630 15:15:16.892060 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 87.3365 (* 20 = 1746.73 loss)
I0630 15:15:16.892071 23801 sgd_solver.cpp:106] Iteration 2401, lr = 0.1
I0630 15:15:17.007628 23801 solver.cpp:228] Iteration 2402, loss = 112.041
I0630 15:15:17.007663 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 87.3365 (* 20 = 1746.73 loss)
I0630 15:15:17.007675 23801 sgd_solver.cpp:106] Iteration 2402, lr = 0.1
I0630 15:15:17.123337 23801 solver.cpp:228] Iteration 2403, loss = 128.873
I0630 15:15:17.123373 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 87.3365 (* 20 = 1746.73 loss)
I0630 15:15:17.123384 23801 sgd_solver.cpp:106] Iteration 2403, lr = 0.1
I0630 15:15:17.239030 23801 solver.cpp:228] Iteration 2404, loss = 145.734
I0630 15:15:17.239061 23801 solver.cpp:244] Train net output #0: cross_entropy_loss = 87.3365 (* 20 = 1746.73 loss)
I0630 15:15:17.239074 23801 sgd_solver.cpp:106] Iteration 2404, lr = 0.1

Any suggestions?

@UsamaShafiq91

@jeffdonahue I am new to Caffe. Do you have any example of how to use the RNN layer?
Any help will be appreciated.

@agethen

agethen commented Jul 26, 2016

@jeffdonahue May I ask for a clarification?
Suppose we have an encoder-decoder structure with two RNN/LSTM layers: the encoder reads features X, the decoder outputs its state H, and the encoder's state is copied to the decoder by setting expose_hidden: true and connecting the blobs.

I can see in RecurrentLayer::Reshape that the recur_input_blobs share their data with the bottom blobs, but they do not share their diff (unlike the top blobs)! Can the hidden state/cell state gradient then still flow backward from the decoder to the encoder, or is this a misunderstanding on my side?
Thank you very much!

@wenwei202

Hello, what makes it necessary to switch the dimension order of the bottom blob from N x T x ... to T x N x ...? With this layout, the batch_size in the prototxt is actually the number of unrolled timesteps, right?
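
For anyone tripped up by the layout, a small sketch of going from batch-major (N, T, ...) arrays to the time-major (T, N, ...) order these layers consume; the array names and sizes are illustrative, and the cont semantics follow the earlier discussion in this thread.

import numpy as np

N, T, D = 16, 20, 128  # streams, timesteps, feature dimension (illustrative)
data_nt = np.random.randn(N, T, D).astype(np.float32)

# Reorder to time-major: timestep t of every stream lives at data_tn[t].
data_tn = np.ascontiguousarray(data_nt.transpose(1, 0, 2))  # shape (T, N, D)

# Sequence-continuation indicator: 0 at the first timestep of each stream, 1 afterwards.
cont = np.ones((T, N), dtype=np.float32)
cont[0, :] = 0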

fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016
@ayushchopra96

Hi @jeffdonahue @weiliu89,
Is there support for accessing C (cell state) and H (hidden state) at each timestep?
I need this to simulate an attention mechanism.

Thanks in advance.
