Unrolled recurrent layers (RNN, LSTM) #1873
Conversation
* Sample code was added.
* The `slice_dim` and `slice_point` attributes were explained.
[docs] brief explanation of SLICE layer's attributes
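For reference, here is roughly how the Slice layer's `slice_dim` and `slice_point` attributes mentioned in this commit are used; the layer and blob names (and shapes) below are made up for illustration:

```
layer {
  name: "slice_features"
  type: "Slice"
  bottom: "features"     # e.g., an N x 6 x H x W blob
  top: "features_a"      # N x 2 x H x W
  top: "features_b"      # N x 4 x H x W
  slice_param {
    slice_dim: 1         # slice along the channel axis
    slice_point: 2       # boundary between the two top blobs
  }
}
```

With K top blobs, K-1 `slice_point` values are given; if none are given, the bottom blob is split evenly along `slice_dim`.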
I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and an LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by running the included scripts.
Then, you can train a language model using the provided prototxts. Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.
Next: release candidate
fix Imagenet example path
set the right rpath for tools and examples, respectively; thanks for the report @mees!
[build] fix dynamic linking of tools
… was overwritten with a symlink created at build time and installed with install(DIRECTORY ...)
everything in Reshape)
Could someone give me some guidance on how to construct an RNN with jeffdonahue's PR? I have downloaded lrcn.prototxt, but unfortunately I cannot understand most of its contents, such as `include { stage: "freeze-convnet" }`, `include { stage: "unfactored" }`, and so on. In fact, I have some time-sequence image data, each of which has a label. I have trained the reference model in Caffe with these data, and now I am trying to use an RNN to classify them. What documentation should I read so that I can understand lrcn.prototxt and the like, and then train an RNN model with my data? Many thanks!
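In case it helps later readers: the `include { stage: ... }` blocks are Caffe `NetStateRule` filters. A layer carrying such a rule is only instantiated when the net's state lists that stage, which lets a single prototxt like lrcn.prototxt describe several variants of the same network (e.g., with the convnet frozen or not, factored or unfactored). A minimal sketch, with illustrative layer and stage names rather than ones taken from lrcn.prototxt:

```
# This layer exists only in nets whose state includes the "unfactored" stage.
layer {
  name: "fc8_new"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_new"
  inner_product_param { num_output: 1000 }
  include { stage: "unfactored" }
}

# Stages are switched on through the net's state, e.g. in the solver:
#   train_state { stage: "unfactored" }
```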
I have been able to train the LRCN model successfully.
Is it possible to get a prototxt network example for the activity recognition case?
Same question as @Kumaresh-Krishnan, would appreciate any replies about "how to test", thanks.
check this
Is this still the most up-to-date LSTM implementation in Caffe? Just wondering if there are any major updates not in this branch. Anyway, has anybody tried a bidirectional LSTM using this implementation? Some pointers on this one, please. Thanks!
Based on #1872 (adds EmbedLayer -- not technically used here but often used with RNNs in practice, and will be needed for my examples), which in turn is based on #1486 and #1663.
This adds an abstract class `RecurrentLayer` intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. `RecurrentLayer` implementations (here, just `RNNLayer` and `LSTMLayer`) specify the recurrent architecture by filling in a NetParameter with appropriate layers.

`RecurrentLayer` requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape `T x N x ...` and the second -- the "sequence continuation indicators" `delta` -- has shape `T x N`, each holding `T` timesteps of `N` independent "streams". `delta_{t,n}` should be a binary indicator (i.e., a value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. The fact that these indicators are specified on a per-timestep and per-stream basis allows for streams of arbitrary different lengths without any padding or truncation. At the beginning of the forward pass, the final hidden state from the previous forward pass (`h_T`) is copied into the initial hidden state for the new forward pass (`h_0`), allowing for exact inference across arbitrarily long sequences, even if `T == 1`. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated along the batch boundaries.

Note that the `T x N` arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.

Examples of using these layers to train a language model and image captioning model will follow soon.
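To make the role of the continuation indicators concrete, for the vanilla RNN case the masking described above amounts to an update along the lines of the following (a sketch of the idea rather than the exact implementation; $\sigma$ is the hidden nonlinearity, and $\delta_t$ is broadcast across the hidden units of each stream):

$$ h_t = \sigma\big( W_{xh} x_t + W_{hh} \, (\delta_t \odot h_{t-1}) + b_h \big) $$

so a value $\delta_{t,n} = 0$ zeroes the carried-over hidden state and effectively restarts stream $n$ at timestep $t$.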
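Below is a minimal sketch of how one of these layers might be wired into a prototxt, assuming blob names `data` and `cont` for the two bottoms; the filler settings and exact `recurrent_param` field names here are assumptions, so consult the prototxts in this PR for the authoritative form:

```
# data: shape T x N x ... (time-major; the N streams are interleaved).
# cont: shape T x N, the per-timestep "sequence continuation indicators" delta.
# Example for T = 3, N = 2, where stream 0 holds one 3-step sequence and
# stream 1 holds a 2-step sequence followed by the start of a new one:
#   t = 0: [0, 0]   both streams begin new sequences
#   t = 1: [1, 1]   both continue
#   t = 2: [1, 0]   stream 0 continues; stream 1 starts a new sequence
layer {
  name: "lstm1"
  type: "LSTM"
  bottom: "data"
  bottom: "cont"
  top: "lstm1"
  recurrent_param {
    num_output: 256
    weight_filler { type: "uniform" min: -0.08 max: 0.08 }
    bias_filler { type: "constant" value: 0 }
  }
}
```

The top blob should then have shape `T x N x num_output`, one output vector per timestep and stream.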