Unrolled recurrent layers (RNN, LSTM) #1873
Conversation
* Sample code was added.
* The `slice_dim` and `slice_point` attributes were explained.
[docs] brief explanation of SLICE layer's attributes
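For reference, here is roughly how the Slice layer's `slice_dim` and `slice_point` attributes mentioned in this commit are used; the layer and blob names (and shapes) below are made up for illustration:

```
layer {
  name: "slice_features"
  type: "Slice"
  bottom: "features"     # e.g., an N x 6 x H x W blob
  top: "features_a"      # N x 2 x H x W
  top: "features_b"      # N x 4 x H x W
  slice_param {
    slice_dim: 1         # slice along the channel axis
    slice_point: 2       # boundary between the two top blobs
  }
}
```

With K top blobs, K-1 `slice_point` values are given; if none are given, the bottom blob is split evenly along `slice_dim`.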
I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and an LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by running the included scripts.
Then, you can train a language model using the provided prototxts. Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.
Next: release candidate
fix Imagenet example path
set the right rpath for tools and examples, respectively; thanks for the report @mees!
[build] fix dynamic linking of tools
… was overwritten with a symlink created at build time and installed with install(DIRECTORY ...)
everything in Reshape)
Could someone give me some guidance on how to construct an RNN with jeffdonahue's PR? I have downloaded lrcn.prototxt, but unfortunately I cannot understand most of its contents, such as `include { stage: "freeze-convnet" }`, `include { stage: "unfactored" }`, and so on. In fact, I have some time-sequence image data, each of which has a label. I have trained the reference model in Caffe with these data, and now I am trying to use an RNN to classify them. What documentation should I read so that I can understand lrcn.prototxt and the like, and then train an RNN model with my data? Many thanks!
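In case it helps later readers: the `include { stage: ... }` blocks are Caffe `NetStateRule` filters. A layer carrying such a rule is only instantiated when the net's state lists that stage, which lets a single prototxt like lrcn.prototxt describe several variants of the same network (e.g., with the convnet frozen or not, factored or unfactored). A minimal sketch, with illustrative layer and stage names rather than ones taken from lrcn.prototxt:

```
# This layer exists only in nets whose state includes the "unfactored" stage.
layer {
  name: "fc8_new"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_new"
  inner_product_param { num_output: 1000 }
  include { stage: "unfactored" }
}

# Stages are switched on through the net's state, e.g. in the solver:
#   train_state { stage: "unfactored" }
```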
I have been able to train the LRCN model successfully.
Is it possible to get a prototxt network example for the activity recognition case?
Same question as @Kumaresh-Krishnan, would appreciate any replies about "how to test", thanks.
check this
Is this still the most up-to-date LSTM implementation in Caffe? Just wondering if there are any major updates not in this branch. Anyway, has anybody tried a bidirectional LSTM using this implementation? Some pointers on this one, please. Thanks!
Based on #1872 (adds EmbedLayer -- not technically used here but often used with RNNs in practice, and will be needed for my examples), which in turn is based on #1486 and #1663.
This adds an abstract class `RecurrentLayer` intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. `RecurrentLayer` implementations (here, just `RNNLayer` and `LSTMLayer`) specify the recurrent architecture by filling in a NetParameter with appropriate layers.

`RecurrentLayer` requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape `T x N x ...` and the second -- the "sequence continuation indicators" `delta` -- has shape `T x N`, each holding `T` timesteps of `N` independent "streams". `delta_{t,n}` should be a binary indicator (i.e., a value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. The fact that these indicators are specified on a per-timestep and per-stream basis allows for streams of arbitrary different lengths without any padding or truncation. At the beginning of the forward pass, the final hidden state from the previous forward pass (`h_T`) is copied into the initial hidden state for the new forward pass (`h_0`), allowing for exact inference across arbitrarily long sequences, even if `T == 1`. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated along the batch boundaries.

Note that the `T x N` arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.

Examples of using these layers to train a language model and image captioning model will follow soon.
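To make the role of the continuation indicators concrete, for the vanilla RNN case the masking described above amounts to an update along the lines of the following (a sketch of the idea rather than the exact implementation; $\sigma$ is the hidden nonlinearity, and $\delta_t$ is broadcast across the hidden units of each stream):

$$ h_t = \sigma\big( W_{xh} x_t + W_{hh} \, (\delta_t \odot h_{t-1}) + b_h \big) $$

so a value $\delta_{t,n} = 0$ zeroes the carried-over hidden state and effectively restarts stream $n$ at timestep $t$.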
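Below is a minimal sketch of how one of these layers might be wired into a prototxt, assuming blob names `data` and `cont` for the two bottoms; the filler settings and exact `recurrent_param` field names here are assumptions, so consult the prototxts in this PR for the authoritative form:

```
# data: shape T x N x ... (time-major; the N streams are interleaved).
# cont: shape T x N, the per-timestep "sequence continuation indicators" delta.
# Example for T = 3, N = 2, where stream 0 holds one 3-step sequence and
# stream 1 holds a 2-step sequence followed by the start of a new one:
#   t = 0: [0, 0]   both streams begin new sequences
#   t = 1: [1, 1]   both continue
#   t = 2: [1, 0]   stream 0 continues; stream 1 starts a new sequence
layer {
  name: "lstm1"
  type: "LSTM"
  bottom: "data"
  bottom: "cont"
  top: "lstm1"
  recurrent_param {
    num_output: 256
    weight_filler { type: "uniform" min: -0.08 max: 0.08 }
    bias_filler { type: "constant" value: 0 }
  }
}
```

The top blob should then have shape `T x N x num_output`, one output vector per timestep and stream.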