Unrolled recurrent layers (RNN, LSTM) #2033

jeffdonahue · 2015-03-04T19:38:35Z

(Replaces #1873)

Based on #2032 (adds EmbedLayer -- not needed for, but often used with RNNs in practice, and is needed for my examples), which in turn is based on #1977.

This adds an abstract class RecurrentLayer intended to support recurrent architectures (RNNs, LSTMs, etc.) using an internal network unrolled in time. RecurrentLayer implementations (here, just RNNLayer and LSTMLayer) specify the recurrent architecture by filling in a NetParameter with appropriate layers.

RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ... and the second -- the "sequence continuation indicators" delta -- has shape T x N, each holding T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n. Under the hood, the previous timestep's hidden state is multiplied by these delta values. The fact that these indicators are specified on a per-timestep and per-stream basis allows for streams of arbitrary different lengths without any padding or truncation. At the beginning of the forward pass, the final hidden state from the previous forward pass (h_T) is copied into the initial hidden state for the new forward pass (h_0), allowing for exact inference across arbitrarily long sequences, even if T == 1. However, if any sequences cross batch boundaries, backpropagation through time is approximate -- it is truncated along the batch boundaries.

Note that the T x N arrangement in memory, used for computational efficiency, is somewhat counterintuitive, as it requires one to "interleave" the data streams.

There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.

I've added scripts to download COCO2014 (and splits), and prototxts for training a language model and LRCN captioning model on the data. From the Caffe root directory, you should be able to download and parse the data by doing:

cd data/coco
./get_coco_aux.sh # download train/val/test splits
./download_tools.sh # download official COCO tool
cd tools
python setup.py install # follow instructions to install tools and download COCO data if needed
cd ../../.. # back to caffe root
./examples/coco_caption/coco_to_hdf5_data.py

Then, you can train a language model using ./examples/coco_caption/train_language_model.sh, or train LRCN for captioning using ./examples/coco_caption/train_lrcn.sh (assuming you have downloaded models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel).

Still on the TODO list: upload a pretrained model to the zoo; add a tool to preview generated image captions and compute retrieval & generation scores.

cvondrick · 2015-03-20T17:41:17Z

Firstly, thanks for the fantastic code. I had been playing with my own LSTM, and found this PR, and it is above and beyond any of my own attempts. Really nice job.

There seems to be a bug in the ReshapeLayer of this PR. In some cases, the ReshapeLayer will produce all zeros instead of actually copying the data. I've created a minimal test case that shows this failure for this PR:

# Load a random dataset 
layer {
  name: "ToyData_1"
  type: "DummyData"
  top: "ToyData_1"
  dummy_data_param {
    shape {
      dim: 101 
      dim: 7
      dim: 3
    }
    data_filler {
      type: "gaussian"
      std: 1
    }
  }
}

layer {
  name: "ZeroData"
  type: "DummyData"
  top: "ZeroData"
  dummy_data_param {
    shape {
      dim: 101 
      dim: 7
      dim: 3
    }
    data_filler {
      type: "constant"
      value: 0
    }
  }
}


# Reshape ToyData_1 to be the same size
layer {
  name: "Reshape"
  type: "Reshape"
  bottom: "ToyData_1"
  top: "ToyData_2"
  reshape_param {
    shape {
      dim: 101
      dim: 7
      dim: 3
    }
  }
}





# Since ToyData_1 should be ToyData_2, we expect that this loss be zero. But, it is not.
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_1"
  bottom: "ToyData_2"
  top: "ToyData_1_vs_2_Difference"
  type: "EuclideanLoss"
}

# We expect this loss to be non-zero, and it is non-zero. 
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_1"
  bottom: "ZeroData"
  top: "ToyData_1_vs_Zero_Difference"
  type: "EuclideanLoss"
}

# Since ToyData_2 should be ToyData_1 and ToyData_1 is non-zero, we expect this loss to be non-zero. But, it is zero.
layer {
  name: "EuclideanLoss"
  bottom: "ToyData_2"
  bottom: "ZeroData"
  top: "ToyData_2_vs_Zero_Difference"
  type: "EuclideanLoss"
}

Above, it loads a random dataset, ToyData_1. It then reshapes it to the exact same size (identity) to create ToyData_2. We would expect that || ToyData_1 - ToyData_2 ||_2 == 0

However, if you train with the above model on this branch, you will see that the Euclidean loss between ToyData_1 and ToyData_2 is non-zero. Moreover, the loss between ToyData_2 and a blob of all zeros is zero. Note that, as expected, the loss between ToyData_1 and all zeros is non-zero.

It seems there is a bug with reshape. I've fixed it here by copying an older version of Reshape into this branch: https://github.com/cvondrick/caffe/commit/3e1a0ff73fef23b8cb8adc8223e0bb2c8900e56b

Unfortunately, I didn't have time to write a real unit test for this. But, hope this bug reports helps.

The same issue occurs in #2088

Carl

jeffdonahue · 2015-03-20T19:54:12Z

Well that's disturbing... I don't have time to look into it now but thanks
a lot for reporting Carl! Will follow up when I've figured something out.
On Mar 20, 2015 10:41 AM, "Carl Vondrick" notifications@github.com wrote:

Firstly, thanks for the fantastic code. I had been playing with my own
LSTM, and found this PR, and it is above and beyond any of my own attempts.
Really nice job.

There seems to be a bug in the ReshapeLayer of this PR. In some cases, the
ReshapeLayer will produce all zeros instead of actually copying the data.
I've created a minimal test case that shows this failure for this PR:

Load a random dataset

layer {
name: "ToyData_1"
type: "DummyData"
top: "ToyData_1"
dummy_data_param {
shape {
dim: 101
dim: 7
dim: 3
}
data_filler {
type: "gaussian"
std: 1
}
}
}

layer {
name: "ZeroData"
type: "DummyData"
top: "ZeroData"
dummy_data_param {
shape {
dim: 101
dim: 7
dim: 3
}
data_filler {
type: "constant"
value: 0
}
}
}

Reshape ToyData_1 to be the same size

layer {
name: "Reshape"
type: "Reshape"
bottom: "ToyData_1"
top: "ToyData_2"
reshape_param {
shape {
dim: 101
dim: 7
dim: 3
}
}
}

Since ToyData_1 should be ToyData_2, we expect that this loss be zero. But, it is not.

layer {
name: "EuclideanLoss"
bottom: "ToyData_1"
bottom: "ToyData_2"
top: "ToyData_1_vs_2_Difference"
type: "EuclideanLoss"
}

We expect this loss to be non-zero, and it is non-zero.

layer {
name: "EuclideanLoss"
bottom: "ToyData_1"
bottom: "ZeroData"
top: "ToyData_1_vs_Zero_Difference"
type: "EuclideanLoss"
}

Since ToyData_2 should be ToyData_1 and ToyData_1 is non-zero, we expect this loss to be non-zero. But, it is zero.

layer {
name: "EuclideanLoss"
bottom: "ToyData_2"
bottom: "ZeroData"
top: "ToyData_2_vs_Zero_Difference"
type: "EuclideanLoss"
}

Above, it loads a random dataset, ToyData_1. It then reshapes it to the
exact same size (identity) to create ToyData_2. We would expect that ||
ToyData_1 - ToyData_2 ||_2 == 0

However, if you train with the above model on this branch, you will see
that the Euclidean loss between ToyData_1 and ToyData_2 is non-zero.
Moreover, the loss between ToyData_2 and a blob of all zeros is zero. Note
that, as expected, the loss between ToyData_1 and all zeros is zero.

It seems there is a bug with reshape. I've fixed it here by copying an
older version of Reshape into this branch: cvondrick@3e1a0ff
https://github.com/cvondrick/caffe/commit/3e1a0ff73fef23b8cb8adc8223e0bb2c8900e56b

Unfortunately, I didn't have time to write a real unit test for this. But,
hope this bug reports helps.

Carl

—
Reply to this email directly or view it on GitHub
#2033 (comment).

jeffdonahue · 2015-03-20T19:59:08Z

Oops, failed to read to the end and see that you already had a fix. Thanks
for posting the fix! (I think the current version of my reshapelayer PR
may do what your fix does, in which case I'll just rebase this onto that PR
as I should anyway.)
On Mar 20, 2015 10:41 AM, "Carl Vondrick" notifications@github.com wrote:

Firstly, thanks for the fantastic code. I had been playing with my own
LSTM, and found this PR, and it is above and beyond any of my own attempts.
Really nice job.

There seems to be a bug in the ReshapeLayer of this PR. In some cases, the
ReshapeLayer will produce all zeros instead of actually copying the data.
I've created a minimal test case that shows this failure for this PR:

Load a random dataset

layer {
name: "ToyData_1"
type: "DummyData"
top: "ToyData_1"
dummy_data_param {
shape {
dim: 101
dim: 7
dim: 3
}
data_filler {
type: "gaussian"
std: 1
}
}
}

layer {
name: "ZeroData"
type: "DummyData"
top: "ZeroData"
dummy_data_param {
shape {
dim: 101
dim: 7
dim: 3
}
data_filler {
type: "constant"
value: 0
}
}
}

Reshape ToyData_1 to be the same size

layer {
name: "Reshape"
type: "Reshape"
bottom: "ToyData_1"
top: "ToyData_2"
reshape_param {
shape {
dim: 101
dim: 7
dim: 3
}
}
}

Since ToyData_1 should be ToyData_2, we expect that this loss be zero. But, it is not.

layer {
name: "EuclideanLoss"
bottom: "ToyData_1"
bottom: "ToyData_2"
top: "ToyData_1_vs_2_Difference"
type: "EuclideanLoss"
}

We expect this loss to be non-zero, and it is non-zero.

layer {
name: "EuclideanLoss"
bottom: "ToyData_1"
bottom: "ZeroData"
top: "ToyData_1_vs_Zero_Difference"
type: "EuclideanLoss"
}

Since ToyData_2 should be ToyData_1 and ToyData_1 is non-zero, we expect this loss to be non-zero. But, it is zero.

layer {
name: "EuclideanLoss"
bottom: "ToyData_2"
bottom: "ZeroData"
top: "ToyData_2_vs_Zero_Difference"
type: "EuclideanLoss"
}

Above, it loads a random dataset, ToyData_1. It then reshapes it to the
exact same size (identity) to create ToyData_2. We would expect that ||
ToyData_1 - ToyData_2 ||_2 == 0

However, if you train with the above model on this branch, you will see
that the Euclidean loss between ToyData_1 and ToyData_2 is non-zero.
Moreover, the loss between ToyData_2 and a blob of all zeros is zero. Note
that, as expected, the loss between ToyData_1 and all zeros is zero.

It seems there is a bug with reshape. I've fixed it here by copying an
older version of Reshape into this branch: cvondrick@3e1a0ff
https://github.com/cvondrick/caffe/commit/3e1a0ff73fef23b8cb8adc8223e0bb2c8900e56b

Unfortunately, I didn't have time to write a real unit test for this. But,
hope this bug reports helps.

Carl

—
Reply to this email directly or view it on GitHub
#2033 (comment).

cvondrick · 2015-03-20T20:06:51Z

Thanks Jeff -- yeah, we fixed it by copying a ReshapeLayer from somewhere else. Unfortunately, we have lost track of exactly where that layer came from, but I'm sure somebody here (maybe even you) wrote it at some point.

hf · 2015-03-24T11:49:33Z

When is this feature going to be ready? Is there something to be done?

thuyen · 2015-03-24T14:43:16Z

For the captioning model, can anyone show me how to generate captions after the training is done? Current LSTM layers process the whole input sequence (20 words in the coco example) across time, but we need to generate one by one at each time step (current time step is the input to the next).

vadimkantorov · 2015-03-24T15:52:32Z

I've just tried to run train_lcrn.sh (after running coco_to_hdf5_data.py and other scripts) and I get a "dimensions don't match" error:

F0324 16:37:24.435840 20612 eltwise_layer.cpp:51] Check failed: bottom[i]->shape() == bottom[0]->shape()

The stack-trace and log are here: http://pastebin.com/fWUxsSmv

I've uncommented line 471 in net.cpp to find the faulty layer (the only modification). It seems it happens in lstm2 which blends input from the language model and from the image CNN.

train_language_model.sh runs fine without errors.

Ideas?

ih4cku · 2015-03-24T16:59:10Z

By the way, does Caffe's recurrent layer support bi-directional RNN?

vadimkantorov · 2015-03-24T19:25:00Z

Both factored and unfactored setups are concerned. Seems there are some dimensions problems while blending CNN input with embedded text input.

aksarben09 · 2015-03-25T01:06:37Z

I have the same question as @thuyen. My understanding is that the current unrolled architecture slices an input sentence and feed the resulting words to each time step at once. So, for both train and test nets, the ground truth sentences are fed to the unrolled net. However, for captioning an image, there is no sentence to give to the net. But I don't think it is correct to give the start symbol to each layer. Did I miss anything?

aksarben09 · 2015-03-25T07:18:06Z

The dimension check fails for the static input (the image feature) with size 100_4000 vs 1_100*4000. It seems to be caused by Reshape layer; @cvondrick 's fix seems to solve this.

jeffdonahue · 2015-03-25T07:28:57Z

Yes, as noted by @cvondrick, this works with the older version of the ReshapeLayer which puts everything in Reshape (as opposed to the newer one that uses LayerSetUp -- see discussion with @longjon in #2088). I don't yet have any idea why the Reshape version would work but the LayerSetUp version wouldn't, but I've just force pushed a new version of this branch that uses the previous ReshapeLayer version, and confirmed that both example scripts (train_lrcn.sh & train_language_model.sh) run. Sorry for breaking the LRCN one.

jeffdonahue · 2015-03-25T07:33:33Z

By the way, does Caffe's recurrent layer support bi-directional RNN?

You can create a bi-directional RNN using 2 RNN layers and feeding one the input in forward order and the other the input in backward order, and fusing their per-timestep outputs however you like (e.g. eltwise sum or concatenation).

vadimkantorov · 2015-03-25T17:41:04Z

Thanks @jeffdonahue , training lrcn now works! Same question as @thuyen @ritsu1228. Does anyone have an idea how to hook up to when the first word after the start symbol gets produced and put the next symbol on the input_sentence tensor in memory before the next round of unrolled net will get to run?

read-mind · 2015-03-27T06:08:19Z

As @jeffdonahue mentioned, bidirectinal RNN can be built with two RNNs, it's easy to prepare reversed input sequence, but how to reverse the output of one RNN when fusing two RNN outputs in Caffe? It seems no layer does the reverse.

jeffdonahue · 2015-03-30T19:31:29Z

As @jeffdonahue mentioned, bidirectinal RNN can be built with two RNNs, it's easy to prepare reversed input sequence, but how to reverse the output of one RNN when fusing two RNN outputs in Caffe? It seems no layer does the reverse.

True; one would need to implement an additional layer to do the reversal. You'd also need to be careful to ensure that your instances do not cross batch boundaries (as is allowed by my implementation as it works fine for unidirectional architectures) since inference at each timestep is dependent on all other timesteps in a bidirectional RNN.

Same question as @thuyen @ritsu1228. Does anyone have an idea how to hook up to when the first word after the start symbol gets produced and put the next symbol on the input_sentence tensor in memory before the next round of unrolled net will get to run?

In the not-too-distant future I'll add code for evaluation, including using the model's own predictions as input in future timesteps as you mention.

jeffdonahue · 2015-03-30T19:35:59Z

I've also gotten a number of questions on the optional third input to RecurrentLayer -- I've added some clarification in the original post:

There is also an optional third input whose dimensions are simply N x ... (i.e. the first axis must have dimension N and the others can be anything) which is a "static" input to the LSTM. It's equivalent to (but more efficient than) copying the input across the T timesteps and concatenating it with the "dynamic" first input (I was using my TileLayer -- #2083 -- for this purpose at some point before adding the static input). It's used in my captioning experiments to input the image features as they don't change over time. For most problems there will be no such "static" input and you should simply ignore it and just specify the first two input blobs.

liqing-ustc · 2015-03-31T07:37:11Z

Thanks for the fantastic code. But the code of the Reshape function in Recurrent layer makes me confused. when passing data from "output_blobs_" to "top blobs", why it is

    output_blobs_[i]->ShareData(*top[i]);
    output_blobs_[i]->ShareDiff(*top[i]);

rather than

    top[i]->ShareData(*output_blobs_[i]);
    top[i]->ShareDiff(*output_blobs_[i]);

it seems that the top blobs is just reshaped and empty.

the original code is here:

template <typename Dtype>
void RecurrentLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  CHECK_EQ(top.size(), output_blobs_.size());
  for (int i = 0; i < top.size(); ++i) {
    top[i]->ReshapeLike(*output_blobs_[i]);
    output_blobs_[i]->ShareData(*top[i]);
    output_blobs_[i]->ShareDiff(*top[i]);
  }
  x_input_blob_->ShareData(*bottom[0]);
  x_input_blob_->ShareDiff(*bottom[0]);
  cont_input_blob_->ShareData(*bottom[1]);
  if (static_input_) {
    x_static_input_blob_->ShareData(*bottom[2]);
    x_static_input_blob_->ShareDiff(*bottom[2]);
  }
}

jeffdonahue · 2016-03-15T07:46:28Z

Ah, I didn't know the HDF5OutputLayer worked that way, I see... sounds a little scary, but might work... good luck!

fl2o · 2016-03-15T12:58:52Z

@shaibagon Thanks for the hightlight but I struggle to see how to handle signals with different lengths (ie timestep) for the training process using NetSpec? I can't change my unrolled net architecture during training...
Should I use a very long LSTM and stop the forward pass after I have reached the end of the signal being processed then start the backward pass?

shaibagon · 2016-03-15T13:29:31Z

@fl2o AFAIK, if you want exact backprop for recurrent nets in caffe, there's no way around explicitly unrolling the net across ALL time steps.
However, if you use @jeffdonahue 's recurrent branch, you will be able to achieve exact forward estimation, and backprop exact to the limit of the temporal batch size. This can alleviate the need to explicitly unroll very long temporal nets.

Regarding working with very long sequences:

You may define maxT and explicitly unroll your net to span maxT time steps, padding shorter sequences with some "null" data/label.
Since caffe uses SGD for training, it is better to have more than one sequence participating in a forward-backward pass (i.e., mini-batch of size > 1). Otherwise gradient estimation will be very noisy.

Can you afford all these bolbs in memory at once?

shaibagon · 2016-03-15T13:30:38Z

@jeffdonahue BTW, is there a reason why this PR is not merged into master?

fl2o · 2016-03-15T13:53:45Z

@shaibagon I am gonna try padding shorter sequences with some "null" data/label (Should I use a special term or just 0 ?) in order to avoid the gradient estimation problem, but I am not sure yet about the memory issue..! (maxT will be around 400! while minT ~50)

shaibagon · 2016-03-15T14:09:05Z

@fl2o I'm not certain just using 0 is enough. You want no gradients to be computed from these padded time steps. You might need to have an "ignore_label" and implement your loss layer to support "ignore_label".
Make sure no gradients from the padded time steps are propagated into the "real" time steps

fl2o · 2016-03-15T14:19:49Z

That's what I was wondering ....
Wonder if it's not "easier" to use this PR directly ^^
Gonna figure it out! Thank you @shaibagon

shaibagon · 2016-03-15T14:24:18Z

@fl2o in the future, I think it would be best to keep this github issue thread for PR related comments only. For more general inquires and questions about LSTM in Caffe, it might be better to ask a question in stackoverflow.

chriss2401 · 2016-03-15T19:16:51Z

@shaibagon Cheers for all the helpful comments.

lood339 · 2016-04-11T22:57:18Z

Hi, I used the LRCN code to generate captions form an image. I replace the alexNet with google net. The result likes this:
"A brown cat sitting top top top top ...."
The sentence repeats the word "top" a lot. Is there any reasons?
I also tried other modifications. It seams the LSTM is very sensitive to the learning parameters. Is this conclusion right in general?
Thanks.

yangfly · 2016-04-14T07:22:12Z

examples/coco_caption/hdf5_sequence_generator.py

+        print ('Exhausted all data; cutting off batch at timestep %d ' +
+               'with %d streams completed') % (t, num_completed_streams)
+        for name in self.substream_names:
+          batch[name] = batch[name][:t, :]


words at timestep t might not be deleted:
batch[name] = batch[name][:(t+1), :]

liminchen · 2016-04-16T17:39:20Z

Could anyone tell me what's the difference between C_diff and C_term_diff in the backward_cpu function? I'm trying to understand the code and write a GRU version. Thanks in advance!
`template
void GRUUnitLayer::Backward_cpu(const vector<Blob>& top,
const vector& propagate_down, const vector<Blob>& bottom) {
CHECK(!propagate_down[2]) << "Cannot backpropagate to sequence indicators.";
if (!propagate_down[0] && !propagate_down[1]) { return; }

const int num = bottom[0]->shape(1);
const int x_dim = hidden_dim_ * 4;
const Dtype* C_prev = bottom[0]->cpu_data();
const Dtype* X = bottom[1]->cpu_data();
const Dtype* flush = bottom[2]->cpu_data();
const Dtype* C = top[0]->cpu_data();
const Dtype* H = top[1]->cpu_data();
const Dtype* C_diff = top[0]->cpu_diff();
const Dtype* H_diff = top[1]->cpu_diff();
Dtype* C_prev_diff = bottom[0]->mutable_cpu_diff();
Dtype* X_diff = bottom[1]->mutable_cpu_diff();
for (int n = 0; n < num; ++n) {
for (int d = 0; d < hidden_dim_; ++d) {
const Dtype i = sigmoid(X[d]);
const Dtype f = (flush == 0) ? 0 :
(flush * sigmoid(X[1 * hidden_dim + d]));
const Dtype o = sigmoid(X[2 * hidden_dim + d]);
const Dtype g = tanh(X[3 * hidden_dim_ + d]);
const Dtype c_prev = C_prev[d];
const Dtype c = C[d];
const Dtype tanh_c = tanh(c);
Dtype* c_prev_diff = C_prev_diff + d;
Dtype* i_diff = X_diff + d;
Dtype* f_diff = X_diff + 1 * hidden_dim_ + d;
Dtype* o_diff = X_diff + 2 * hidden_dim_ + d;
Dtype* g_diff = X_diff + 3 * hidden_dim_ + d;
const Dtype c_term_diff =
C_diff[d] + H_diff[d] * o * (1 - tanh_c * tanh_c);
*c_prev_diff = c_term_diff * f;
*i_diff = c_term_diff * g * i * (1 - i);
*f_diff = c_term_diff * c_prev * f * (1 - f);
*o_diff = H_diff[d] * tanh_c * o * (1 - o);
*g_diff = c_term_diff * i * (1 - g * g);
}
C_prev += hidden_dim_;
X += x_dim;
C += hidden_dim_;
H += hidden_dim_;
C_diff += hidden_dim_;
H_diff += hidden_dim_;
X_diff += x_dim;
C_prev_diff += hidden_dim_;
++flush;
}
}`

maydaygmail · 2016-04-29T09:53:51Z

@jeffdonahue captioner.py for generating sentence, to generate the current word, captioner.py only use the previous one word not all the previous words?

anguyen8 · 2016-05-03T03:45:57Z

Does any know if there is a pre-trained image captioning LRCN model out there? I'd greatly appreciate if this is included in the Model Zoo.

@jeffdonahue : would you be able to release the model from your CVPR'15 paper?

anteagle · 2016-05-16T15:26:42Z

Has this branch been landed to the master ? The layers are in the master, but it seems the examples are not there. Could anyone point to me to the right way to get this branch ? I did git pull #2033, but just showed Already up-to-date.

shaibagon · 2016-05-16T15:30:07Z

@anteagle it seems like the PR only contained the LSTM RNN layers and not the examples (too much to review). You'll have to go to Jeff Donahue's "recurrent" branch.

anteagle · 2016-05-17T00:01:32Z

@shaibagon thanks, I got from Jeff's repo, though it has not been updated for a while.

jeffdonahue · 2016-06-03T03:21:28Z

Closing with the merge of #3948 -- though this PR still contains examples that PR lacked, and I should eventually restore and rebase those on the now merged version. In the meantime I'll keep my recurrent branch (and other mentioned branches) open and in their current form for reference.

yangzhikai · 2017-05-10T11:35:28Z

hello,I have a question .When I read the file 'lstm_layer.cpp',I find a lot of 'add_top','add_bottom','add_dim',but I can't find the definition of them in caffe folder.Could you tell me where can I them and whats the meaning of the code such as 'add_bottom("c_" + tm1s);'.

jeffdonahue · 2017-05-10T21:15:43Z

The methods you refer to are all automatically generated by protobuf. See caffe.proto for the declarations of top, bottom, etc., which result in the protobuf compiler automatically generating the add_top, add_bottom methods. (The resulting C definitions are in the protobuf-generated header file caffe.pb.h.)

yangzhikai · 2017-05-11T12:57:40Z

oh , Thank you very much. I have not find this file(caffe.pb.h) because I haven't complied it before!

soulslicer · 2018-03-07T03:54:51Z

Hi, is there any working example of the layer in caffe?

cuixing158 · 2018-06-15T05:08:10Z

The same question, is there any working example of the layer in caffe?

jianjieluo · 2018-10-29T09:06:34Z

@cuixing158 @soulslicer jeffdonahue's example for coco image caption task.

#2033 (comment)

Go for his caffe branch and you will find the example .prototxt files and others which may be helpful.

jeffdonahue force-pushed the recurrent branch from 54fa90f to e90a166 Compare March 6, 2015 05:49

shelhamer added JL ES labels Mar 7, 2015

jeffdonahue mentioned this pull request Mar 10, 2015

Very simple version of ReshapeLayer #2088

Closed

jeffdonahue force-pushed the recurrent branch 4 times, most recently from af9f11d to 34230c6 Compare March 13, 2015 20:14

cvondrick mentioned this pull request Mar 20, 2015

Fix bug in recurrent branch jeffdonahue/caffe#6

Open

jeffdonahue force-pushed the recurrent branch from 34230c6 to 1b50f7b Compare March 25, 2015 07:24

jeffdonahue force-pushed the recurrent branch 2 times, most recently from d3ebf3e to 80e9c41 Compare March 26, 2015 21:13

jeffdonahue mentioned this pull request Apr 5, 2016

RNN + LSTM Layers #3948

Merged

yangfly reviewed Apr 14, 2016
View reviewed changes

jeffdonahue closed this Jun 3, 2016

aurotripathy mentioned this pull request Jul 27, 2016

Use of default clip markers as [0,1,1, 1...,1] junhyukoh/caffe-lstm#14

Closed

aurotripathy mentioned this pull request Aug 5, 2016

Getting 'Check failed' error when loading a toy LSTM prototxt file #4547

Closed

beniz mentioned this pull request Sep 21, 2016

Recurrent neural layers support (RNN, LSTM) via Caffe backend (Direct inputs to LSTM) jolibrain/deepdetect#140

Open

Unrolled recurrent layers (RNN, LSTM) #2033

Unrolled recurrent layers (RNN, LSTM) #2033

Conversation

jeffdonahue commented Mar 4, 2015

cvondrick commented Mar 20, 2015

jeffdonahue commented Mar 20, 2015

Load a random dataset

Reshape ToyData_1 to be the same size

Since ToyData_1 should be ToyData_2, we expect that this loss be zero. But, it is not.

We expect this loss to be non-zero, and it is non-zero.

Since ToyData_2 should be ToyData_1 and ToyData_1 is non-zero, we expect this loss to be non-zero. But, it is zero.

jeffdonahue commented Mar 20, 2015

Load a random dataset

Reshape ToyData_1 to be the same size

Since ToyData_1 should be ToyData_2, we expect that this loss be zero. But, it is not.

We expect this loss to be non-zero, and it is non-zero.

Since ToyData_2 should be ToyData_1 and ToyData_1 is non-zero, we expect this loss to be non-zero. But, it is zero.

cvondrick commented Mar 20, 2015

hf commented Mar 24, 2015

thuyen commented Mar 24, 2015

vadimkantorov commented Mar 24, 2015

ih4cku commented Mar 24, 2015

vadimkantorov commented Mar 24, 2015

aksarben09 commented Mar 25, 2015

aksarben09 commented Mar 25, 2015

jeffdonahue commented Mar 25, 2015

jeffdonahue commented Mar 25, 2015

vadimkantorov commented Mar 25, 2015

read-mind commented Mar 27, 2015

jeffdonahue commented Mar 30, 2015

jeffdonahue commented Mar 30, 2015

liqing-ustc commented Mar 31, 2015

jeffdonahue commented Mar 15, 2016

fl2o commented Mar 15, 2016

shaibagon commented Mar 15, 2016

shaibagon commented Mar 15, 2016

fl2o commented Mar 15, 2016

shaibagon commented Mar 15, 2016

fl2o commented Mar 15, 2016

shaibagon commented Mar 15, 2016

chriss2401 commented Mar 15, 2016

lood339 commented Apr 11, 2016

yangfly Apr 14, 2016

Choose a reason for hiding this comment

liminchen commented Apr 16, 2016

maydaygmail commented Apr 29, 2016

anguyen8 commented May 3, 2016

anteagle commented May 16, 2016

shaibagon commented May 16, 2016

anteagle commented May 17, 2016

jeffdonahue commented Jun 3, 2016

yangzhikai commented May 10, 2017

jeffdonahue commented May 10, 2017 • edited Loading

yangzhikai commented May 11, 2017

soulslicer commented Mar 7, 2018

cuixing158 commented Jun 15, 2018

jianjieluo commented Oct 29, 2018

jeffdonahue commented May 10, 2017 •

edited

Loading