LSTM different input and output lengths #4
I want to create a network with an LSTM layer, where the number of outputs from the LSTM layer is different from the number of its inputs. Is this possible?
Yes, there are a couple of ways to go about this. The first way is already implemented and part of the RNN example. There is a "Tos" (offset time) parameter that allows you to set an initial time before the output of the LSTM is factored into the cost function. Say your input has length T = 15 and your output has length To = 10. Then you would set Tos = T - To = 5 (I might be off by a constant factor of +/- 1 or 2). As for your output/label (y) cell array, I think you would just zero-pad the first Tos outputs (you could use a sparse matrix); see the sketch below.

The second method is more state of the art. You can take two LSTMs and stack them together. I don't have an example for this, but it should be relatively straightforward to modify the code to do it. The first LSTM takes the input and produces no output until the last time step (when it has received the entire input sequence). Then you take this final output and feed it into a second LSTM, which unrolls this fixed initial input into an output sequence of a different length. To do this, you can extract the hidden state values or the LSTM output (e.g., the LSTM cell state) from LSTM 1 and inject it as the initial cell state of LSTM 2. In this case, LSTM 1 and LSTM 2 should have the same number of units in the layer. You'll have to modify backprop a bit to account for this unimplemented method of connecting one LSTM's output to another's hidden state. The other way is to connect LSTM 1 to LSTM 2 via a fully connected layer that transforms the dimensions and feeds the result to LSTM 2 as a standard input on the first time step. This is a little more complicated.

http://papers.nips.cc/paper/5346-information-based-learning-by-agents-in-unbounded-state-spaces.pdf
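For the first approach, here is a minimal sketch of what the zero-padded label cell array could look like. This is not the toolbox's actual API; the variable names (`T`, `To`, `Tos`, `nOut`) and the random targets are purely illustrative, under the assumption that the labels are stored as one cell per time step.

```matlab
% Minimal sketch (Octave/MATLAB), not toolbox code: build a label cell
% array y of length T whose first Tos entries are zero placeholders and
% whose last To entries carry the real targets.
T    = 15;            % input sequence length (illustrative)
To   = 10;            % number of time steps with a meaningful target
Tos  = T - To;        % offset before the output enters the cost function
nOut = 2;             % output dimension per time step (illustrative)

y = cell(1, T);
for t = 1:Tos
    y{t} = sparse(nOut, 1);    % zero-padded placeholder (sparse, as suggested above)
end
for t = Tos+1:T
    y{t} = rand(nOut, 1);      % stand-in for the real label at step t
end
```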
I am extremely thankful for the detailed reply. Actually, what I meant was to perform some kind of sequence labeling: for example, I have a 30-second sequence of sensor data (each sequence contains 10 features), and I need to map it to one of two states. Would this be feasible using a single LSTM?
OK, so binary classification on a sequence. Yes, there are two ways to do this. One way, which is not ideal (but works), is to set Tos = T - 1; this way, the LSTM outputs the classification only on the final time step. Alternatively, you could set Tos = 0 and ask the LSTM to predict the same label at every step of the sequence.

The better way, which I highly recommend, is to implement temporal average pooling. I don't have the code for this implemented, but you sum the output of the LSTM over every time step and average it. The average is then fed to a softmax classifier after the final time step. You would then back-propagate by multiplying the error term (d) by the averaging factor, 1/T.
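A rough sketch of temporal average pooling, assuming the per-step LSTM outputs are collected into an nHidden x T matrix; none of this is existing toolbox code, and all names (`h`, `pooled`, `dL_dpool`, `dL_dh`) are illustrative.

```matlab
% Minimal sketch (Octave/MATLAB) of temporal average pooling over LSTM outputs.
nHidden = 8;
T       = 30;
h       = randn(nHidden, T);           % placeholder for the LSTM outputs h(:, t)

% Forward pass: average the outputs over time, then feed the pooled vector
% to the softmax classifier after the final time step.
pooled = sum(h, 2) / T;                % nHidden x 1

% Backward pass: each time step receives the classifier's error term scaled
% by the averaging factor 1/T.
dL_dpool = randn(nHidden, 1);          % placeholder error term (d) from the classifier
dL_dh    = repmat(dL_dpool / T, 1, T); % nHidden x T gradient w.r.t. the per-step outputs
```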
I am very thankful for your reply. I am currently trying the first approach and will move on to the second as well.