LSTM different input and output lengths #4

Open
shaimaahegazy opened this issue Oct 21, 2016 · 4 comments
@shaimaahegazy

I want to create a network with an LSTM layer, where the number of outputs from the LSTM layer is different from the number of its inputs. Is this possible?

@joncox123
Owner

joncox123 commented Oct 21, 2016

Yes, there are a couple of ways to go about this. The first way is already implemented and part of the RNN example. There is a "Tos" (offset time) parameter that lets you set an initial number of time steps during which the output of the LSTM is not factored into the cost function.

Say your input has length T = 15 and your output has length To = 10. Then you would set Tos = T - To = 5 (I might be off by a constant factor of +/- 1 or 2). As for your output/label (y) cell array, I think you would just zero pad the first Tos entries (you could use a sparse matrix).
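To make this concrete, here is a minimal numpy sketch of the masking idea (this is not the toolbox's actual code; the MSE cost and the array shapes are placeholder assumptions for illustration):

```python
import numpy as np

# Hedged sketch: ignore the first Tos time steps when accumulating the
# sequence cost, matching the zero-padded label array described above.
T, Tos = 15, 5                          # total steps; initial unscored steps
rng = np.random.default_rng(0)

outputs = rng.standard_normal((T, 3))   # stand-in LSTM outputs, 3 units
targets = np.zeros((T, 3))              # labels; first Tos entries zero-padded
targets[Tos:] = rng.standard_normal((T - Tos, 3))

# Mean-squared-error cost over the scored steps only
cost = 0.5 * np.sum((outputs[Tos:] - targets[Tos:]) ** 2) / (T - Tos)
print(cost)
```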

The second method is more state of the art. You can take two LSTMs and stack them together. I don't have an example for this, but it should be relatively straightforward to modify the code to do it. The first LSTM takes the input and produces no output until the last time step (when it has received the entire input sequence). Then you take this final output and feed it into a second LSTM, which unrolls this fixed initial input into an output sequence of a different length.

To do this, you can extract the final hidden state or the LSTM output (e.g. the LSTM cell state) from LSTM 1 and inject it as the initial cell state of LSTM 2. In this case, LSTM 1 and LSTM 2 should have the same number of units in the layer. You'll have to modify backprop a bit to account for this (currently unimplemented) way of connecting one LSTM's output to another's hidden state.
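Here is a rough numpy sketch of that hand-off (again, not the toolbox's code; the gate math is a standard LSTM step with biases omitted, and both LSTMs share the same n_hid as required):

```python
import numpy as np

def lstm_step(x, h, c, W):
    # One standard LSTM step; W maps the concatenated [x; h] vector to the
    # four gate pre-activations. Biases are omitted for brevity.
    z = W @ np.concatenate([x, h])
    n = h.size
    i = 1.0 / (1.0 + np.exp(-z[:n]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2*n]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*n:3*n]))   # output gate
    g = np.tanh(z[3*n:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid, T_in, T_out = 10, 8, 30, 15
W_enc = 0.1 * rng.standard_normal((4 * n_hid, n_in + n_hid))
W_dec = 0.1 * rng.standard_normal((4 * n_hid, n_in + n_hid))

# LSTM 1: consume the whole input sequence, keep only the final (h, c).
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(T_in):
    x_t = rng.standard_normal(n_in)         # stand-in for the real input
    h, c = lstm_step(x_t, h, c, W_enc)

# LSTM 2: inject LSTM 1's final state as the initial state, then unroll
# for a different number of steps (here with a fixed zero input).
outputs = []
for t in range(T_out):
    h, c = lstm_step(np.zeros(n_in), h, c, W_dec)
    outputs.append(h)
print(np.stack(outputs).shape)              # (15, 8): new output length
```

Only the forward pass is shown; as noted above, backprop through the state hand-off is the part you would have to add.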

The other way is to connect LSTM 1 to LSTM 2 via a fully connected layer that transforms the dimensions and feeds the result to LSTM 2 as a standard input on the first time step. This is a little more complicated.
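A sketch of that bridge (W_fc, b_fc, and the tanh are hypothetical choices, just to show the dimension change):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 8, 12        # with the FC bridge, the two LSTMs can differ in size

h1_final = rng.standard_normal(n1)          # stand-in for LSTM 1's final output
W_fc = 0.1 * rng.standard_normal((n2, n1))
b_fc = np.zeros(n2)

# The fully connected layer transforms the dimensions; its output is then
# fed to LSTM 2 as an ordinary input on the first decoding time step.
x0_dec = np.tanh(W_fc @ h1_final + b_fc)
print(x0_dec.shape)                          # (12,)
```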

http://papers.nips.cc/paper/5346-information-based-learning-by-agents-in-unbounded-state-spaces.pdf

@shaimaahegazy
Author

I am extremely thankful for the detailed reply.

Actually, what I meant was to perform some kind of sequence labeling. For example, I have a 30 s sequence of sensor data (each sequence contains 10 features), and I need to map it to one of two states. Would this be feasible using a single LSTM?

@joncox123
Owner

OK, so binary classification, for example, on a sequence. Yes, there are two ways to do this. One way, which is not ideal (but works), is just to set Tos = T - 1. In this way, the LSTM outputs the classification on the final time step. Alternatively, you could set Tos=0 and ask the LSTM to predict the same thing for every step of the entire sequence.
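In numpy terms, the two readouts look roughly like this (the per-step logits are placeholders; averaging the per-step probabilities in option 2 is just one way to read out a single prediction):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, n_classes = 30, 2
rng = np.random.default_rng(0)
logits = rng.standard_normal((T, n_classes))  # stand-in per-step LSTM outputs

# Option 1: Tos = T - 1, so only the final time step is scored.
p_last = softmax(logits[-1])

# Option 2: Tos = 0, the same label is the target at every step; here we
# average the per-step probabilities to get one prediction.
p_avg = np.mean([softmax(z) for z in logits], axis=0)
print(p_last, p_avg)
```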

The better way, which I highly recommend, is to implement temporal average pooling. I don't have the code for this implemented, but the idea is to average the output of the LSTM over all time steps. The average is then input to a softmax classifier after the final time step. To back-propagate, you multiply the error term (d) by the averaging factor, 1/T, at each time step.
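A minimal sketch of the forward and backward pass for that pooling layer (softmax with a cross-entropy cost is assumed; shapes are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_hid, n_classes = 30, 8, 2
H = rng.standard_normal((T, n_hid))           # stand-in per-step LSTM outputs
W = 0.1 * rng.standard_normal((n_classes, n_hid))

# Forward: temporal average pooling, then softmax on the pooled vector.
h_avg = H.mean(axis=0)                        # (1/T) * sum_t h_t
z = W @ h_avg
p = np.exp(z - z.max()); p /= p.sum()

# Backward: with a cross-entropy cost, the softmax error term is d = p - y;
# the pooling layer sends the same gradient, scaled by 1/T, back to the
# LSTM output at every time step.
y = np.array([1.0, 0.0])                      # one-hot label
d = p - y
dH = np.tile((W.T @ d) / T, (T, 1))           # gradient w.r.t. each h_t
print(dH.shape)                               # (30, 8)
```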

@shaimaahegazy
Author

Thank you very much for your reply. I am currently trying the first approach and will move on to the second as well.
