

Why is attention applied on the outputs instead of hidden states? #11

Open

prerit2010 opened this issue Jun 1, 2019 · 1 comment

@prerit2010

As mentioned in the paper, attention is supposed to be applied to the hidden states of the LSTM, but in the code it is applied to the outputs instead of the hidden states. Why is that?

@YooSungHyun

https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

In PyTorch, the LSTM's output contains the hidden state for every time step t, i.e. it is the equivalent of return_sequences=True in TensorFlow 2.x.

So for attention we have to use the PyTorch LSTM's output sequence.

You can check this in the source, in models.py -> class BiLSTM -> forward:

emb = self.drop(self.encoder(inp))
print("embed_size : ", emb.size())
# outp holds the hidden state at every time step; (h_n, c_n) only the final step
outp, (h_n, c_n) = self.bilstm(emb, hidden)
print("bilstm_output_size : ", outp.size())
print("hiddens, cell size : ", h_n.size(), c_n.size())
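To make the difference concrete, here is a minimal standalone sketch (not from this repo; the sizes are made up, and it assumes a unidirectional, single-layer LSTM with the default sequence-first layout): output carries the hidden state at every time step, while h_n holds only the last one, which is why attention over the whole sentence has to be computed on output.

import torch
import torch.nn as nn

# toy sizes, chosen only for illustration
seq_len, batch, emb_dim, hidden_dim = 7, 3, 10, 16
lstm = nn.LSTM(emb_dim, hidden_dim)              # unidirectional, single layer, seq-first input
x = torch.randn(seq_len, batch, emb_dim)

output, (h_n, c_n) = lstm(x)
print(output.shape)                              # torch.Size([7, 3, 16]) -> hidden state at every time step
print(h_n.shape)                                 # torch.Size([1, 3, 16]) -> hidden state of the last step only

# the last slice of output equals h_n, so output really is the sequence of hidden states
print(torch.allclose(output[-1], h_n[0]))        # True

# attention therefore scores all time steps of output, e.g. a simple self-attention pooling
w = torch.randn(hidden_dim, 1)
scores = torch.softmax(output @ w, dim=0)        # [seq_len, batch, 1] weights over time steps
context = (scores * output).sum(dim=0)           # [batch, hidden_dim] attention-pooled sentence vector
print(context.shape)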

Hope this is useful!
