

Why is attention applied on the outputs instead of hidden states? #11

Open

prerit2010 opened this issue Jun 1, 2019 · 1 comment

@prerit2010

As mentioned in the paper, attention is supposed to be applied to the hidden states of the LSTM, but in the code it is applied to the outputs instead of the hidden states. Why is that?

@YooSungHyun

https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

In PyTorch, the LSTM's output contains the hidden state for every time step t, i.e. it is the equivalent of return_sequences=True in TensorFlow 2.x.

So for attention we have to use the PyTorch LSTM's output sequence.

You can check this in the source, in models.py -> class BiLSTM -> forward:

emb = self.drop(self.encoder(inp))
print("embed_size : ", emb.size())
# outp holds the hidden state at every time step; (h_n, c_n) only the final step
outp, (h_n, c_n) = self.bilstm(emb, hidden)
print("bilstm_output_size : ", outp.size())
print("hiddens, cell size : ", h_n.size(), c_n.size())
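To make the difference concrete, here is a minimal standalone sketch (not from this repo; the sizes are made up, and it assumes a unidirectional, single-layer LSTM with the default sequence-first layout): output carries the hidden state at every time step, while h_n holds only the last one, which is why attention over the whole sentence has to be computed on output.

import torch
import torch.nn as nn

# toy sizes, chosen only for illustration
seq_len, batch, emb_dim, hidden_dim = 7, 3, 10, 16
lstm = nn.LSTM(emb_dim, hidden_dim)              # unidirectional, single layer, seq-first input
x = torch.randn(seq_len, batch, emb_dim)

output, (h_n, c_n) = lstm(x)
print(output.shape)                              # torch.Size([7, 3, 16]) -> hidden state at every time step
print(h_n.shape)                                 # torch.Size([1, 3, 16]) -> hidden state of the last step only

# the last slice of output equals h_n, so output really is the sequence of hidden states
print(torch.allclose(output[-1], h_n[0]))        # True

# attention therefore scores all time steps of output, e.g. a simple self-attention pooling
w = torch.randn(hidden_dim, 1)
scores = torch.softmax(output @ w, dim=0)        # [seq_len, batch, 1] weights over time steps
context = (scores * output).sum(dim=0)           # [batch, hidden_dim] attention-pooled sentence vector
print(context.shape)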

Hope this is useful!
