Draft: Implements Encoder-Decoder Attention Model #28

Open: wants to merge 19 commits into main

mmz33
Member

mmz33 commented Jul 26, 2023

No description provided.

@JackTemaki
Contributor

I would recommend building full setups before merging this, to avoid the same problems we had with other code that we merged but never "used" beforehand. I can help with this.

@mmz33
Member Author

mmz33 commented Jul 27, 2023

> I would recommend building full setups before merging this, to avoid the same problems we had with other code that we merged but never "used" beforehand. I can help with this.

Yeah I agree.

@mmz33 requested a review from Atticus1806, July 31, 2023 15:05
@JackTemaki force-pushed the zeineldeen_att_decoder branch 2 times, most recently from 44b9cdc to 6a6e044, August 2, 2023 09:34
@curufinwe changed the title from "Implements Encoder-Decoder Attention Model" to "Draft: Implements Encoder-Decoder Attention Model", Aug 3, 2023
@curufinwe marked this pull request as draft, August 3, 2023 09:31
@JackTemaki
Contributor

I completely forgot to test with this branch again after having used it for some time. I did so now, and it works normally.

@JackTemaki marked this pull request as ready for review, October 21, 2024 18:43
i6_models/decoder/attention.py (outdated, resolved)
i6_models/decoder/zoneout_lstm.py (resolved)
Co-authored-by: Benedikt Hilmes <hilmes@hltpr.rwth-aachen.de>
Comment on lines +25 to +27
energies = v^T * tanh(h + s + beta) where beta is weight feedback information
weights = softmax(energies)
context = sum_t weights_t * h_t
Contributor

The symbols in this docstring are partly undefined or differ from the parameter names in forward. It would be easier to understand if the naming were unified.
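
To illustrate what unified naming could look like, here is a minimal sketch of the three docstring lines as additive attention with weight feedback. The function name, the argument names (`key`, `query`, `weight_feedback`, `value`, `enc_mask`), the shapes, and the padding mask are assumptions made for illustration and may not match forward in this PR. The inputs are assumed to be already projected to a common attention dimension.

```python
import torch


def additive_attention(
    key: torch.Tensor,              # [B, T, A], "h" inside tanh in the docstring
    query: torch.Tensor,            # [B, A],    "s" in the docstring
    weight_feedback: torch.Tensor,  # [B, T, A], "beta" in the docstring
    v: torch.Tensor,                # [A],       energy projection vector
    value: torch.Tensor,            # [B, T, D], "h_t" in the context sum
    enc_mask: torch.Tensor,         # [B, T] bool, True for valid frames (assumed)
):
    # energies = v^T * tanh(h + s + beta), one scalar per encoder frame
    energies = torch.tanh(key + query.unsqueeze(1) + weight_feedback) @ v  # [B, T]
    # padded frames must not receive attention mass
    energies = energies.masked_fill(~enc_mask, float("-inf"))
    # weights = softmax(energies) over the time axis
    weights = torch.softmax(energies, dim=1)  # [B, T]
    # context = sum_t weights_t * h_t
    context = torch.bmm(weights.unsqueeze(1), value).squeeze(1)  # [B, D]
    return context, weights


torch.manual_seed(0)
B, T, A, D = 2, 3, 4, 5
enc_mask = torch.tensor([[True, True, True], [True, True, False]])
context, weights = additive_attention(
    key=torch.randn(B, T, A),
    query=torch.randn(B, A),
    weight_feedback=torch.randn(B, T, A),
    v=torch.randn(A),
    value=torch.randn(B, T, D),
    enc_mask=enc_mask,
)
```

Annotating each argument with the docstring symbol it corresponds to is one way to resolve the mismatch without renaming anything in the signature.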

Comment on lines +146 to +148
:param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
training: this should be "True", in order to start with a zero target embedding
search: use True for the first step in order to start with a zero embedding, False otherwise
Contributor

I'm not a fan of this shift_embeddings logic. I would rather handle this externally, either by prepending a begin-token to the labels or by using the begin-token in the first search step. If the embedding must be an all-zero vector, this could be achieved via the padding_idx parameter in torch.nn.Embedding.
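
A minimal sketch of this suggestion, comparing an internal shift with an externally prepended begin-token. `BOS_IDX`, the vocabulary size, and the tensor names are hypothetical and not taken from the PR; the sketch also assumes index 0 can be reserved for the begin-token, so real labels never use it.

```python
import torch
import torch.nn.functional as F

BOS_IDX = 0  # hypothetical begin-token index, reserved (assumption)
vocab_size, embed_dim = 10, 4

# padding_idx makes the row for BOS_IDX all zeros and excludes it from gradient updates
embedding = torch.nn.Embedding(vocab_size, embed_dim, padding_idx=BOS_IDX)

labels = torch.tensor([[3, 5, 7], [2, 4, 6]])  # [B, N]

# Variant A: shift inside the decoder, zero-pad in front along the label axis, drop last
shifted = F.pad(embedding(labels), (0, 0, 1, 0))[:, :-1]  # [B, N, E]

# Variant B: handle it externally by prepending the begin-token and dropping the last label
bos = torch.full((labels.size(0), 1), BOS_IDX, dtype=labels.dtype)
decoder_inputs = torch.cat([bos, labels[:, :-1]], dim=1)  # [B, N]
external = embedding(decoder_inputs)  # [B, N, E]

# Both variants feed the same embeddings to the decoder
same = torch.equal(shifted, external)
```

With variant B the decoder needs no flag: training passes the shifted label sequence, and search passes `BOS_IDX` in the first step and the previous hypothesis label afterwards.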

:param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
training: this should be "True", in order to start with a zero target embedding
search: use True for the first step in order to start with a zero embedding, False otherwise
"""
Contributor

Docs for the return values are missing.

training: labels of shape [B,N]
(greedy-)search: hypotheses last label as [B,1]
:param enc_seq_len: encoder sequence lengths of shape [B,T], same for training and search
:param state: decoder state
Contributor

Shape info for state tensors is missing.

def forward(
    self, inputs: torch.Tensor, state: Tuple[torch.Tensor, torch.Tensor]
) -> Tuple[torch.Tensor, torch.Tensor]:
    with torch.autocast(device_type="cuda", enabled=False):
Member

Why is this disabled here? That should be explained in the code, maybe with some ref.
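
For context, a minimal sketch of what an autocast-disabled region does and how it could be commented. The PR does not state its reason; numerical stability is a common motivation and is only an assumption here. `full_precision_matmul` is a hypothetical helper, and the sketch uses CPU autocast so it runs without a GPU.

```python
import torch


def full_precision_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Autocast is disabled for this region so the matmul runs in float32 even when the
    # caller has mixed precision enabled (assumed motivation: numerical stability).
    # Inputs may already be low precision, so they are cast explicitly.
    with torch.autocast(device_type=a.device.type, enabled=False):
        return torch.mm(a.float(), b.float())


a = torch.randn(4, 4)
b = torch.randn(4, 4)

# Inside an enabled autocast region, eligible ops may run in lower precision,
# while the helper above opts out locally.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    full = full_precision_matmul(a, b)
```

A short comment of this kind next to the `with` statement, ideally with a link to the relevant issue or documentation, would answer the question for future readers.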

5 participants