Draft: Implements Encoder-Decoder Attention Model #28
base: main
Conversation
I would recommend building full setups before merging this, to avoid the same problems we had with other code that we merged but never "used" before. I would help with this.
Yeah, I agree.
Allows passing the label unshifted for step-wise search without needing a separate function besides "forward".
I completely forgot to test with this branch again after already using it for some time; I did so now and it works normally.
Co-authored-by: Benedikt Hilmes <hilmes@hltpr.rwth-aachen.de>
energies = v^T * tanh(h + s + beta) where beta is weight feedback information
weights = softmax(energies)
context = sum_t weights_t * h_t
The symbols in this docstring are partly undefined or differ from the parameter names in `forward`. It would be easier to understand if the naming were unified.
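For reference, a minimal sketch of the additive attention the docstring describes, with the naming unified along the lines suggested; the projection layers (`w_enc`, `w_dec`, `v`) and the masking are illustrative assumptions, not taken from the PR:

```python
import torch
import torch.nn.functional as F
from torch import nn


class AdditiveAttention(nn.Module):
    """Sketch only: energies = v^T * tanh(W_h h + W_s s + beta),
    weights = softmax(energies), context = sum_t weights_t * h_t."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder states h
        self.w_dec = nn.Linear(dec_dim, attn_dim, bias=False)  # projects decoder state s
        self.v = nn.Linear(attn_dim, 1, bias=False)            # energy vector v

    def forward(self, h, s, beta, enc_mask):
        # h: [B, T, enc_dim], s: [B, dec_dim], beta: [B, T, attn_dim] (weight feedback),
        # enc_mask: [B, T] with True for valid encoder frames
        energies = self.v(
            torch.tanh(self.w_enc(h) + self.w_dec(s).unsqueeze(1) + beta)
        ).squeeze(-1)                                            # [B, T]
        energies = energies.masked_fill(~enc_mask, float("-inf"))
        weights = F.softmax(energies, dim=-1)                    # [B, T]
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)  # [B, enc_dim]
        return context, weights
```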
:param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
    training: this should be "True", in order to start with a zero target embedding
    search: use True for the first step in order to start with a zero embedding, False otherwise
I'm not a fan of this `shift_embeddings` logic. I would rather handle this externally by prepending a begin-token to `labels` or using the begin-token in the first search step. If the embedding must be an all-zero vector, this could be achieved via the `padding_idx` parameter in `torch.nn.Embedding`.
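A minimal sketch of that alternative, assuming index 0 is reserved as the begin-token; the vocabulary size and embedding dimension are placeholders:

```python
import torch
from torch import nn

NUM_LABELS = 1000  # assumed label vocabulary size (placeholder)
BOS_IDX = 0        # assumed reserved begin-token index

# padding_idx zero-initializes the BOS row and excludes it from gradient updates,
# so the first decoder input is an all-zero embedding without any shifting logic.
embed = nn.Embedding(NUM_LABELS + 1, 512, padding_idx=BOS_IDX)

labels = torch.randint(1, NUM_LABELS + 1, (8, 20))  # dummy targets [B, N]
bos = torch.full((labels.size(0), 1), BOS_IDX, dtype=labels.dtype)

# training: prepend the begin-token and drop the last label instead of shifting embeddings
decoder_in = embed(torch.cat([bos, labels[:, :-1]], dim=1))  # [B, N, 512]

# first search step: feed only the begin-token, i.e. the all-zero embedding
first_step_in = embed(torch.full((8, 1), BOS_IDX))           # [B, 1, 512]
```

Note that `padding_idx` both zero-initializes that embedding row and keeps its gradient at zero, so the begin-token embedding stays all-zero throughout training.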
:param shift_embeddings: shift the embeddings by one position along U, padding with zero in front and drop last
    training: this should be "True", in order to start with a zero target embedding
    search: use True for the first step in order to start with a zero embedding, False otherwise
"""
Docs for the return values are missing.
    training: labels of shape [B,N]
    (greedy-)search: hypotheses last label as [B,1]
:param enc_seq_len: encoder sequence lengths of shape [B,T], same for training and search
:param state: decoder state
Shape info for state tensors is missing.
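One possible form for the missing documentation, covering both the return values and the state shapes; every concrete shape and the (h, c)/attention-weights layout below are illustrative assumptions, not taken from the PR:

```python
def forward(self, labels, enc_seq_len, state):
    """
    Illustrative sketch only; concrete shapes are assumptions.

    :param state: decoder state, e.g. ((h, c), att_weights) with
        h, c: LSTM hidden/cell states of shape [num_layers, B, hidden_dim]
        att_weights: previous attention weights of shape [B, T]
    :return: tuple of logits ([B, N, num_labels] in training, [B, 1, num_labels] in search)
        and the updated decoder state in the same layout as ``state``
    """
```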
def forward(
    self, inputs: torch.Tensor, state: Tuple[torch.Tensor, torch.Tensor]
) -> Tuple[torch.Tensor, torch.Tensor]:
    with torch.autocast(device_type="cuda", enabled=False):
Why is this disabled here? That should be explained in the code, maybe with some ref.
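As a sketch, the explanation could sit directly next to the context manager; the reason stated below is only a guess at the typical motivation (numerical stability in reduced precision) and would need to be replaced by the actual one, ideally with a reference:

```python
import torch

# Assumed reason, to be confirmed by the author: keep this block in float32
# because the computation inside is numerically sensitive under AMP
# (see the torch.amp documentation on ops that should not run in reduced precision).
with torch.autocast(device_type="cuda", enabled=False):
    ...  # numerically sensitive computation stays in float32
```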