Hi! Yes, that is a reasonable feature. However, it currently has a low priority, since I don't have much time at the moment, and a similar result can be achieved by introducing a special "pad" word into the vocabulary (assuming you're using the transformer for an NLP problem), replacing all "unused" elements of the sequence with it, and letting the network itself learn an embedding that "will never be focused upon". A rough sketch of that workaround is shown below.
I understand this is not exactly the same thing as masking and requires introducing this special word during training. If masking is critical for your needs, feel free to make the necessary changes yourself and send a pull request (with an example of how they are supposed to be used). I'll review and merge the changes.
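Here is a minimal sketch of the "pad word" workaround described above. The vocabulary builder, helper names, and the choice of index 0 for the pad token are illustrative assumptions, not part of this repository's API:

```python
# Hypothetical pad-token workaround: reserve a dedicated vocabulary entry for
# padding and fill all unused sequence positions with it.
PAD_TOKEN = "<pad>"

def build_vocab(sentences):
    # Reserve index 0 for the pad token so it is easy to identify later.
    vocab = {PAD_TOKEN: 0}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab

def pad_to_length(token_ids, max_len, pad_id=0):
    # Truncate sequences that are too long and right-pad short ones.
    return (token_ids[:max_len] + [pad_id] * max_len)[:max_len]

sentences = [["the", "cat", "sat"], ["a", "dog"]]
vocab = build_vocab(sentences)
batch = [pad_to_length([vocab[w] for w in s], max_len=4) for s in sentences]
# batch == [[1, 2, 3, 0], [4, 5, 0, 0]]
# The embedding for index 0 is trained like any other token embedding; the
# network is expected to learn that it carries no useful information.
```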
Great work on this code! One typical feature of transformer models is the use of a mask to handle variable-length input sequences, as in https://github.com/Lsdefine/attention-is-all-you-need-keras/blob/042ce3846b80dcebb169c856f378bfe26a18c6e4/transformer.py#L89
Is there any plan to implement this functionality?
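For context, this is roughly what such a padding mask does inside attention. The sketch below is a plain NumPy illustration under the assumption that pad positions use id 0; it is not code from either repository:

```python
import numpy as np

def padding_mask(token_ids):
    # 1.0 for real tokens, 0.0 for padding positions (assumed pad id: 0).
    return (np.asarray(token_ids) != 0).astype("float32")

def masked_softmax(scores, mask):
    # Set scores at padded key positions to a large negative value so they
    # receive (numerically) zero attention weight after the softmax.
    scores = np.where(mask[:, None, :] > 0, scores, -1e9)
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

ids = np.array([[1, 2, 3, 0], [4, 5, 0, 0]])         # (batch, seq)
mask = padding_mask(ids)                              # (batch, seq)
scores = np.random.randn(2, 4, 4).astype("float32")   # (batch, query, key)
attn = masked_softmax(scores, mask)
# Attention weights over padded key positions are effectively zero.
```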