
Plans on implementing an external mask #9

Open · gregkoytiger opened this issue Feb 28, 2019 · 1 comment

gregkoytiger commented Feb 28, 2019

Great work on this code! One feature typically found in transformer models is a mask for handling variable-length input sequences, as in https://github.com/Lsdefine/attention-is-all-you-need-keras/blob/042ce3846b80dcebb169c856f378bfe26a18c6e4/transformer.py#L89

Is there any plan to implement this functionality?
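
For reference, a minimal sketch of the kind of padding mask referred to above, assuming TensorFlow-style tensors; the `PAD_ID` constant and function names are illustrative and not taken from either repository:

```python
import tensorflow as tf

PAD_ID = 0  # illustrative id reserved for padding tokens

def padding_mask(token_ids):
    """Shape (batch, 1, seq_len): 1.0 wherever the token is padding."""
    mask = tf.cast(tf.equal(token_ids, PAD_ID), tf.float32)
    return mask[:, tf.newaxis, :]

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention; masked key positions get ~zero weight."""
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    logits = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
    logits += mask * -1e9  # push padded keys towards -inf before softmax
    weights = tf.nn.softmax(logits, axis=-1)
    return tf.matmul(weights, v)

# Tiny usage example: the last two positions are padding and end up ignored.
token_ids = tf.constant([[5, 7, 2, PAD_ID, PAD_ID]])  # (1, 5)
x = tf.random.normal((1, 5, 8))                       # embeddings used as q, k, v
out = masked_attention(x, x, x, padding_mask(token_ids))
print(out.shape)  # (1, 5, 8)
```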

kpot (Owner) commented Mar 4, 2019

Hi! Yes, that is a reasonable feature. However, it currently has a low priority, since I don't have much time right now and a similar result can be achieved without it: introduce a special "pad" word into the vocabulary (assuming you're using the transformer for an NLP problem), replace all "unused" elements of the sequence with it, and let the network itself learn an embedding that "will never be focused upon". A minimal sketch of this workaround follows at the end of this comment.

I understand this is not exactly the same thing as masking, and it requires introducing the special word during training. If masking is critical for your needs, feel free to make the necessary changes yourself and send a pull request (with an example of how the changes are supposed to be used). I'll review and merge them.
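
A minimal sketch of the pad-word workaround described above, assuming tf.keras for the example; the `vocabulary`, `PAD_ID`, and `MAX_LEN` names are illustrative and not part of keras-transformer's API:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Reserve one id in the vocabulary for a dedicated "pad" word.
vocabulary = {'<pad>': 0, 'the': 1, 'cat': 2, 'sat': 3}
PAD_ID = vocabulary['<pad>']
MAX_LEN = 6

# Variable-length token-id sequences are padded to a fixed length with PAD_ID.
sequences = [[1, 2, 3], [2, 3]]
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding='post', value=PAD_ID)

# The pad word gets its own row in the embedding matrix, so the network can
# learn a representation for it that attention effectively never focuses on.
embedding = tf.keras.layers.Embedding(input_dim=len(vocabulary), output_dim=32)
embedded = embedding(tf.constant(padded))
print(embedded.shape)  # (2, 6, 32)
```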
