Hi! Yes, that is a reasonable feature. However, it currently has a low priority, since I don't have much time at the moment, and a similar result can be achieved by introducing a special "pad" word into the vocabulary (assuming you're using the transformer for an NLP problem), replacing all "unused" elements of the sequence with it, and letting the network itself learn an embedding that "will never be focused upon". A rough sketch of that workaround is shown below.
I understand this is not exactly the same thing as masking and requires introducing this special word during training. If masking is critical for your needs, feel free to make the necessary changes yourself and send a pull request (with an example of how they are supposed to be used). I'll review and merge the changes.
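Here is a minimal sketch of the "pad word" workaround described above. The vocabulary builder, helper names, and the choice of index 0 for the pad token are illustrative assumptions, not part of this repository's API:

```python
# Hypothetical pad-token workaround: reserve a dedicated vocabulary entry for
# padding and fill all unused sequence positions with it.
PAD_TOKEN = "<pad>"

def build_vocab(sentences):
    # Reserve index 0 for the pad token so it is easy to identify later.
    vocab = {PAD_TOKEN: 0}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab

def pad_to_length(token_ids, max_len, pad_id=0):
    # Truncate sequences that are too long and right-pad short ones.
    return (token_ids[:max_len] + [pad_id] * max_len)[:max_len]

sentences = [["the", "cat", "sat"], ["a", "dog"]]
vocab = build_vocab(sentences)
batch = [pad_to_length([vocab[w] for w in s], max_len=4) for s in sentences]
# batch == [[1, 2, 3, 0], [4, 5, 0, 0]]
# The embedding for index 0 is trained like any other token embedding; the
# network is expected to learn that it carries no useful information.
```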
Great work on this code! One typical feature of transformer models is the use of a mask to handle variable-length input sequences, as in https://github.com/Lsdefine/attention-is-all-you-need-keras/blob/042ce3846b80dcebb169c856f378bfe26a18c6e4/transformer.py#L89
Is there any plan to implement this functionality?
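For context, this is roughly what such a padding mask does inside attention. The sketch below is a plain NumPy illustration under the assumption that pad positions use id 0; it is not code from either repository:

```python
import numpy as np

def padding_mask(token_ids):
    # 1.0 for real tokens, 0.0 for padding positions (assumed pad id: 0).
    return (np.asarray(token_ids) != 0).astype("float32")

def masked_softmax(scores, mask):
    # Set scores at padded key positions to a large negative value so they
    # receive (numerically) zero attention weight after the softmax.
    scores = np.where(mask[:, None, :] > 0, scores, -1e9)
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

ids = np.array([[1, 2, 3, 0], [4, 5, 0, 0]])         # (batch, seq)
mask = padding_mask(ids)                              # (batch, seq)
scores = np.random.randn(2, 4, 4).astype("float32")   # (batch, query, key)
attn = masked_softmax(scores, mask)
# Attention weights over padded key positions are effectively zero.
```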