Currently, encoder-decoder models use either `head_mask` or `decoder_head_mask` to mask attention heads in the cross-attention modules, and neither choice is fully correct. Furthermore, the MHA in the cross-attention modules is part of the decoder, so its head mask should have `shape = (decoder.num_layers, decoder.num_attention_heads)`; using the encoder `head_mask` in the cross-attention module can therefore fail with a shape mismatch whenever the encoder and decoder differ in the number of layers or attention heads (see the sketch below).
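To make the shape requirement concrete, here is a minimal sketch assuming a current BART-style model from the `transformers` library; the config values and token ids are illustrative only and not taken from this issue.

```python
import torch
from transformers import BartConfig, BartModel

# Make encoder and decoder deliberately asymmetric, so that a single mask
# cannot serve both sides.
config = BartConfig(
    encoder_layers=4,
    encoder_attention_heads=8,
    decoder_layers=2,
    decoder_attention_heads=4,
    d_model=64,
    encoder_ffn_dim=128,
    decoder_ffn_dim=128,
)
model = BartModel(config)

# Encoder self-attention: one mask entry per (encoder layer, encoder head).
head_mask = torch.ones(config.encoder_layers, config.encoder_attention_heads)

# Decoder self-attention -- and cross-attention, which also lives inside the
# decoder -- need shape (decoder_layers, decoder_attention_heads) instead.
decoder_head_mask = torch.ones(config.decoder_layers, config.decoder_attention_heads)

input_ids = torch.tensor([[0, 10, 20, 2]])
outputs = model(
    input_ids=input_ids,
    decoder_input_ids=input_ids,
    head_mask=head_mask,
    decoder_head_mask=decoder_head_mask,
)
print(outputs.last_hidden_state.shape)

# Reusing the (4, 8) encoder head_mask for the cross-attention modules, as
# described in this issue, cannot match a decoder with 2 layers of 4 heads.
```

A separate head mask with the decoder's shape, dedicated to the cross-attention modules, is the natural shape-consistent fix implied above.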
My contribution: I will take care of this issue this weekend.

Reviewers: @patil-suraj @patrickvonplaten
stancld changed the title from "🐛 Fix attention head mask for cross-attention module in encoder-decoder models" to "🐛 Bug in attention head mask for cross-attention module in encoder-decoder models" on Mar 5, 2021.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.