I can't understand why the attention_mask is this shape. Can you give me an answer or some references? I would be very grateful for your help!

You should look at the original Transformer paper and other blog posts (e.g., The Illustrated Transformer is great) for more information. The reason is that in self-attention we're performing attention of a tensor with itself, hence the square (seq_len × seq_len) shape of the scores that the mask is applied to.
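As a rough illustration, here is a minimal PyTorch sketch (not the exact code from this repo; `batch`, `heads`, `seq_len`, and `head_dim` are placeholder names) showing why the attention scores, and therefore the mask broadcast onto them, end up square:

```python
import torch

batch, heads, seq_len, head_dim = 2, 4, 10, 16

# In self-attention, queries and keys come from the same input sequence,
# so both have seq_len positions.
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)

# Scores compare every query position with every key position.
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
print(scores.shape)  # torch.Size([2, 4, 10, 10]) -> square (seq_len x seq_len)

# The attention mask must broadcast to this square shape, e.g. to hide padding.
padding_mask = torch.ones(batch, 1, 1, seq_len)  # 1 = attend, 0 = ignore
padding_mask[:, :, :, -3:] = 0                   # pretend the last 3 tokens are padding
masked_scores = scores.masked_fill(padding_mask == 0, float("-inf"))
attn = masked_scores.softmax(dim=-1)
print(attn.shape)  # still (batch, heads, seq_len, seq_len)
```

Because queries and keys are both of length `seq_len`, any mask (padding or causal) has to be broadcastable to that `seq_len × seq_len` grid, which is where the square shape comes from.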