
Why is the attention_mask's shape (tl, tl)? #30

Open
rabbicat30 opened this issue Feb 15, 2023 · 2 comments

Comments

@rabbicat30

tl = seqs.shape[1]  # time-dimension length, used to enforce causality

# (tl, tl) boolean mask: True strictly above the diagonal, i.e. where attending is NOT allowed
attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool, device=self.dev))

I can't understand why the attention_mask is this shape. Can you give me an answer or some references? I would be very grateful for your help!

@seanswyi

You should look at the original Transformer paper ("Attention Is All You Need") and blog posts such as The Illustrated Transformer for more background. The reason is that in self-attention the queries and keys come from the same length-tl sequence, so each of the tl positions attends to all tl positions; the attention-score matrix, and therefore the mask applied to it, is square: (tl, tl).
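For a concrete picture, here's a minimal sketch (not the repo's code; tl, the embedding size, and the head count are made up for illustration) of what the mask looks like and how a (tl, tl) mask is consumed by torch.nn.MultiheadAttention:

```python
import torch

tl = 4  # tiny time-dimension length, just for illustration

# ~tril gives True strictly above the diagonal: those (query, key) pairs are
# blocked, so position i can only attend to positions 0..i (causality).
attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool))
print(attention_mask)
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])

# In self-attention the queries and keys are the same length-tl sequence,
# so the score matrix (and the mask broadcast onto it) is (tl, tl).
embed_dim, num_heads, batch = 8, 2, 2
mha = torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
seqs = torch.randn(batch, tl, embed_dim)
out, _ = mha(seqs, seqs, seqs, attn_mask=attention_mask)  # out: (batch, tl, embed_dim)
```

Note the mask has no batch dimension: the same causal pattern applies to every sequence in the batch, so a single (tl, tl) matrix is enough.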

@rabbicat30
Author

I understand it now. Thanks very much!

