
Why is the attention_mask's shape (tl, tl)? #30

Open
rabbicat30 opened this issue Feb 15, 2023 · 2 comments

Comments

@rabbicat30

tl = seqs.shape[1]  # time-dimension length, used to enforce causality

# (tl, tl) boolean mask: True strictly above the diagonal, i.e. where attending is NOT allowed
attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool, device=self.dev))

I can't understand why the attention_mask is this shape. Can you give me an answer or some references? I would be very grateful for your help!

@seanswyi

You should look at the original Transformer paper ("Attention Is All You Need") and blog posts such as The Illustrated Transformer for more background. The reason is that in self-attention the queries and keys come from the same length-tl sequence, so each of the tl positions attends to all tl positions; the attention-score matrix, and therefore the mask applied to it, is square: (tl, tl).
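For a concrete picture, here's a minimal sketch (not the repo's code; tl, the embedding size, and the head count are made up for illustration) of what the mask looks like and how a (tl, tl) mask is consumed by torch.nn.MultiheadAttention:

```python
import torch

tl = 4  # tiny time-dimension length, just for illustration

# ~tril gives True strictly above the diagonal: those (query, key) pairs are
# blocked, so position i can only attend to positions 0..i (causality).
attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool))
print(attention_mask)
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])

# In self-attention the queries and keys are the same length-tl sequence,
# so the score matrix (and the mask broadcast onto it) is (tl, tl).
embed_dim, num_heads, batch = 8, 2, 2
mha = torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
seqs = torch.randn(batch, tl, embed_dim)
out, _ = mha(seqs, seqs, seqs, attn_mask=attention_mask)  # out: (batch, tl, embed_dim)
```

Note the mask has no batch dimension: the same causal pattern applies to every sequence in the batch, so a single (tl, tl) matrix is enough.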

@rabbicat30
Author

I understand it now. Thanks very much!

