
position-sensitive attention #17

Open
fmu2 opened this issue Nov 16, 2020 · 1 comment

Comments

@fmu2

fmu2 commented Nov 16, 2020

Thanks for the great work!

I am a bit confused about this piece of code:

kr = torch.einsum('bgci,cij->bgij', k, k_embedding).transpose(2, 3)

According to Eq. 4 in the paper, I have the impression that it should be torch.einsum('bgcj,cij->bgij', k, k_embedding) since p is the varying index. Please correct me if I am wrong. Thanks!

@phj128
Collaborator

phj128 commented Nov 17, 2020

This depends on which axis of the embedding you choose as the varying one; the two axes of the embedding correspond to two different directions, but both are relative.
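A minimal sketch of this point, assuming `k` has shape `(batch, groups, channels, length)` and `k_embedding` has shape `(channels, length, length)` (these shapes are assumptions for illustration, not taken from the repository): the two contractions differ only in which spatial axis of the embedding is treated as the varying (relative-offset) axis, so swapping the embedding's two spatial axes makes them agree exactly.

```python
import torch

# Illustrative shapes (assumed, not from the repository):
# k:           (batch, groups, channels, length)  -> indices b, g, c, i
# k_embedding: (channels, length, length)         -> indices c, i, j
b, g, c, L = 2, 4, 8, 16
k = torch.randn(b, g, c, L)
k_embedding = torch.randn(c, L, L)

# Variant from the repository: contract k's spatial index with the first
# spatial axis of the embedding, then swap the two output axes.
kr_repo = torch.einsum('bgci,cij->bgij', k, k_embedding).transpose(2, 3)

# Variant suggested in the issue: contract k's spatial index with the
# second spatial axis of the embedding directly.
kr_issue = torch.einsum('bgcj,cij->bgij', k, k_embedding)

# The two only differ in which embedding axis plays the varying role.
# Transposing the embedding's spatial axes makes the repository variant
# reproduce the issue's variant exactly.
kr_swapped = torch.einsum('bgcj,cij->bgij', k, k_embedding.transpose(1, 2))
print(torch.allclose(kr_repo, kr_swapped))  # True
```

In other words, the two formulations are equivalent up to the convention chosen for the direction of the relative offset encoded by the embedding.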
