
position-sensitive attention #17

Open
fmu2 opened this issue Nov 16, 2020 · 1 comment

Comments

@fmu2

fmu2 commented Nov 16, 2020

Thanks for the great work!

I am a bit confused about this piece of code:

kr = torch.einsum('bgci,cij->bgij', k, k_embedding).transpose(2, 3)

According to Eq. 4 in the paper, I have the impression that it should be torch.einsum('bgcj,cij->bgij', k, k_embedding) since p is the varying index. Please correct me if I am wrong. Thanks!

@phj128
Collaborator

phj128 commented Nov 17, 2020

This depends on which axis of the embedding you choose as the varying one; the two axes of the embedding correspond to two different directions, but both are relative.
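A minimal sketch of this point, assuming `k` has shape `(batch, groups, channels, length)` and `k_embedding` has shape `(channels, length, length)` (these shapes are assumptions for illustration, not taken from the repository): the two contractions differ only in which spatial axis of the embedding is treated as the varying (relative-offset) axis, so swapping the embedding's two spatial axes makes them agree exactly.

```python
import torch

# Illustrative shapes (assumed, not from the repository):
# k:           (batch, groups, channels, length)  -> indices b, g, c, i
# k_embedding: (channels, length, length)         -> indices c, i, j
b, g, c, L = 2, 4, 8, 16
k = torch.randn(b, g, c, L)
k_embedding = torch.randn(c, L, L)

# Variant from the repository: contract k's spatial index with the first
# spatial axis of the embedding, then swap the two output axes.
kr_repo = torch.einsum('bgci,cij->bgij', k, k_embedding).transpose(2, 3)

# Variant suggested in the issue: contract k's spatial index with the
# second spatial axis of the embedding directly.
kr_issue = torch.einsum('bgcj,cij->bgij', k, k_embedding)

# The two only differ in which embedding axis plays the varying role.
# Transposing the embedding's spatial axes makes the repository variant
# reproduce the issue's variant exactly.
kr_swapped = torch.einsum('bgcj,cij->bgij', k, k_embedding.transpose(1, 2))
print(torch.allclose(kr_repo, kr_swapped))  # True
```

In other words, the two formulations are equivalent up to the convention chosen for the direction of the relative offset encoded by the embedding.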
