
Shape of relative position encoding r^q, r^k, r^v #29

Open
mcahny opened this issue May 4, 2021 · 1 comment
Comments


mcahny commented May 4, 2021

If the span size K is smaller than the width W, should the relative position encoding matrix r^q have shape (C, W, K)?
That is, it would be einsummed with the query as Q (H, (W, C)) * r^q ((W, C), K) -> A (H, W, K), where A is the attention matrix?

csrhddlam (Owner) commented May 4, 2021

Yes, that sounds correct.

Our current implementation only supports global attention for now.

For local attention (i.e., when the span size is smaller than the width), the size of the relative position encoding matrix depends on how the local attention is implemented. One straightforward way is to use the matrix and the equation you mentioned, which yields the attention matrix A (H, W, K), holding the attention weights from every pixel (h, w) to the K nearby pixels in its row (or column).
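For reference, here is a minimal PyTorch sketch of that einsum (my own illustration, not the repository's code). It assumes a single axial-attention head along the width axis and uses the shapes proposed above, with r_q stored directly as a (W, C, K) tensor:

```python
import torch

H, W, C, K = 8, 16, 32, 5  # height, width, channels, local span size (K < W)

# Query for one axial-attention head along the width axis: a C-dim vector per pixel.
q = torch.randn(H, W, C)

# Relative position encoding for the query, stored per query position as proposed
# above: for each of the W positions, a (C, K) block that scores K nearby pixels.
r_q = torch.randn(W, C, K)

# A[h, w, k] = attention logit from pixel (h, w) to its k-th neighbor in row h.
A = torch.einsum('hwc,wck->hwk', q, r_q)
print(A.shape)  # torch.Size([8, 16, 5])
```

In practice one would usually gather r_q from a small table of relative-offset embeddings (one per offset in the span) rather than materialize the full (W, C, K) tensor, but the einsum and the resulting (H, W, K) attention shape are the same.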
