If the span size K is smaller than the width W, should the relative position encoding matrix r^q have size (C, W, K),
so that it's einsum-ed with the query as Q (H,(W,C)) * r^q ((W,C),K) -> A (H,W,K)? (A: attention matrix)
Our current implementation only supports global attention for now.
For local attention (i.e., when the span size is smaller than the width), the size of the relative position encoding matrix depends on how the local attention is implemented. One straightforward way is to use the matrix and equation you mentioned, which yields an attention matrix A of shape (H, W, K), where each entry is the attention weight from a pixel at (H, W) to one of the K nearby pixels in its row (or column).
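As a shapes-only sketch of that einsum (with made-up sizes, and assuming r^q is stored per column position with shape (W, C, K) as in your equation):

```python
import numpy as np

# Hypothetical sizes: an H x W feature map with C channels and span K < W.
H, W, C, K = 4, 8, 16, 5

rng = np.random.default_rng(0)
q = rng.standard_normal((H, W, C))    # one query vector per pixel
r_q = rng.standard_normal((W, C, K))  # relative position encodings per column

# A[h, w, k]: attention logit from pixel (h, w) to the k-th of its K
# nearby pixels in the same row.
A = np.einsum('hwc,wck->hwk', q, r_q)
print(A.shape)  # (4, 8, 5)
```

The same contraction applied along the other axis (with r^q of shape (H, C, K)) would give the column-wise attention weights.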