If the span size K is smaller than the width W, should the relative position encoding matrix r^q have size (C, W, K),
so that it's einsum-ed with the query as Q (H,(W,C)) * r^q ((W,C),K) -> A (H,W,K)? (A: attention matrix)
Our current implementation only supports global attention for now.
For local attention (i.e., when the span size is smaller than the width), the size of the relative position encoding matrix depends on how the local attention is implemented. One straightforward way is to use the matrix and equation you mentioned, which yields an attention matrix A of shape (H, W, K), where each entry is the attention weight from a pixel at (H, W) to one of the K nearby pixels in its row (or column).
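As a shapes-only sketch of that einsum (with made-up sizes, and assuming r^q is stored per column position with shape (W, C, K) as in your equation):

```python
import numpy as np

# Hypothetical sizes: an H x W feature map with C channels and span K < W.
H, W, C, K = 4, 8, 16, 5

rng = np.random.default_rng(0)
q = rng.standard_normal((H, W, C))    # one query vector per pixel
r_q = rng.standard_normal((W, C, K))  # relative position encodings per column

# A[h, w, k]: attention logit from pixel (h, w) to the k-th of its K
# nearby pixels in the same row.
A = np.einsum('hwc,wck->hwk', q, r_q)
print(A.shape)  # (4, 8, 5)
```

The same contraction applied along the other axis (with r^q of shape (H, C, K)) would give the column-wise attention weights.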