
Confused about the shape of relative position encoding #21

Open
Jensen-Su opened this issue Jan 4, 2021 · 4 comments

Comments

@Jensen-Su

The following code generates a position embedding of shape (C, K, K), where C=self.group_planes*2, K=self.kernel_size:

all_embeddings = torch.index_select(self.relative, 1, self.flatten_index).view(self.group_planes * 2, self.kernel_size, self.kernel_size)

It seems that each position in the (K, K) window owns a position encoding. But for axial attention applied along the w-axis, shouldn't the shape be (C, W), meaning that all rows share the same position encoding?
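For concreteness, here is a minimal, self-contained sketch (not the repo's exact code; sizes and names are illustrative) of how a (C, K, K) table can be gathered from 2K-1 learnable relative-offset vectors:

import torch

group_planes, kernel_size = 8, 7  # hypothetical sizes
# one learnable vector per relative offset in [-(K-1), K-1], i.e. 2K-1 of them
relative = torch.randn(group_planes * 2, kernel_size * 2 - 1)

query_index = torch.arange(kernel_size).unsqueeze(0)  # (1, K)
key_index = torch.arange(kernel_size).unsqueeze(1)    # (K, 1)
relative_index = key_index - query_index + kernel_size - 1  # (K, K), entries in [0, 2K-2]
flatten_index = relative_index.view(-1)

all_embeddings = torch.index_select(relative, 1, flatten_index)
all_embeddings = all_embeddings.view(group_planes * 2, kernel_size, kernel_size)
print(all_embeddings.shape)  # torch.Size([16, 7, 7]) -> (C, K, K)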

@csrhddlam
Owner

Does (C, W) mean global position encoding?
Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.
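As a quick illustration of that counting (illustrative only, not repo code): for an axis of length W, the pairwise offsets key - query form a (W, W) grid of relative positions built from just 2W-1 distinct values, and those encodings are reused for every row along the other (h) axis:

import torch

W = 4
offsets = torch.arange(W).view(W, 1) - torch.arange(W).view(1, W)  # (W, W) relative positions
print(offsets)
# tensor([[ 0, -1, -2, -3],
#         [ 1,  0, -1, -2],
#         [ 2,  1,  0, -1],
#         [ 3,  2,  1,  0]])
print(offsets.unique().numel())  # 2*W - 1 = 7 distinct offsets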

@Jensen-Su
Author

Jensen-Su commented Jan 11, 2021

Does (C, W) mean global position encoding?
Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.

Thanks for your helpful reply. I did mean global position encoding by (C, W).
I am also confused about the following line:

relative_index = key_index - query_index + kernel_size - 1

My confusion is that, since all the position encodings are initialized randomly, I expected that whatever order we use to index the relative encoding would lead to similar results, so maybe it could be indexed in a simpler way. But clearly you don't think so, given this relative_index. What am I missing?

@phj128
Collaborator

phj128 commented Jan 11, 2021

They are randomly initialized, but different relative positions get different relative positional encodings, while positions with the same relative distance should share the weights.
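To make that sharing concrete, a minimal sketch (illustrative sizes, not the repo's exact code): entries of relative_index with the same distance key - query select the same column of the learnable table, so they are literally the same parameters; an arbitrary re-indexing would not tie those weights together.

import torch

kernel_size, channels = 3, 4
relative = torch.randn(channels, 2 * kernel_size - 1)  # 2K-1 learnable vectors
query_index = torch.arange(kernel_size).unsqueeze(0)
key_index = torch.arange(kernel_size).unsqueeze(1)
relative_index = key_index - query_index + kernel_size - 1
print(relative_index)
# tensor([[2, 1, 0],
#         [3, 2, 1],
#         [4, 3, 2]])  -> equal entries along each diagonal
emb = torch.index_select(relative, 1, relative_index.view(-1)).view(channels, kernel_size, kernel_size)
# same relative distance (key one step after query) shares the same vector:
assert torch.equal(emb[:, 1, 0], emb[:, 2, 1])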

@mcahny

mcahny commented May 3, 2021

Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.

If the span size K is smaller than the width W, does the relative position encoding matrix then have shape (C, W, K)?
So that it is einsummed with the query like Q (H, (W, C)) * r^q ((W, C), K) -> A (H, W, K)? (A: attention matrix)
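For reference, a sketch of the einsum described above, assuming a hypothetical relative table r_q of shape (C, W, K) when the span K is smaller than W (the shapes and names here are assumptions, not confirmed by the repo):

import torch

H, W, C, K = 5, 8, 16, 3
q = torch.randn(H, W, C)    # queries along the w-axis, one row per h position
r_q = torch.randn(C, W, K)  # hypothetical: a K-long span of relative encodings per query column
attn = torch.einsum('hwc,cwk->hwk', q, r_q)
print(attn.shape)           # torch.Size([5, 8, 3]) -> (H, W, K)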

@csrhddlam csrhddlam marked this as a duplicate and then as not a duplicate of #29 May 4, 2021