
Confused about the shape of relative position encoding #21

Open
Jensen-Su opened this issue Jan 4, 2021 · 4 comments

Comments

@Jensen-Su

The following code generates a position embedding of shape (C, K, K), where C=self.group_planes*2, K=self.kernel_size:

all_embeddings = torch.index_select(self.relative, 1, self.flatten_index).view(self.group_planes * 2, self.kernel_size, self.kernel_size)

It seems that each position in the (K, K) window owns a position encoding. But for axial attention applied along the w-axis, shouldn't the shape be (C, W), meaning that all rows share the same position encoding?
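For concreteness, here is a minimal, self-contained sketch (not the repo's exact code; sizes and names are illustrative) of how a (C, K, K) table can be gathered from 2K-1 learnable relative-offset vectors:

import torch

group_planes, kernel_size = 8, 7  # hypothetical sizes
# one learnable vector per relative offset in [-(K-1), K-1], i.e. 2K-1 of them
relative = torch.randn(group_planes * 2, kernel_size * 2 - 1)

query_index = torch.arange(kernel_size).unsqueeze(0)  # (1, K)
key_index = torch.arange(kernel_size).unsqueeze(1)    # (K, 1)
relative_index = key_index - query_index + kernel_size - 1  # (K, K), entries in [0, 2K-2]
flatten_index = relative_index.view(-1)

all_embeddings = torch.index_select(relative, 1, flatten_index)
all_embeddings = all_embeddings.view(group_planes * 2, kernel_size, kernel_size)
print(all_embeddings.shape)  # torch.Size([16, 7, 7]) -> (C, K, K)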

@csrhddlam
Owner

Does (C, W) mean global position encoding?
Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.
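As a quick illustration of that counting (illustrative only, not repo code): for an axis of length W, the pairwise offsets key - query form a (W, W) grid of relative positions built from just 2W-1 distinct values, and those encodings are reused for every row along the other (h) axis:

import torch

W = 4
offsets = torch.arange(W).view(W, 1) - torch.arange(W).view(1, W)  # (W, W) relative positions
print(offsets)
# tensor([[ 0, -1, -2, -3],
#         [ 1,  0, -1, -2],
#         [ 2,  1,  0, -1],
#         [ 3,  2,  1,  0]])
print(offsets.unique().numel())  # 2*W - 1 = 7 distinct offsets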

@Jensen-Su
Author

Jensen-Su commented Jan 11, 2021

Does (C, W) mean global position encoding?
Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.

Thanks for your helpful reply. I did mean global position encoding by (C, W).
I am also confused about the following line:

relative_index = key_index - query_index + kernel_size - 1

My confusion is that, since all the position encodings are initialized randomly, I expected that whatever order we use to index the relative encoding would lead to similar results, so maybe it could be indexed in a simpler way. But clearly you don't think so, given this relative_index. What am I missing?

@phj128
Collaborator

phj128 commented Jan 11, 2021

They are randomly initialized, but different relative positions get different relative positional encodings, while positions with the same relative distance should share the weights.
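To make that sharing concrete, a minimal sketch (illustrative sizes, not the repo's exact code): entries of relative_index with the same distance key - query select the same column of the learnable table, so they are literally the same parameters; an arbitrary re-indexing would not tie those weights together.

import torch

kernel_size, channels = 3, 4
relative = torch.randn(channels, 2 * kernel_size - 1)  # 2K-1 learnable vectors
query_index = torch.arange(kernel_size).unsqueeze(0)
key_index = torch.arange(kernel_size).unsqueeze(1)
relative_index = key_index - query_index + kernel_size - 1
print(relative_index)
# tensor([[2, 1, 0],
#         [3, 2, 1],
#         [4, 3, 2]])  -> equal entries along each diagonal
emb = torch.index_select(relative, 1, relative_index.view(-1)).view(channels, kernel_size, kernel_size)
# same relative distance (key one step after query) shares the same vector:
assert torch.equal(emb[:, 1, 0], emb[:, 2, 1])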

@mcahny

mcahny commented May 3, 2021

Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.

If the span size K is smaller than the width W, does the relative position encoding matrix then have shape (C, W, K)?
So that it is einsummed with the query like Q (H, (W, C)) * r^q ((W, C), K) -> A (H, W, K)? (A: attention matrix)
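For reference, a sketch of the einsum described above, assuming a hypothetical relative table r_q of shape (C, W, K) when the span K is smaller than W (the shapes and names here are assumptions, not confirmed by the repo):

import torch

H, W, C, K = 5, 8, 16, 3
q = torch.randn(H, W, C)    # queries along the w-axis, one row per h position
r_q = torch.randn(C, W, K)  # hypothetical: a K-long span of relative encodings per query column
attn = torch.einsum('hwc,cwk->hwk', q, r_q)
print(attn.shape)           # torch.Size([5, 8, 3]) -> (H, W, K)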

@csrhddlam csrhddlam marked this as a duplicate and then as not a duplicate of #29 May 4, 2021