
Questions about local enhancing component of MTA #14

Open
unclebuff opened this issue Jul 6, 2022 · 2 comments

Comments

@unclebuff

Hello author, thank you for your work.
In the paper, I see that LE(·) denotes the local enhancing component of MTA, applied to the value V via a depth-wise convolution. The corresponding code is
v1 = v1 + self.local_conv1(v1.transpose(1, 2).reshape(B, -1, C//2).transpose(1, 2).view(B, C//2, H//self.sr_ratio, W//self.sr_ratio)).view(B, C//2, -1).view(B, self.num_heads//2, C // self.num_heads, -1).transpose(-1, -2)
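For reference, here is a minimal sketch of what this line is doing once the multi-head and sr_ratio bookkeeping is stripped away; the shapes and the names `v` and `local_conv` are illustrative assumptions, not the repository's exact module:

```python
import torch
import torch.nn as nn

# Toy shapes for illustration only (not taken from the paper or the repo).
B, H, W, C = 2, 14, 14, 64
N = H * W
v = torch.randn(B, N, C)            # value tokens, one per spatial position

# Depth-wise convolution: groups == channels, so each channel is filtered
# independently with its own 3x3 kernel.
local_conv = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C)

v_map = v.transpose(1, 2).reshape(B, C, H, W)               # tokens -> spatial map
v_enh = local_conv(v_map).reshape(B, C, N).transpose(1, 2)  # back to tokens
v = v + v_enh                                                # residual local enhancement of V
```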

I have two questions about this part:

  1. What is the thinking behind the design of this depth-wise convolution? The paper does not describe it in detail or provide a corresponding ablation study.
  2. The code on lines 142 and 143 of SSA.py
    x = (attn @ v).transpose(1, 2).reshape(B, N, C) + self.local_conv(v.transpose(1, 2).reshape(B, N, C).transpose(1, 2).view(B, C, H, W)).view(B, C, N).transpose(1, 2)
    shows that in the last stage the enhanced V is added directly to x after the attention computation, rather than being added to V before attention is computed, as in the earlier stages (see the sketch after this list). What is the consideration here? The paper does not explain it.
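To make the two placements in question concrete, here is a hedged single-head sketch (simplified shapes, illustrative names such as `attention` and `le`; not the repository's exact code):

```python
import torch
import torch.nn as nn

def attention(q, k, v):
    # Plain scaled dot-product attention, for illustration only.
    attn = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
    return attn.softmax(dim=-1) @ v

B, H, W, C = 2, 7, 7, 64
N = H * W
q = k = v = torch.randn(B, N, C)
local_conv = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C)  # depth-wise

def le(v):
    # Local enhancement: tokens -> spatial map -> depth-wise conv -> tokens.
    return local_conv(v.transpose(1, 2).reshape(B, C, H, W)).reshape(B, C, N).transpose(1, 2)

# Earlier stages (the line quoted above): enhance V first, then attend.
x_early = attention(q, k, v + le(v))

# Last stage (SSA.py lines 142-143): attend to the raw V, then add LE(V) to the output.
x_last = attention(q, k, v) + le(v)
```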

I hope you can answer these two questions. Thank you.

@go-ahead-maker

You can take a look at CSWin, another CVPR 2022 paper, which introduces the same design (the LePE module); I think it was posted on arXiv around this time last year.
My guess is that the design used here follows CSWin's LePE?

@OliverRensu
Owner

1. This is just to strengthen local information extraction, and it also provides some positional information.
2. In the last stage this has very little effect; wherever it is placed, it does not change the final performance.
