
Questions about local enhancing component of MTA #14

Open
unclebuff opened this issue Jul 6, 2022 · 2 comments

Comments

@unclebuff

Hello author, thank you for your work.
In the paper, I see that LE(·) denotes the local enhancing component of MTA, applied to the value V via a depth-wise convolution. The corresponding code is
v1 = v1 + self.local_conv1(v1.transpose(1, 2).reshape(B, -1, C//2).transpose(1, 2).view(B, C//2, H//self.sr_ratio, W//self.sr_ratio)).view(B, C//2, -1).view(B, self.num_heads//2, C // self.num_heads, -1).transpose(-1, -2)
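For reference, here is a minimal sketch of what this line is doing once the multi-head and sr_ratio bookkeeping is stripped away; the shapes and the names `v` and `local_conv` are illustrative assumptions, not the repository's exact module:

```python
import torch
import torch.nn as nn

# Toy shapes for illustration only (not taken from the paper or the repo).
B, H, W, C = 2, 14, 14, 64
N = H * W
v = torch.randn(B, N, C)            # value tokens, one per spatial position

# Depth-wise convolution: groups == channels, so each channel is filtered
# independently with its own 3x3 kernel.
local_conv = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C)

v_map = v.transpose(1, 2).reshape(B, C, H, W)               # tokens -> spatial map
v_enh = local_conv(v_map).reshape(B, C, N).transpose(1, 2)  # back to tokens
v = v + v_enh                                                # residual local enhancement of V
```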

I have two questions about this part:

  1. What is the thinking behind the design of this depth-wise convolution? The paper does not describe it in detail or provide a corresponding ablation study.
  2. The code on lines 142 and 143 of SSA.py
    x = (attn @ v).transpose(1, 2).reshape(B, N, C) + self.local_conv(v.transpose(1, 2).reshape(B, N, C).transpose(1, 2).view(B, C, H, W)).view(B, C, N).transpose(1, 2)
    shows that in the last stage the enhanced V is added directly to x after the attention computation, rather than being added to V before attention is computed, as in the earlier stages (see the sketch after this list). What is the consideration here? The paper does not explain it.
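To make the two placements in question concrete, here is a hedged single-head sketch (simplified shapes, illustrative names such as `attention` and `le`; not the repository's exact code):

```python
import torch
import torch.nn as nn

def attention(q, k, v):
    # Plain scaled dot-product attention, for illustration only.
    attn = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
    return attn.softmax(dim=-1) @ v

B, H, W, C = 2, 7, 7, 64
N = H * W
q = k = v = torch.randn(B, N, C)
local_conv = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C)  # depth-wise

def le(v):
    # Local enhancement: tokens -> spatial map -> depth-wise conv -> tokens.
    return local_conv(v.transpose(1, 2).reshape(B, C, H, W)).reshape(B, C, N).transpose(1, 2)

# Earlier stages (the line quoted above): enhance V first, then attend.
x_early = attention(q, k, v + le(v))

# Last stage (SSA.py lines 142-143): attend to the raw V, then add LE(V) to the output.
x_last = attention(q, k, v) + le(v)
```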

I hope you can answer these two questions. Thank you.

@go-ahead-maker

You can take a look at CSWin, another CVPR 2022 paper, which introduces the same design (the LePE module); I think it was posted on arXiv around this time last year.
My guess is that the design used here follows CSWin's LePE?

@OliverRensu
Owner

1. This is just to strengthen local information extraction, and it also provides some positional information.
2. In the last stage this has very little effect; wherever it is placed, it does not change the final performance.
