The PyTorch implementation of "Compact Global Descriptor for Neural Networks" (CGD). arXiv: http://arxiv.org/abs/1907.09665
CGD is a simple yet effective way to capture the correlations between each position and all positions across channels. The global descriptor corresponds to global average pooling, which maps features across spatial dimensions into a per-channel response vector.
The cascaded scheme utilizes both max pooling and average pooling; see attention_best.py for details.
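For intuition, here is a minimal, hypothetical sketch of such a layer. The class name CGDSketch, the two nn.Linear mappings, and the tanh gate are illustrative assumptions rather than the authors' design; attention_best.py contains the actual AttentionLayer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CGDSketch(nn.Module):
    # Illustrative only, not the authors' code; see attention_best.py.
    def __init__(self, in_channels, out_channels, bias=True, nonlinear=True):
        super().__init__()
        self.fc_avg = nn.Linear(in_channels, in_channels, bias=bias)
        self.fc_max = nn.Linear(in_channels, out_channels, bias=bias)
        self.nonlinear = nonlinear

    def forward(self, x):
        n, c, _, _ = x.shape
        # global average pooling: (N, C, H, W) -> (N, C), one response per channel
        g_avg = F.adaptive_avg_pool2d(x, 1).flatten(1)
        # cascaded step: a global max-pooling descriptor refines the averaged one
        g_max = F.adaptive_max_pool2d(x, 1).flatten(1)
        gate = self.fc_max(self.fc_avg(g_avg) * g_max)
        if self.nonlinear:
            gate = torch.tanh(gate)
        # channel-wise reweighting of every spatial position, with an identity path
        # (assumes out_channels == in_channels, as in the usage below)
        return x + x * gate.view(n, -1, 1, 1)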
Add an attention layer (CGD) right after the first convolution layer in each block. Set the weight decay of CGD to 4e-5.
# AttentionLayer.__init__(self, in_channels, out_channels, bias=True, nonlinear=True)
self.attention = AttentionLayer(planes, planes, True, True)
# Post-activation block (e.g., ResNet): CGD right after the first convolution
out = self.conv1(x)
out = self.attention(out)
out = self.bn1(out)
out = self.relu(out)
# Pre-activation block (e.g., pre-activation ResNet): CGD again follows the first convolution
residual = x
out = self.bn1(x)
out = self.relu(out)
out = self.conv1(out)
out = self.attention(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv2(out)
# SqueezeNet Fire module: CGD applied to the squeeze output before BN and activation
x = self.squeeze_activation(self.bn(self.attention(self.squeeze(x))))
# WideResNet block: CGD wraps the output of conv1
if not self.equalInOut:
    x = self.relu1(self.bn1(x))
else:
    out = self.relu1(self.bn1(x))
out = self.relu2(self.bn2(self.attention(self.conv1(out if self.equalInOut else x))))
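To apply the 4e-5 weight decay to CGD only, one option is to give the attention parameters their own parameter group. The helper below is a hypothetical sketch: it assumes the layer is registered under the name attention (as in the snippets above), and the SGD hyperparameters are placeholders, not values prescribed by the paper.

import torch

def build_optimizer(model, lr=0.1, momentum=0.9, base_wd=1e-4, cgd_wd=4e-5):
    # Hypothetical helper: separate the CGD/attention parameters so they get
    # weight decay 4e-5 while the rest of the model keeps its usual value.
    cgd_params, other_params = [], []
    for name, param in model.named_parameters():
        (cgd_params if 'attention' in name else other_params).append(param)
    return torch.optim.SGD(
        [{'params': other_params, 'weight_decay': base_wd},
         {'params': cgd_params, 'weight_decay': cgd_wd}],
        lr=lr, momentum=momentum)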
We visualize the feature maps of res5b branch2a after ReLU. The second row shows the original ResNet50 results; the third row shows the results with CGD. CGD deactivates neurons corresponding to the background, which reduces background noise and helps the CNN focus more on objects.
@article{CGD,
author = {Xiangyu He and
Ke Cheng and
Qiang Chen and
Qinghao Hu and
Peisong Wang and
Jian Cheng},
title = {Compact Global Descriptor for Neural Networks},
journal = {CoRR},
volume = {abs/1907.09665},
year = {2019},
url = {http://arxiv.org/abs/1907.09665}
}