SSD300_VGG16 architecture #16

Closed
puhan123 opened this issue Nov 7, 2020 · 7 comments

Comments

@puhan123

puhan123 commented Nov 7, 2020

Hi! Have you restructured your network of SSD300_Vgg16?

@Wuziyi616
Contributor

Hi! Thank you for your interest in our work! Yes, we do modify the structure of the VGG16 used as the backbone of SSD300_Vgg16. The modifications are two-fold.

First, we add shortcut connections as suggested by BiReal-Net here. Note that this is for BiDet (SC); we don't use such shortcuts in plain BiDet.

Second, we add BatchNorm (BN) after every Conv layer as here. That's because many previous works point out that BN is very important for binary neural networks (BNNs).

We have also tried training a real-valued SSD300_VGG16 network with these modifications, and we found that adding BN didn't improve the accuracy, while adding the shortcuts even degraded the performance slightly. So we just report the original mAP of SSD in our paper. I also want to mention that these two operations (shortcut and BN) add only minor overhead in FLOPs and parameter count, while they significantly improve the mAP of BiDet. For example, directly applying BiDet to the vanilla SSD_VGG16 only reaches an mAP of ~45% (if I remember correctly).
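The two backbone modifications could be sketched roughly as follows. This is a minimal PyTorch sketch with illustrative names (not the actual BiDet code); in the real BiDet the conv weights and activations would be binarized, which is omitted here for brevity.

```python
import torch
import torch.nn as nn

class ConvBNShortcut(nn.Module):
    """Illustrative block: Conv -> BN plus a shortcut connection, in the
    spirit of BiReal-Net. The conv is kept real-valued here; BiDet would
    binarize it."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              stride=stride, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)  # BN after every Conv layer
        # 1x1 projection so the shortcut matches the output shape
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1,
                                      stride=stride, bias=False)

    def forward(self, x):
        return self.bn(self.conv(x)) + self.shortcut(x)

blk = ConvBNShortcut(8, 16, stride=2)
out = blk(torch.randn(2, 8, 16, 16))  # downsampled to (2, 16, 8, 8)
```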

@puhan123
Author

puhan123 commented Nov 8, 2020

Thank you very much! Your answer helped me a lot!

@puhan123
Author

puhan123 commented Nov 9, 2020

Excuse me, I have another question!

When you construct the detector heads of SSD, did you also add BatchNorm (BN) after every Conv layer?

@Wuziyi616
Contributor

Hi! This is a good question, and I did an ablation study on it in my experiments that was not shown in the paper. We didn't add BN after the Conv layers in the detector heads (of SSD300_VGG16), as you can see from the code here. I tested adding BN, and surprisingly, the mAP degraded by ~0.5%. It also brought no benefit in training speed or stability. So we didn't use BN in the detector heads in the final version of BiDet.

I didn't delve deep into the reason. But I conjecture that this is because the Conv layers in the detector not only extract features, they also need to localize the objects, so using normalization methods like BN may harm their localization ability. As you can imagine, BN pushes feature maps toward a normal distribution, which may make them less discriminative for distinguishing and localizing different objects. However, I have to say this is all conjecture and I am not sure whether it is true.
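For reference, an SSD-style detector head without BN would look something like the following plain Conv layers (an illustrative sketch with assumed anchor/class counts, not the actual BiDet code):

```python
import torch
import torch.nn as nn

num_anchors = 6   # anchors per location at this scale (illustrative)
num_classes = 21  # e.g. VOC: 20 classes + background

# Detector heads on a 512-channel feature map: plain Conv layers,
# with no BatchNorm in between.
loc_head = nn.Conv2d(512, num_anchors * 4, kernel_size=3, padding=1)
conf_head = nn.Conv2d(512, num_anchors * num_classes,
                      kernel_size=3, padding=1)

feat = torch.randn(1, 512, 38, 38)
loc = loc_head(feat)    # box regression: (1, 24, 38, 38)
conf = conf_head(feat)  # class scores:   (1, 126, 38, 38)
```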

@puhan123
Author

puhan123 commented Nov 9, 2020

Thank you very much! Your answer helped me a lot!

I noticed that you removed the activation layer after some layers in bidet_ssd.py. Why did you do that?

I also found that you removed the MaxPool layers in VGG16 and replaced them with downsampling Conv layers with stride 2. Is that the right way to understand it?

@Wuziyi616
Contributor

Wuziyi616 commented Nov 9, 2020

For the first question, do you mean here? If so, this is because we need the intermediate feature maps to calculate I(X; F) as one of the loss terms.

For the second question, yes, you are right. I forgot this in my response yesterday. Indeed, I replaced the MaxPool layers in VGG with stride-2 Conv layers, because I found that MaxPool + BinaryConv performed very poorly on the detection task. Really sorry for my mistake; this work was done a year ago and I haven't been working on it for a while.
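The replacement described above could be sketched like this (an illustrative snippet; layer names and channel counts are assumptions, not taken from the BiDet code):

```python
import torch
import torch.nn as nn

channels = 128  # illustrative channel count for one VGG stage

# Original VGG16 downsampling step:
pool = nn.MaxPool2d(kernel_size=2, stride=2)

# Replacement: a stride-2 Conv that downsamples by the same factor
# while remaining a learnable (and binarizable) operation.
down_conv = nn.Conv2d(channels, channels, kernel_size=3,
                      stride=2, padding=1, bias=False)

x = torch.randn(1, channels, 16, 16)
# Both halve the spatial resolution: 16x16 -> 8x8
assert pool(x).shape == down_conv(x).shape == (1, channels, 8, 8)
```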

BTW, there is another small modification to the localization output branch of SSD's detector head. The original SSD predicts 4 values per localization output: the shifts over x, y and the scales over x, y. Here I use 8 values. Because BiDet adopts the Information Bottleneck (IB) principle, the model output should be distributions rather than deterministic values. Therefore we model the shift and scale with Normal distributions and use 8 values to represent them: 4 for the means and 4 for the stds.
(To be honest, the need to use distributions rather than deterministic values is derived from the IB theory. In my experiments, I didn't find much difference between them; I can get similar mAP with either output form.)
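The 8-value output described above could be split into means and stds as follows. This is a sketch under an assumed channel layout (means in the first half of the channels, log-stds in the second); the actual BiDet code may order the channels differently.

```python
import torch

num_anchors = 6  # illustrative

# Localization output of one head: 8 values per anchor instead of 4
loc_out = torch.randn(1, num_anchors * 8, 38, 38)

# Assumed layout: first half of the channels holds the means, the
# second half the log-stds of the predicted Normal distributions.
mu, log_std = loc_out.chunk(2, dim=1)  # each (1, num_anchors*4, 38, 38)

# One way to get a concrete box regression from the distribution:
# sample via the reparameterization trick (or just use mu directly).
box = mu + torch.exp(log_std) * torch.randn_like(mu)
```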

@puhan123
Author

puhan123 commented Nov 9, 2020

Thank you very much for your reply!
