SSD300_VGG16 architecture #16
Hi! Have you restructured your SSD300_VGG16 network?
Hi! Thank you for your interest in our work! Yes, we do modify the structure of VGG16 as the backbone of SSD300_VGG16. The modifications are two-fold. First, we add shortcut connections as suggested by Bi-Real Net here. Note that this is for BiDet (SC); we don't use such shortcuts in BiDet. Second, we add BatchNorm (BN) after every Conv layer, as here. That's because many previous works point out that BN is very important for binary neural networks (BNNs). We have also tried training a real-valued SSD300_VGG16 network with these modifications, and we found that adding BN didn't improve the accuracy, while adding the shortcuts even degraded the performance slightly. So we just report the original mAP of SSD in our paper. Also, I want to mention that these two operations (shortcut and BN) only add minor overhead to model FLOPs and parameter size. In contrast, they significantly improve the mAP of BiDet. For example, directly applying BiDet to vanilla SSD_VGG16 only gets an mAP of ~45% (if I remember correctly).
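To make the two modifications concrete, here is a minimal PyTorch sketch of a conv block with BN and a Bi-Real-style identity shortcut. This is only an illustration: `binary_conv_cls` is a placeholder for whatever binarized convolution the repo actually uses, and the shape-mismatch case is omitted.

```python
import torch.nn as nn

class BinaryConvBNBlock(nn.Module):
    """3x3 conv -> BN, wrapped by a Bi-Real-style identity shortcut.

    `binary_conv_cls` is a stand-in for the binarized conv used in BiDet;
    the default (nn.Conv2d) just keeps this sketch runnable.
    """
    def __init__(self, in_ch, out_ch, binary_conv_cls=nn.Conv2d):
        super().__init__()
        self.conv = binary_conv_cls(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        # The identity shortcut only applies when input/output shapes match;
        # handling channel or resolution changes is omitted here.
        self.use_shortcut = (in_ch == out_ch)

    def forward(self, x):
        out = self.bn(self.conv(x))
        if self.use_shortcut:
            out = out + x  # Bi-Real-Net style residual around the binary conv
        return out
```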
Thank you very much! Your answer helped me a lot!
Excuse me, I have another question! When you construct the detector heads of SSD, did you also add BatchNorm (BN) after every Conv layer?
Hi! This is a good question, and I did some ablation studies in my experiments which were not shown in the paper. We didn't add BN after the Conv layers in the detector heads (of SSD300_VGG16), as you can see from the code here. I have tested adding BN, and surprisingly, the mAP degraded by ~0.5%. Also, there seemed to be no benefit in training speed or stability, so we didn't use BN in the detector heads in the final version of BiDet. I didn't delve deeply into the reason, but I conjecture that it is because the Conv layers in the detector not only extract features, they also need to localize the objects. Thus using normalization methods like BN may harm the localization ability of the Conv layers. As you can imagine, BN pushes feature maps toward a normal distribution, which may make them less discriminative for distinguishing different objects and localizing them. However, I have to say this is all my conjecture and I am not sure whether it is true.
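A minimal sketch of what "no BN in the detector head" means in practice. The channel counts and anchor numbers here are illustrative placeholders, not the values from bidet_ssd.py, and the 4-value localization output is the standard SSD one (the 8-value IB variant is discussed in a later comment).

```python
import torch.nn as nn

def make_head(in_channels: int, num_anchors: int, num_classes: int):
    # Plain convs only: no BatchNorm is inserted after these layers.
    loc = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=3, padding=1)
    conf = nn.Conv2d(in_channels, num_anchors * num_classes, kernel_size=3, padding=1)
    return loc, conf
```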
Thank you very much! Your answer helped me a lot! I notice that you removed the activation layer after some layers in bidet_ssd.py. Why did you do that? And I also found that you removed the MaxPool layers in VGG16 and replaced them with downsampling convs with stride 2. Did I understand this correctly?
For the first question, do you mean here? If so, this is because we need to use intermediate feature maps to calculate I(X; F) as one of the loss terms. For the second question, yes, you are right. I forgot this in my yesterday's response. Indeed, I replace the MaxPool in VGG with stride-2 Conv, because I discovered that MaxPool + BinaryConv performed very poorly in the detection task. Really sorry for my mistake; this work was done a year ago and I haven't been working on it for a while. BTW, there is another small modification to the localization output branch of SSD's detector head. The original SSD predicts 4 values for a localization output, namely the shift over x, y and the scale over x, y. Here I use 8 values, because BiDet adopts the Information Bottleneck (IB) principle, so the model output should be distributions rather than deterministic values. Therefore we model the shift and scale with a Normal distribution and use 8 values to represent it: 4 for the mean and 4 for the std.
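A hedged sketch of these two points: a stride-2 conv used in place of MaxPool, and a localization branch that predicts 8 values per anchor (4 mean + 4 std) so that a box offset can be sampled from a Normal distribution. Names, channel counts, and the log-std parameterization are illustrative assumptions, not copied from bidet_ssd.py.

```python
import torch
import torch.nn as nn

# (1) Downsampling: a stride-2 conv instead of nn.MaxPool2d(2, 2).
downsample = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1, bias=False)

# (2) Localization branch under the IB principle: 8 values per anchor,
#     interpreted here as 4 means and 4 (log-)stds of a Normal over box offsets.
num_anchors = 6  # example value
loc_head = nn.Conv2d(512, num_anchors * 8, kernel_size=3, padding=1)

feat = torch.randn(1, 512, 19, 19)                       # example feature map
out = loc_head(feat).permute(0, 2, 3, 1).reshape(1, -1, 8)
mean, log_std = out[..., :4], out[..., 4:]
box_offsets = mean + torch.randn_like(mean) * log_std.exp()  # reparameterized sample
```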
Thank you very much for your reply! |