
some questions about the model size #13

Closed
youngboy52 opened this issue Sep 10, 2020 · 5 comments

Comments

@youngboy52

Hi, could you tell me the size of your SSD and Faster R-CNN models? I found that my own trained Faster R-CNN model takes 142.14 MB of space! It is still too large.

@Wuziyi616
Contributor

Thank you for your attention to our work! Yes, the trained models in this code are large because we don't perform real binarization in our code; PyTorch doesn't support computation with binary weights so far. If you want to get a smaller model from one trained with this repo, I recommend referring to a BNN framework (e.g. daBNN).
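
For a rough sense of the gap (just an illustration, not numbers from this repo; the weight count below is an assumed ResNet-18-scale figure), a truly binarized model needs only 1 bit per weight plus a scaling factor, so bit-packing shrinks storage by roughly 32x:

```python
# Rough sketch of float32 vs. bit-packed binary weight storage.
# The weight count is an assumption, not measured from this repo.
import math

num_weights = 11_000_000                                  # assumed ResNet-18-scale count
fp32_mb = num_weights * 4 / 1024 ** 2                     # 4 bytes per float32 weight
packed_mb = (math.ceil(num_weights / 8) + 4) / 1024 ** 2  # 1 bit per weight + one float32 scale
print(f"float32: {fp32_mb:.1f} MB, bit-packed binary: {packed_mb:.1f} MB")
```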

Also, I need to point out that we don't implement the Bi-Real re-training schedule in this repo. In the Bi-Real Net paper, the authors first train the model with real-valued weights, then re-train the model with learning_rate=0 to let the BN absorb the magnitude of the weights. Thus you can't directly binarize the weights of models trained using this repo. You can refer to bi-real for the re-training implementation.
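
If it helps, here is a minimal sketch of that re-training idea in plain PyTorch (not this repo's code; `binarize_conv_weights_` and the data loader are placeholders). With the learning rate effectively zero, nothing is updated by gradients, but the BatchNorm running statistics are recomputed under the binarized weights, which is how BN absorbs the magnitude:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def binarize_conv_weights_(model):
    # Placeholder helper: replace conv weights with +1/-1 in place.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            m.weight.copy_(torch.where(m.weight >= 0,
                                       torch.ones_like(m.weight),
                                       -torch.ones_like(m.weight)))

@torch.no_grad()
def refresh_bn_stats(model, loader, num_batches=200):
    # "learning_rate = 0" pass: no parameter updates at all, only the
    # BatchNorm running mean/var adapt to the binarized weights.
    binarize_conv_weights_(model)
    model.train()
    for i, (images, _) in enumerate(loader):
        model(images)
        if i + 1 >= num_batches:
            break
```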

@youngboy52
Author

Thank you for your reply. I read the BiDet paper and found that the experimental results reported in your paper include BiDet, Bi-Real-Net, and BiDet (SC). According to this repo, the trained Faster R-CNN model based on ResNet-18 obtains 57.42 mAP on the PASCAL VOC dataset, which is close to the Bi-Real-Net result (Table 1) in your paper. So could you tell me some details about the implementations of BiDet and BiDet (SC)? Are they also trained with real values? Looking forward to your reply!

@Wuziyi616
Contributor

Wuziyi616 commented Sep 15, 2020

Hi! So basically BiDet means using Xnor-Net as the detector architecture (quantization method, network connection, etc.) plus the Sparse Object Prior and IB training loss. And BiDet (SC) means using Bi-Real-Net as the detector architecture plus our proposed losses (SC stands for shortcut, which is adopted in Bi-Real-Net). In this repo, we only implement BiDet (SC) because it achieves better mAP than BiDet.

So, if you see something like "reg loss: xxx, prior loss: xxx", then this is BiDet (SC) training. I think the reason you get a lower mAP than reported in the paper is the training schedule, e.g. learning rate decay and batch size. You can refer to this issue for more details about training binary detectors.

To confirm one more thing: "Xnor-Net", "Bi-Real-Net", "BiDet", and "BiDet (SC)" are all trained using binary values.

@youngboy52
Author

youngboy52 commented Sep 15, 2020

Hi! Sorry to bother you again. As you said, these models are trained using binary values, so in this repo, can I train BiDet (SC) using binary values? According to the code, this repo uses nn.Conv2d instead of your BinarizeConv2d in the class BiDetResnet, while RPN_Conv uses BinarizeConv2d. It confuses me whether BiDet (SC) is trained using binary values in this repo. If I want to obtain a binarized network, do I need to convert the trained BiDet (SC) model to a binarized one using daBNN?

@Wuziyi616
Contributor

Oh, I understand what you mean. It is common practice in BNNs to keep the first and last layers of the network full-precision (see Xnor-Net, Bi-Real-Net, etc.); that's why I use nn.Conv2d here. Besides, Bi-Real-Net also uses a full-precision conv in the downsample residual connection of ResNet, which is why we also use nn.Conv2d here. But in the other layers of the network we use BinaryConv; for example, here you can see that we feed BinBasicBlock, which consists of two BinaryConv layers, into the function. Also, as you point out, we use BinaryConv in the RPN.
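
To make the layout concrete, here is a simplified sketch (not the repo's exact code, and activations/sign functions are omitted) of a basic block in this style. `BinarizeConv2d` below is just aliased to nn.Conv2d so the snippet runs on its own; in the repo it would be the actual binary conv:

```python
import torch.nn as nn

BinarizeConv2d = nn.Conv2d  # stand-in for the repo's binary conv layer

class BinBasicBlockSketch(nn.Module):
    """Simplified ResNet basic block in the style described above:
    the two 3x3 convs are binary, while the 1x1 downsample shortcut
    stays full-precision (plain nn.Conv2d), as in Bi-Real-Net."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = BinarizeConv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = BinarizeConv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.downsample = None
        if stride != 1 or in_ch != out_ch:
            # Full-precision shortcut, kept as a real-valued nn.Conv2d.
            self.downsample = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.bn1(self.conv1(x))
        out = self.bn2(self.conv2(out))
        return out + identity
```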

One more thing: I'm not familiar with daBNN. But I think if you want to get a binarized network, whose weights are -1 and +1 in most of the layers, you need to re-train the network by only updating the BN to absorb the magnitude caused by binarization. You can refer to the Bi-Real-Net paper, Section 4.1, the "Training" part, for more details.
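
A small sketch of what "only updating the BN" could look like in PyTorch (again, an illustration rather than code from this repo): freeze every parameter, re-enable only the BatchNorm affine parameters, and build the optimizer over those alone.

```python
import torch
import torch.nn as nn

def bn_only_optimizer(model, lr=1e-4):
    # Freeze everything, then re-enable only the BatchNorm weight/bias so that
    # re-training can only adjust BN to absorb the binarization-induced magnitude.
    for p in model.parameters():
        p.requires_grad_(False)
    bn_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            for p in m.parameters():
                p.requires_grad_(True)
                bn_params.append(p)
    return torch.optim.SGD(bn_params, lr=lr, momentum=0.9)
```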
