Can a ternary model with half the model size perform clearly better than the normal binary one? #1

Open
liyue2ppy opened this issue Mar 30, 2020 · 1 comment


@liyue2ppy

Hello! I'm very interested in this work and tried to reproduce it using Python and PyTorch.

When I ran my code on CIFAR100, I found that the coupled ternary structure performed worse than the corresponding binary one. I use ResNet-18 with the resnet_sc_coupled architecture (64-90-180-360); the activation function is adapted from yours, and the weight binarization function is the same as in BiReal-Net, which uses a sign function in the forward pass and a clamp function in the backward pass.
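The weight binarization described here (sign in the forward pass, gradient of a clamp in the backward pass) can be sketched in plain Python as follows. This is only a minimal illustration of the straight-through idea, not code from either repository. The function names, the clamp range of [-1, 1], and the choice to map 0 to +1 are all assumptions.

```python
# Sketch of sign-forward / clamp-backward weight binarization
# (a straight-through estimator). Names and the [-1, 1] range are assumptions.

def binarize_forward(weights):
    """Forward pass: map each real-valued weight to -1.0 or +1.0."""
    # Assumption: zero maps to +1 so the output is strictly binary.
    return [1.0 if w >= 0 else -1.0 for w in weights]


def binarize_backward(weights, grad_output, limit=1.0):
    """Backward pass: use the gradient of clamp(w, -limit, limit).

    The upstream gradient passes through unchanged where |w| <= limit
    and is zeroed elsewhere, because sign() itself has zero gradient
    almost everywhere and would otherwise block learning.
    """
    return [g if abs(w) <= limit else 0.0
            for w, g in zip(weights, grad_output)]


weights = [-1.5, -0.3, 0.0, 0.7, 2.0]
upstream = [0.1, 0.2, 0.3, 0.4, 0.5]

print(binarize_forward(weights))
print(binarize_backward(weights, upstream))
```

In this sketch, weights whose magnitude exceeds the limit stop receiving gradient, which is one detail worth comparing between two implementations when their accuracies differ.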

The accuracies of the coupled ternary network and its corresponding binary one (64-128-256-512) are 66.24 vs. 66.48, which is not consistent with the observation in the paper that "using ternary activation and cutting the model size in half improved the performance of the network".

I am confused by this result and would like to know whether a ternary activation with half the model size can perform clearly better than the normal binary one, or whether the fine-tuning process based on the decoupled model is very important.

@Hyungjun-K1m
Owner

Hi,

Thank you for your interest in our work!

First of all, it's a good idea to test with the same model architecture on the same dataset and reproduce the numbers in our paper, to check whether any modules are implemented incorrectly. Also, according to the original ResNet paper (https://arxiv.org/abs/1512.03385), ResNet-18 is designed for ImageNet-sized datasets, and the authors used ResNet-20, 32, 44, 56, etc. for the CIFAR dataset. If you need to use ResNet-18 on CIFAR, make sure to preserve the spatial size of the feature maps in the early layers.
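To illustrate the point about spatial size, the sketch below computes feature-map sizes for a 32x32 CIFAR input under two stems. The layer hyperparameters are assumptions based on commonly used ResNet layouts (7x7 stride-2 convolution plus max-pool for ImageNet, a single 3x3 stride-1 convolution for CIFAR), and the helper names are hypothetical.

```python
# Sketch: spatial size of feature maps for a 32x32 CIFAR input under two stems.
# Layer hyperparameters follow common ResNet layouts and are assumptions here.

def out_size(n, kernel, stride, padding):
    """Output width/height of a conv or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def stage_sizes(n, stage_strides=(1, 2, 2, 2)):
    """Feature-map size after each of the four residual stages (3x3, padding 1)."""
    sizes = []
    for stride in stage_strides:
        n = out_size(n, kernel=3, stride=stride, padding=1)
        sizes.append(n)
    return sizes


# ImageNet-style stem: 7x7 conv, stride 2, then 3x3 max-pool, stride 2.
imagenet_stem = out_size(out_size(32, 7, 2, 3), 3, 2, 1)
# CIFAR-style stem: 3x3 conv, stride 1, no max-pool.
cifar_stem = out_size(32, 3, 1, 1)

print(stage_sizes(imagenet_stem))  # [8, 4, 2, 1]
print(stage_sizes(cifar_stem))     # [32, 16, 8, 4]
```

Under these assumptions, the ImageNet-style stem leaves the last stage with a 1x1 feature map, while the CIFAR-style stem keeps it at 4x4.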

Also, we cannot say that the coupled ternary model is always better than the baseline binary model, because it is so difficult to measure the performance of a model. Note that the coupled ternary model has half the number of parameters of the baseline binary model. And YES, the fine-tuning process is important for improving model performance. In our experiment (Section 6.1), the coupled ternary model achieved a 0.62% improvement, and fine-tuning it resulted in an additional 0.75% improvement.
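As a rough check of the parameter ratio, the sketch below counts convolution weights for the two width settings quoted in this thread (64-128-256-512 and 64-90-180-360). The block layout is an assumption: two basic blocks per stage, a 64-channel stem output, and a 1x1 projection shortcut wherever the width changes. The stem, batch norm, and classifier are ignored, and `stage_conv_params` is a hypothetical helper.

```python
# Sketch: count convolution weights in the four residual stages of a
# ResNet-18-style network for two width settings. The block layout
# (two basic blocks per stage, 1x1 projection shortcut when the width
# changes) is an assumption; stem, batch norm and classifier are ignored.

def stage_conv_params(widths, first_in=64):
    total = 0
    in_ch = first_in
    for out_ch in widths:
        # Four 3x3 convs per stage: the first maps in_ch -> out_ch,
        # the other three map out_ch -> out_ch.
        total += 9 * in_ch * out_ch + 3 * 9 * out_ch * out_ch
        if in_ch != out_ch:
            total += in_ch * out_ch  # 1x1 projection shortcut
        in_ch = out_ch
    return total


baseline = stage_conv_params([64, 128, 256, 512])
coupled = stage_conv_params([64, 90, 180, 360])

print(baseline, coupled, round(coupled / baseline, 3))
```

Under these assumptions the narrower setting has roughly half the convolution weights of the baseline, since each width after the first stage is scaled by about 1/sqrt(2).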

I hope my answer helps your understanding. Thanks!
