Hello! I'm very interested in this work and tried to reproduce it using Python and PyTorch.
When I ran my code on CIFAR-100, I found that the coupled ternary structure performed worse than the corresponding binary one. I used ResNet-18 with the resnet_sc_coupled architecture (64-90-180-360); the activation function is adapted from yours, and the weight binarization function is the same as in Bi-Real Net, i.e. a sign function in the forward pass and a clamp function in the backward pass (a sketch follows below).
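For reference, here is a minimal sketch of what I mean by sign-forward / clamp-backward binarization. This is the plain straight-through estimator, and the class name is mine, not from the paper or Bi-Real Net:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign in the forward pass; clipped straight-through gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1, as if the forward had been clamp(x, -1, 1).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

binarize = BinarizeSTE.apply  # usage: w_bin = binarize(w_real)
```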
The accuracies of the coupled ternary network and its corresponding binary one (64-128-256-512) are 66.24 vs. 66.48, which is not consistent with the observation in the paper that "using ternary activation and cutting the model size in half improved the performance of the network".
I am confused by this result, and want to know whether a ternary activation with half the model size can perform clearly better than the normal binary one, or whether the fine-tuning process based on the decoupled model is what really matters.
First of all, it's a good idea to test with the same model architecture on the same dataset to reproduce the numbers in our paper, so you can check for any mis-implemented modules. Also, according to the original ResNet paper (https://arxiv.org/abs/1512.03385), ResNet-18 is designed for ImageNet-sized inputs; the authors used ResNet-20/32/44/56, etc. for the CIFAR datasets. If you need to use ResNet-18 on a CIFAR dataset, make sure to preserve the spatial resolution of the feature maps in the early layers (the ImageNet stem's strided 7x7 convolution and max pooling would shrink a 32x32 image to 8x8 before the first residual block).
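For example, a common CIFAR-style stem modification looks like the following. This is only a sketch assuming a torchvision-style ResNet-18, not the exact setup from our paper:

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=100)  # CIFAR-100 has 100 classes
# Replace the ImageNet stem (7x7 stride-2 conv followed by 3x3 stride-2
# max pooling) with a 3x3 stride-1 conv and no pooling, so a 32x32 input
# keeps its full spatial resolution going into the first residual block.
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.maxpool = nn.Identity()
```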
Also, we cannot say that the coupled ternary model is always better than the baseline binary model, because it is difficult to capture the performance of a model in a single number. Note that the coupled ternary model has half the number of parameters of the baseline binary model (scaling the channel widths by roughly 1/√2, i.e. 64-90-180-360 instead of 64-128-256-512, roughly halves the parameter count, since convolution parameters scale with the product of input and output channels). And YES, the fine-tuning process is important for improving model performance. In our experiment (Section 6.1), the coupled ternary model achieved a 0.62% improvement, and fine-tuning it yielded an additional 0.75% improvement.
I hope my answer helps your understanding. Thanks!