Description
Thanks for your implementation of the Octave Conv paper.

I have a remark/question about the `Conv_BN_ACT` module. Since BN after ACT sometimes makes more sense, I built small OctaveCNNs (each using 6 convs in total) for the CIFAR-10 dataset, following the PyTorch example, with PReLU activations, cross-entropy loss, the AMSGrad optimizer, and alpha = 0.25.
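For reference, a minimal sketch of the two block orderings being compared. This is not taken from the repo; the class names and layer choices here are my own illustration, using plain `nn.Conv2d` instead of the octave convolution for simplicity:

```python
import torch
import torch.nn as nn


class ConvBnAct(nn.Module):
    """Conv -> BatchNorm -> Activation (the ordering in Conv_BN_ACT)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # bias is redundant when BN directly follows the conv
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ConvActBn(nn.Module):
    """Conv -> Activation -> BatchNorm (the variant I tested)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.act = nn.PReLU(out_ch)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))
```

Both variants keep the spatial size (`padding=k // 2` with odd `k`); only the position of the BatchNorm relative to the activation differs.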
After some experimentation with placing BatchNorm before or after the activation, I found the following results:
| Network description | # epochs | accuracy (%) | training loss | test loss |
|---|---|---|---|---|
| `Conv_BN_ACT` | 15 | 78.46 | 0.7093 | 0.6362 |
| `Conv_BN_ACT` | 30 | 82.20 | 0.4613 | 0.5456 |
| `Conv_ACT_BN` | 15 | 82.84 | 0.3917 | 0.5260 |
| `Conv_ACT_BN` | 30 | 84.18 | 0.1614 | 0.6036 |
I observe that `Conv_ACT_BN` has a stronger tendency to overfit: its training loss is noticeably lower than its test loss, compared to the gap for `Conv_BN_ACT`. However, `Conv_ACT_BN` does reach a much higher accuracy.
Have you looked into this before? Is this the reason why you chose to include `Conv_BN_ACT` and not `Conv_ACT_BN`?