BatchNorm before activation vs BatchNorm after activation #21

Open
AlexanderHustinx opened this issue Sep 11, 2020 · 0 comments

Thanks for your implementation of the Octave Conv paper.

I have a remark/question about the Conv_BN_ACT module.
Since BN after ACT sometimes makes more sense, I built two small Octave CNNs (one per ordering, each with six convolutions in total), following the PyTorch example, for the CIFAR-10 dataset, using PReLU activations, cross-entropy loss, the AMSGrad optimizer, and alpha = 0.25.
After experimenting with BatchNorm placed before or after the activation, I found the following results:
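For clarity, here is a minimal sketch of the two orderings I compared. It uses plain `nn.Conv2d` rather than octave convolutions, and the class names are mine, not the repo's:

```python
# Hypothetical sketch contrasting BN-before-ACT vs BN-after-ACT orderings.
# Plain nn.Conv2d stands in for the octave convolution blocks.
import torch
import torch.nn as nn


class ConvBnAct(nn.Module):
    """Conv -> BatchNorm -> Activation (the Conv_BN_ACT ordering)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # bias=False: the BatchNorm that follows has its own learnable shift.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ConvActBn(nn.Module):
    """Conv -> Activation -> BatchNorm (the Conv_ACT_BN variant tested here)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.PReLU(out_ch)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))


x = torch.randn(8, 3, 32, 32)  # a CIFAR-10-sized batch
print(ConvBnAct(3, 16)(x).shape)  # torch.Size([8, 16, 32, 32])
print(ConvActBn(3, 16)(x).shape)  # torch.Size([8, 16, 32, 32])
```

Both variants are shape-compatible drop-ins for each other, so swapping the ordering only changes where the normalization statistics are computed.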

| Network | Epochs | Accuracy (%) | Training loss | Test loss |
|---|---|---|---|---|
| Conv_BN_ACT | 15 | 78.46 | 0.7093 | 0.6362 |
| Conv_BN_ACT | 30 | 82.20 | 0.4613 | 0.5456 |
| Conv_ACT_BN | 15 | 82.84 | 0.3917 | 0.5260 |
| Conv_ACT_BN | 30 | 84.18 | 0.1614 | 0.6036 |

I observe that Conv_ACT_BN tends to overfit more: its training loss drops noticeably below its test loss, unlike Conv_BN_ACT's. However, Conv_ACT_BN also reaches a much higher accuracy.

Have you looked into this before? Is this why you chose to include Conv_BN_ACT and not Conv_ACT_BN?
