
BatchNorm before activation vs BatchNorm after activation #21

Open
@AlexanderHustinx


Thanks for your implementation of the Octave Conv paper.

I have a remark/question about the Conv_BN_ACT module.
Since BN after ACT sometimes makes more sense, I built a small OctaveCNN (each variant using 6 convolutions in total) for the CIFAR-10 dataset, following the PyTorch example, with PReLU activations, cross-entropy loss, the AMSGrad optimizer, and alpha=0.25.
After experimenting with BatchNorm before and after the activation, I found the following results:

| Network     | # epochs | accuracy (%) | training loss | test loss |
|-------------|----------|--------------|---------------|-----------|
| Conv_BN_ACT | 15       | 78.46        | 0.7093        | 0.6362    |
| Conv_BN_ACT | 30       | 82.20        | 0.4613        | 0.5456    |
| Conv_ACT_BN | 15       | 82.84        | 0.3917        | 0.5260    |
| Conv_ACT_BN | 30       | 84.18        | 0.1614        | 0.6036    |

I observe that Conv_ACT_BN tends to overfit more: the gap between its training loss and test loss is noticeably larger than for Conv_BN_ACT. However, Conv_ACT_BN also reaches much higher accuracy.

Have you looked at this before? Is this the reason why you chose to include Conv_BN_ACT and not Conv_ACT_BN?
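For concreteness, here is a minimal sketch of the two orderings I compared, using plain `nn.Conv2d` as a stand-in for the repo's octave convolution (the class names `ConvBnAct`/`ConvActBn` are illustrative, not the repo's actual modules):

```python
import torch
import torch.nn as nn

class ConvBnAct(nn.Sequential):
    """Conv -> BatchNorm -> Activation (the Conv_BN_ACT ordering)."""
    def __init__(self, in_ch, out_ch):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.PReLU(out_ch),
        )

class ConvActBn(nn.Sequential):
    """Conv -> Activation -> BatchNorm (the Conv_ACT_BN variant I tested)."""
    def __init__(self, in_ch, out_ch):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.PReLU(out_ch),
            nn.BatchNorm2d(out_ch),
        )

# A CIFAR-10-sized batch: both orderings preserve the spatial dimensions.
x = torch.randn(4, 3, 32, 32)
y1 = ConvBnAct(3, 16)(x)
y2 = ConvActBn(3, 16)(x)
print(y1.shape, y2.shape)
```

The only difference between the two blocks is where `BatchNorm2d` sits relative to the `PReLU`; the convolution itself is unchanged.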
