Description
Thanks for your implementation of the Octave Conv paper.

I have a remark/question about the `Conv_BN_ACT` module. Since BN after ACT sometimes makes more sense, I built small OctaveCNNs (each using 6 convs in total) for the CIFAR-10 dataset, following the PyTorch example, with PReLU activations, cross-entropy loss, the AMSGrad optimizer, and alpha = 0.25.
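For reference, a minimal sketch of the two block orderings being compared. This is not taken from the repo; the class names and layer choices here are my own illustration, using plain `nn.Conv2d` instead of the octave convolution for simplicity:

```python
import torch
import torch.nn as nn


class ConvBnAct(nn.Module):
    """Conv -> BatchNorm -> Activation (the ordering in Conv_BN_ACT)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # bias is redundant when BN directly follows the conv
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ConvActBn(nn.Module):
    """Conv -> Activation -> BatchNorm (the variant I tested)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.act = nn.PReLU(out_ch)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return self.bn(self.act(self.conv(x)))
```

Both variants keep the spatial size (`padding=k // 2` with odd `k`); only the position of the BatchNorm relative to the activation differs.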
After some experimentation with placing BatchNorm before or after the activation, I found the following results:
| Network description | # epochs | accuracy (%) | training loss | test loss |
|---|---|---|---|---|
| `Conv_BN_ACT` | 15 | 78.46 | 0.7093 | 0.6362 |
| `Conv_BN_ACT` | 30 | 82.20 | 0.4613 | 0.5456 |
| `Conv_ACT_BN` | 15 | 82.84 | 0.3917 | 0.5260 |
| `Conv_ACT_BN` | 30 | 84.18 | 0.1614 | 0.6036 |
I observe that `Conv_ACT_BN` has a stronger tendency to overfit: its training loss is noticeably lower than its test loss, compared to the gap for `Conv_BN_ACT`. However, `Conv_ACT_BN` does reach a much higher accuracy.
Have you looked into this before? Is this the reason why you chose to include `Conv_BN_ACT` and not `Conv_ACT_BN`?