- Title: Network in Network
- Authors: Min Lin, Qiang Chen and Shuicheng Yan
- URL: https://arxiv.org/abs/1312.4400
- Year: 2013
- Other information: Implementation using TFLearn is available on GitHub.
- Key: Lin2013
- Propose a deep neural network structure replacing the generalized linear model structure of the convolutional filter with a multilayer perceptron (MLP).
- Convolutional filter in CNN is a generalized linear model for the underlying data patch.
- Implicitly makes assumption that latent concepts are linearly separable.
- Representations that achieve good abstraction are highly non-linear functions of the training data.
- Start with a conventional CNN and replace the linear convolution filter with a "micro neural network" hence the name Network in Network (NIN).
- Local patch is mapped to the output feature vector with an MLP consisting of fully connected layers with non-linear activation functions (ReLU in this work).
- MLP slides over the input the same way as the convolution filters.
- Why MLP?
- MLP can be trained using backprop.
- MLP can be a deep model itself.
- This structure can also be seen as a conventional n x n convolution followed by a stack of 1x1 convolutions (cascaded cross-channel parametric pooling), as sketched below.
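A minimal sketch of one mlpconv block, assuming a PyTorch re-implementation rather than the paper's original code; channel counts and kernel size are illustrative placeholders.

```python
import torch.nn as nn

def mlpconv(in_ch, mid_ch, out_ch, kernel_size, padding=0):
    # The n x n convolution acts as the first fully connected layer of the MLP
    # over each local patch; each following 1x1 convolution is another fully
    # connected layer shared across all spatial positions.
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size, padding=padding),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
        nn.ReLU(inplace=True),
    )
```

Sliding this block over the input performs the same computation as sliding a small MLP over every patch, which is why the micro-network view and the stack-of-1x1-convolutions view are equivalent.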
- Global average pooling
- Fully connected layers are prone to overfitting: replace them with a global average pooling layer.
- Take the average of each feature map and feed it directly into the softmax layer (see the sketch after this list).
- More native to the convolution structure: clearer correspondence between feature maps and categories.
- Feature maps can be interpreted as category confidence maps.
- No parameter to optimize.
- Can be seen as a structural regularizer.
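A small sketch of the global average pooling head, assuming the last mlpconv layer already emits one feature map per class (the function name is hypothetical):

```python
import torch.nn.functional as F

def gap_classifier(feature_maps):
    # feature_maps: (batch, num_classes, H, W), one confidence map per category.
    # Average each map over its spatial dimensions and feed the result directly
    # into softmax; there is no fully connected layer and nothing to train here.
    class_scores = feature_maps.mean(dim=(2, 3))   # (batch, num_classes)
    return F.softmax(class_scores, dim=1)
```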
- Four benchmark datasets
Dataset | CIFAR-10 | CIFAR-100 | SVHN | MNIST |
---|---|---|---|---|
Image size | 32x32 | 32x32 | 32x32 | 28x28 |
Training images | 50K | 50K | 73,257 + 531,131 (extra) | 60K |
Validation images | (x) | | (y) | |
Testing images | 10K | 10K | 26,032 | 10K |
Number of classes | 10 | 100 | 10 | 10 |
(x) In the experiments with the CIFAR-10 dataset, the training images are split into 40K/10K training/validation sets.
(y) For SVHN, 400 samples per class from the training set and 200 samples per class from the extra set are used for validation.
- Three stacked mlpconv layers followed by max-pooling layers (see the model sketch after these notes).
- Dropout is applied on the outputs of all but the last mlpconv layer.
- Training procedure follows Krizhevsky2012: weight initialization and learning rates (the learning rate is divided by 10 when accuracy stops improving).
- Results for this implementation are shown as NIN + dropout in the tables.
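Putting the pieces together, a hypothetical NIN-style network for 32x32 inputs, reusing the mlpconv helper sketched above; channel counts, kernel sizes, pooling parameters, and the dropout rate are placeholders rather than the paper's exact values.

```python
import torch.nn as nn

def nin(num_classes=10, p_drop=0.5):
    # Three mlpconv blocks with max pooling in between, dropout on the outputs
    # of all but the last block, and global average pooling instead of fully
    # connected layers at the end.
    return nn.Sequential(
        mlpconv(3, 96, 96, kernel_size=5, padding=2),
        nn.MaxPool2d(3, stride=2, padding=1),
        nn.Dropout(p_drop),
        mlpconv(96, 192, 192, kernel_size=5, padding=2),
        nn.MaxPool2d(3, stride=2, padding=1),
        nn.Dropout(p_drop),
        mlpconv(192, 192, num_classes, kernel_size=3, padding=1),
        nn.AdaptiveAvgPool2d(1),  # global average pooling: one value per class map
        nn.Flatten(),             # (batch, num_classes), fed to softmax cross-entropy
    )
```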
- CIFAR-10/CIFAR-100:
- Pre-processing: global contrast normalization and ZCA whitening (a NumPy sketch follows this list).
- For CIFAR-10, two hyperparameters were tuned: the local receptive field size and the weight decay. For CIFAR-100, the same settings as for CIFAR-10 are used.
- Translation and horizontal flipping are used for data augmentation.
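For reference, a minimal NumPy sketch of this preprocessing, using one common formulation of global contrast normalization and ZCA whitening; the epsilon values are illustrative and not taken from the paper.

```python
import numpy as np

def global_contrast_normalize(X, eps=1e-8):
    # X: (num_images, num_pixels). Center each image and scale it to unit
    # standard deviation across its pixels.
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.maximum(np.sqrt((X ** 2).mean(axis=1, keepdims=True)), eps)

def zca_whiten_fit(X, eps=1e-2):
    # Fit a ZCA whitening transform on the (already normalized) training set and
    # return the whitened data plus the mean and transform, so the same mapping
    # can be applied to the test images.
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return (X - mean) @ W, mean, W
```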
CIFAR-10
Method | Test Error (%) |
---|---|
Stochastic pooling | 15.13 |
CNN + Spearmint | 14.98 |
Conv. maxout + dropout | 11.68 |
NIN + dropout | 10.41 |
CNN + Spearmint + data augmentation | 9.50 |
Conv. maxout + dropout + data augmentation | 9.38 |
DropConnect + 12 networks + data augmentation | 9.32 |
NIN + dropout + data augmentation | 8.81 |
- Largest activations are observed in the feature map corresponding to the ground truth category of the input image (enforced by global average pooling).
CIFAR-100
Method | Test Error (%) |
---|---|
Learned pooling | 43.71 |
Stochastic pooling | 42.51 |
Conv. maxout + dropout | 38.57 |
Tree based priors | 36.85 |
NIN + dropout | 35.68 |
- SVHN (Street View House Numbers):
- Pre-processing: local contrast normalization.
- 3 mlpconv layers followed by global average pooling
Method | Test Error (%) |
---|---|
Stochastic pooling | 2.80 |
Rectifier + dropout | 2.78 |
Rectifier + dropout + synthetic translation | 2.68 |
Conv. maxout + dropout | 2.47 |
Multi-digit number recognition | 2.16 |
DropConnect | 1.94 |
NIN + dropout | 2.35 |
- MNIST:
- Same structure as used for CIFAR-10 but number of feature maps generated from each mlpconv layer is reduced.
- Tested without data augmentation.
Method | Test Error (%) |
---|---|
2-layer CNN + 2-layer NN | 0.53 |
Stochastic pooling | 0.47 |
Conv. maxout + dropout | 0.45 |
NIN + dropout | 0.47 |
- NIN was not tested without dropout.
- NIN was tested only on small images - no tests were performed on the ImageNet dataset.