params.json

{
  "name": "Study-report-convnet",
  "tagline": "",
  "body": "# study-report-convnet\r\n\r\n## Components in ConvNet\r\n\r\n### Why Convolution Layer\r\n - Motivations: Sparse connectivity, Parameter Sharing, Equivariance to translation\r\n - Stided convolution = Convolution + downsampling\r\n - Special type: Locally connected convolutions (unshared convolution)\r\n\r\n### non-linear Layer\r\n - Of course the most important, provides kernel trick\r\n - With conv+non-linear => exponential growth on linear regions\r\n\r\n### Why Pooling Layer\r\n - Max pooling: \r\n   - Care more about whether some feature is present than exactly where it is\r\n   - Invariant to small translations\r\n   - Enlarges the maximum value's receptive field\r\n - Average pooling:\r\n   - To reduce variance between hypothesys\r\n - Pooling over spatial vs. Pooling over features (may learn transformations)\r\n - Pooling is also useful for handling inputs of varying size\r\n\r\n### Why Normalization Layer\r\n - Re-arrange features in order to have better activations on non-linear layer\r\n\r\n\r\n## AlexNet\r\n### Innovations\r\n - 8-layer convnet\r\n    - Use ReLU nonlinearity\r\n    - Training on 2 GPUs\r\n    - Local Response Normalization (contrast normalization) => not used recently\r\n - Regularization\r\n    - Data augmentation\r\n    - Dropout\r\n\r\n## ZFNet\r\n### Innovations\r\n - Tried to visualize what AlexNet is actually learning => DeConv\r\n - Observing First layer problems\r\n    - Few features dominates => renormalize weights with RMS exceeding 0.1\r\n    - Aliasing and lack of mid-level features of AlexNet => Use a smaller filter size (11->7) and smaller stride (4->2)\r\n\r\n### Experiments\r\n - Feature invariance (to translation, scaling, rotation)\r\n - Occlusion Sensitivity (see if the machine is learning the object or its surroundings)\r\n - Correspondence Analysis (check whether network can learn a hierarchy structure)\r\n - Varying ImageNet model size and placing SVM on top => we must have a minimum depth\r\n\r\n## Overfeat\r\n### Innovations\r\n - More about object detection (sliding window)\r\n - Shows the meaning of 1x1 conv \r\n\r\n## VGG\r\n### Innovations\r\n - Investigate the effect of the convolutional network depth on its accuracy\r\n - Increase depth (16-19) with only 3x3 + 1x1 conv\r\n\r\n### Conclusions\r\n - Stacking 3x3 convs is better than single 7x7\r\n    - faster computation\r\n    - more non-linearity\r\n - Adding 1x1 conv in between may be useful => adding a non-linear layer (NIN has a better explanasion)\r\n - LRN is useless\r\n - Initialization is important (use Xavier init)\r\n - Add scaling to data augmentation\r\n\r\n## Network in Network\r\n### Concept \r\n - having too many filters for a single concept imposes extra burden on the next layer, which needs to consider all combinations of variations from the previous layer\r\n\r\n### Innovations\r\n - Represents MLP (multi-layer perceptron) Layers = larger conv + 1x1 conv\r\n - Global average pooling before fully connected rather than pure concatenate\r\n - MLP can implement dropouts on top\r\n\r\n## GoogLeNet\r\n### Concept\r\n - Improved utilization of the computing resources inside the network\r\n - Extend a network's width + depth instead of just depth\r\n - Based on Hebbian principle (sparsity + clustering): neurons that fires together, wires together\r\n    - Want to create a somewhat non-uniform sparse model for each layer\r\n    - Use concatenate of ConvNets => small dense matrices\r\n    - Wires together => Clustering => use 1x1 Conv\r\n    - Think convnets in another aspect: as clustering methods rather than feature extraction\r\n\r\n### Innovations\r\n - Inception module (1x1, 3x3, 5x5, 3x3 max)\r\n - Use 1x1 conv as a parameter reduction method (before 3x3, 5x5 conv; after 3x3 max as a clustering method)\r\n\r\n### Architecture\r\n - Use avg pooling at last\r\n - Tried to modify features produced in the middle of the network\r\n\r\n## ResNet\r\n### Innovations\r\n - Observed the problem of training error saturation (problems brought out by ReLU)\r\n - Use a Residual network \r\n    - Gradients flow back from top\r\n    - While network want an identity matrix, just set the non-linear functions to 0\r\n - Start using batch normalization\r\n\r\n### Architecture\r\n - Bottleneck architecture for speed up => reading squeezenet\r\n\r\n## Future\r\n - Try different non-linear function methods\r\n - Think about bottleneck architecture\r\n - Track about GoogLeNet's spare representation\r\n",
  "note": "Don't delete this file! It's used internally to help with page regeneration."
}