This is the Caffe implementation of bilinear pooling and compact bilinear pooling. The original implementation is in MatConvNet; for convenience, we port it to Caffe.
Sample layer prototxt:
Bilinear layer:
layer {
  name: "bilinear_layer"
  type: "Bilinear"
  bottom: "in1"
  bottom: "in2"
  top: "out"
}
Compact bilinear (Tensor Sketch) layer:
layer {
  name: "compact_bilinear"
  type: "CompactBilinear"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  compact_bilinear_param {
    num_output: 4096
    sum_pool: false
  }
}
We only implement the Tensor Sketch version of compact bilinear pooling, without learning the random projection weights, since it performs best in practice. For convenience, we also implement the signed square-root layer (which computes y = sign(x) * sqrt(|x|)) and the (sample-wise) L2 normalization layer:
layer {
  name: "signed_sqrt_layer"
  type: "SignedSqrt"
  bottom: "in"
  top: "out"
}
and
layer {
  name: "l2_normalization_layer"
  type: "L2Normalization"
  bottom: "in"
  top: "out"
}
The usual use case is compact_bilinear + signed-sqrt + l2_normalization + classification, as sketched below.
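A minimal sketch of that pipeline (the blob names "conv_feat", "label", and the num_output values are placeholders for illustration, not values taken from this repository):
layer {
  name: "compact_bilinear"
  type: "CompactBilinear"
  bottom: "conv_feat"
  bottom: "conv_feat"
  top: "cb_feat"
  compact_bilinear_param { num_output: 8192 }
}
layer {
  name: "signed_sqrt"
  type: "SignedSqrt"
  bottom: "cb_feat"
  top: "cb_sqrt"
}
layer {
  name: "l2_norm"
  type: "L2Normalization"
  bottom: "cb_sqrt"
  top: "cb_l2"
}
layer {
  name: "fc_class"
  type: "InnerProduct"
  bottom: "cb_l2"
  top: "score"
  inner_product_param { num_output: 200 }   # e.g. 200 classes for CUB-200 birds
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
}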
For both the bilinear and the compact bilinear layer, the two inputs may be the same blob, i.e. in1 == in2, but two bottom blobs are always required. The two input shapes must be compatible: "in1" and "in2" should have shapes N*C1*H*W and N*C2*H*W respectively, i.e. only the number of channels may differ.
The bilinear layer always outputs a blob of shape N*(C1*C2)*1*1, i.e. bilinear features that are spatially sum-pooled. The compact bilinear layer's output shape, on the other hand, depends on its compact_bilinear_param. In addition to the spatially sum-pooled feature (output shape N*num_output*1*1), we also allow the non-pooled feature (sum_pool: false, output shape N*num_output*H*W). This could be useful when one needs spatial resolution in the output, such as for keypoint detection.
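As a concrete example (the numbers are illustrative only): pooling a 512-channel convolutional feature map of spatial size 28*28 with itself gives a bilinear output of shape N*(512*512)*1*1 = N*262144*1*1, while a compact bilinear layer with num_output: 8192 gives N*8192*1*1 with sum_pool: true, or N*8192*28*28 with sum_pool: false.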
There is an example of using compact bilinear features for bird classification in $CAFFE_ROOT/examples/compact_bilinear. Please check it out!
If you want to merge the compact bilinear layers into your own Caffe version, make sure you have changed all of the following files:
cmake/Cuda.cmake
add ${CUDA_CUFFT_LIBRARIES} to Caffe_LINKER_LIBS (a sketch of this edit follows the list below)
include/caffe/layers:
bilinear_layer.hpp
compact_bilinear_layer.hpp
l2_normalize_layer.hpp
signed_sqrt_layer.hpp
and their corresponding .cu and .cpp files in src/caffe/layers, and the tests in src/caffe/test/.
include/caffe/util:
_kiss_fft_guts.h
kiss_fft.h
kiss_fftr.h
src/caffe/util:
kiss_fft.cpp
kiss_fftr.cpp
Makefile
change the line "LIBRARIES := cudart cublas curand" to "LIBRARIES := cudart cublas curand cufft".
src/caffe/proto/caffe.proto
add "optional CompactBilinearParameter compact_bilinear_param=145;" in the "message LayerParameter".
add the following message:
message CompactBilinearParameter {
  optional uint32 num_output = 1;
  optional bool sum_pool = 2 [default = true];
}
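For reference, the cmake/Cuda.cmake change above amounts to appending cuFFT (used by the Tensor Sketch FFTs on the GPU) to Caffe's linker libraries. The exact surrounding lines differ between Caffe versions, so treat this as a sketch only:
# cmake/Cuda.cmake: make Caffe link against cuFFT
list(APPEND Caffe_LINKER_LIBS ${CUDA_CUFFT_LIBRARIES})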
If you find bilinear pooling or compact bilinear pooling useful, please consider citing:
@inproceedings{lin2015bilinear,
title={Bilinear CNN models for fine-grained visual recognition},
author={Lin, Tsung-Yu and RoyChowdhury, Aruni and Maji, Subhransu},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={1449--1457},
year={2015}
}
and
@inproceedings{gao2016compact,
title={Compact Bilinear Pooling},
author={Gao, Yang and Beijbom, Oscar and Zhang, Ning and Darrell, Trevor},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on},
year={2016}
}
The compact bilinear pooling part of the code is licensed under BDD, while the original Caffe is licensed under BSD. Please refer to the LICENSE_BDD file in this folder for details.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.
Check out the project site for all the details like
- DIY Deep Learning for Vision with Caffe
- Tutorial Documentation
- BVLC reference models and the community model zoo
- Installation instructions
and step-by-step examples.
Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.
Happy brewing!
Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
@article{jia2014caffe,
Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
Journal = {arXiv preprint arXiv:1408.5093},
Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
Year = {2014}
}