caffe-hybridnet
This is a Caffe implementation of bilinear pooling and compact bilinear pooling. The original implementation is in MatConvNet; for convenience, we have ported it to Caffe.

Sample layer prototxt:
Bilinear layer:

layer {
  name: "bilinear_layer"
  type: "Bilinear"
  bottom: "in1"
  bottom: "in2"
  top: "out"
}

Compact bilinear (Tensor Sketch) layer:

layer {
  name: "compact_bilinear"
  type: "CompactBilinear"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  compact_bilinear_param {
    num_output: 4096
    sum_pool: false
  }
}
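For intuition, the Tensor Sketch projection behind this layer can be sketched in NumPy. This is a simplified, single-location sketch (one Count Sketch per input, combined by circular convolution via FFT, with fixed random hashes and signs); it is not the Caffe kernel itself, and the function and variable names are our own:

```python
import numpy as np

def tensor_sketch(x1, x2, d, seed=0):
    # Approximate the outer product <x1 (x) x2> projected to d dimensions.
    # x1: (C1,) and x2: (C2,) channel vectors at one spatial location.
    # The hash buckets h and signs s are drawn once and kept fixed (not learned).
    rng = np.random.RandomState(seed)

    def count_sketch(x):
        c = x.shape[0]
        h = rng.randint(0, d, size=c)          # bucket index for each channel
        s = rng.choice([-1.0, 1.0], size=c)    # random sign for each channel
        out = np.zeros(d)
        np.add.at(out, h, s * x)               # scatter-add signed values
        return out

    # Circular convolution of the two sketches, done in the frequency domain.
    return np.real(np.fft.ifft(np.fft.fft(count_sketch(x1)) *
                               np.fft.fft(count_sketch(x2))))

feat = tensor_sketch(np.ones(8), np.ones(8), d=16)
print(feat.shape)  # (16,)
```

In the layer itself this projection is applied at every spatial location, which is why the FFT library (cufft/kiss_fft, see the merge instructions below) is a build dependency.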

We implement only the Tensor Sketch version of compact bilinear pooling, without learning the random weights, since it performs best in practice. For convenience, we also implement a signed-sqrt layer and a (sample-wise) L2 normalization layer:

layer {
  name: "signed_sqrt_layer"
  type: "SignedSqrt"
  bottom: "in"
  top: "out"
}

and

layer {
  name: "l2_normalization_layer"
  type: "L2Normalization"
  bottom: "in"
  top: "out"
}
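Numerically, these two layers compute the following (a NumPy sketch of the math implied by the layer names, not the exact Caffe code; the eps constant is an assumption to guard against division by zero):

```python
import numpy as np

def signed_sqrt(x):
    # Element-wise signed square root: sign(x) * sqrt(|x|).
    return np.sign(x) * np.sqrt(np.abs(x))

def l2_normalize(x, eps=1e-10):
    # Sample-wise L2 normalization: each row (one sample's feature vector)
    # is scaled to unit L2 norm. x has shape (N, D).
    norms = np.sqrt((x ** 2).sum(axis=1, keepdims=True))
    return x / (norms + eps)

feats = np.array([[4.0, -9.0],
                  [0.0,  1.0]])
print(signed_sqrt(feats))   # element-wise values: [[2, -3], [0, 1]]
print(l2_normalize(signed_sqrt(feats)))
```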

The usual pipeline is compact_bilinear + signed-sqrt + l2_normalization + classification.
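Concretely, such a stack might look like the following (the layer/blob names, the num_output values, and the choice of InnerProduct classifier are illustrative, not prescribed by this repo):

```protobuf
layer {
  name: "compact_bilinear"
  type: "CompactBilinear"
  bottom: "conv_feat"
  bottom: "conv_feat"
  top: "cbp"
  compact_bilinear_param {
    num_output: 8192
  }
}
layer {
  name: "signed_sqrt"
  type: "SignedSqrt"
  bottom: "cbp"
  top: "cbp_sqrt"
}
layer {
  name: "l2_norm"
  type: "L2Normalization"
  bottom: "cbp_sqrt"
  top: "cbp_l2"
}
layer {
  name: "classifier"
  type: "InnerProduct"
  bottom: "cbp_l2"
  top: "logits"
  inner_product_param {
    num_output: 200
  }
}
```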

For both the bilinear and compact bilinear layers, the two inputs may be the same blob, i.e. in1 == in2, but two bottom blobs are always required. The two input shapes must be compatible with each other: "in1" and "in2" should have shapes N*C1*H*W and N*C2*H*W respectively, i.e. only the number of channels may differ.

The bilinear layer always outputs a blob of shape N*(C1*C2)*1*1, i.e. bilinear features that are spatially sum-pooled. The compact bilinear layer's output shape, on the other hand, depends on its compact_bilinear_param. In addition to the spatially sum-pooled feature (output size N*num_output*1*1), we also allow the non-pooled feature (sum_pool: false, output size N*num_output*H*W). This can be useful when one needs some spatial resolution in the output, such as for keypoint detection.
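As a sanity check, the shape rules above can be written out directly. The helper functions below are illustrative (the tuples mirror Caffe's N*C*H*W blob shapes):

```python
def bilinear_output_shape(in1, in2):
    # in1, in2: (N, C, H, W) blob shapes; only the channel counts may differ.
    n1, c1, h1, w1 = in1
    n2, c2, h2, w2 = in2
    assert (n1, h1, w1) == (n2, h2, w2), "only channel counts may differ"
    return (n1, c1 * c2, 1, 1)  # always spatially sum-pooled

def compact_bilinear_output_shape(in1, in2, num_output, sum_pool=True):
    # Output shape depends on compact_bilinear_param (num_output, sum_pool).
    n1, c1, h1, w1 = in1
    n2, c2, h2, w2 = in2
    assert (n1, h1, w1) == (n2, h2, w2), "only channel counts may differ"
    return (n1, num_output, 1, 1) if sum_pool else (n1, num_output, h1, w1)

print(bilinear_output_shape((2, 512, 7, 7), (2, 256, 7, 7)))
# (2, 131072, 1, 1)
print(compact_bilinear_output_shape((2, 512, 7, 7), (2, 512, 7, 7),
                                    num_output=4096, sum_pool=False))
# (2, 4096, 7, 7)
```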

There is an example of using compact bilinear features for bird classification in the folder $CAFFE_ROOT/examples/compact_bilinear. Please check it out!

If you want to merge the compact bilinear layers into your own Caffe version, make sure you have copied or changed all of the following files:

cmake/Cuda.cmake
  add ${CUDA_CUFFT_LIBRARIES} to Caffe_LINKER_LIBS

include/caffe/layers:
  bilinear_layer.hpp
  compact_bilinear_layer.hpp
  l2_normalize_layer.hpp
  signed_sqrt_layer.hpp
and their corresponding .cu and .cpp files in src/caffe/layers, and tests at src/caffe/test/.

include/caffe/util:
  _kiss_fft_guts.h
  kiss_fft.h
  kiss_fftr.h
src/caffe/util
  kiss_fft.cpp
  kiss_fftr.cpp

Makefile
  change the line "LIBRARIES := cudart cublas curand" to "LIBRARIES := cudart cublas curand cufft".

src/caffe/proto/caffe.proto
  add "optional CompactBilinearParameter compact_bilinear_param=145;" in the "message LayerParameter". 

  add the following message:
  message CompactBilinearParameter {
    optional uint32 num_output = 1;
    optional bool sum_pool = 2 [default = true];
  }

If you find bilinear pooling or compact bilinear pooling useful, please consider citing:

@inproceedings{lin2015bilinear,
  title={Bilinear CNN models for fine-grained visual recognition},
  author={Lin, Tsung-Yu and RoyChowdhury, Aruni and Maji, Subhransu},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={1449--1457},
  year={2015}
}

and

@inproceedings{gao2016compact,
  title={Compact Bilinear Pooling},
  author={Gao, Yang and Beijbom, Oscar and Zhang, Ning and Darrell, Trevor},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on},
  year={2016}
}

The compact bilinear pooling part of the code is licensed under BDD, while the original Caffe is licensed under BSD. Please refer to the LICENSE_BDD file in this folder for details.

Original Caffe Readme

Caffe


Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.


Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}