This is an experimental TensorFlow implementation of Faster R-CNN (TFFRCNN), mainly based on the work of smallcorgi and rbgirshick. I have reorganized the libraries under the lib
path, making each Python module independent of the others, so you can understand and rewrite the code easily.
For details about Faster R-CNN, please refer to the paper Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks by Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun.
- Resnet networks support
- KITTI object detection dataset support
- Position Sensitive ROI Pooling (psroi_pooling), not tested yet
- Hard Example Mining
- Data Augmentation
- PVANet
- TensorFlow 1.0
- Multi-layer Architecture (HyperNet)
- more hacks...
Requirements for TensorFlow (see: TensorFlow)
Python packages you might not have:
(recommended: install Anaconda)
- For training the end-to-end version of Faster R-CNN with VGG16, 3 GB of GPU memory is sufficient (using cuDNN)
- Clone the Faster R-CNN repository
git clone
- setup
    cd ./lib
    python build
    cd ./lib/build
    # copy the built *.so files into the nms and utils folders
- Build the Cython modules
    cd TFFRCNN/lib
    make  # compile cython and roi_pooling_op, you may need to modify for your platform
After successfully completing basic installation, you'll be ready to run the demo.
To run the demo
python ./faster_rcnn/ --model model_path
The demo performs detection using a VGG16 network trained on PASCAL VOC 2007, where model_path is the path to a trained model that can be downloaded below.
Download the training, validation, test data and VOCdevkit
wget wget wget
Extract all of these tars into one directory named
    tar xvf VOCtrainval_06-Nov-2007.tar
    tar xvf VOCtest_06-Nov-2007.tar
    tar xvf VOCdevkit_08-Jun-2007.tar
It should have this basic structure
    $VOCdevkit/                           # development kit
    $VOCdevkit/VOCcode/                   # VOC utility code
    $VOCdevkit/VOC2007                    # image sets, annotations, etc.
    # ... and several other directories ...
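Before creating the symlink, it can help to confirm that the extracted tree matches the structure above. The following is a minimal sketch, not part of this repository: the helper name `missing_voc_dirs` is hypothetical, and it only checks the two sub-directories named in the listing.

```python
import os

# Sub-directories named in the layout above; anything else is not checked.
EXPECTED_SUBDIRS = ("VOCcode", "VOC2007")


def missing_voc_dirs(devkit_root):
    """Return the expected sub-directories that are absent under devkit_root."""
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(devkit_root, name))
    ]
```

For example, `missing_voc_dirs(os.environ.get("VOCdevkit", "."))` returns an empty list when both directories are present, and the names of the missing ones otherwise.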
Create symlinks for the PASCAL VOC dataset
    cd $TFFRCNN/data
    ln -s $VOCdevkit VOCdevkit2007
Download pre-trained model VGG16 and put it in the path
Run training scripts
    cd $TFFRCNN
    python ./faster_rcnn/ \
        --gpu 0 \
        --weights ./data/pretrain_model/VGG_imagenet.npy \
        --imdb voc_2007_trainval \
        --iters 70000 \
        --cfg ./experiments/cfgs/faster_rcnn_end2end.yml \
        --network VGGnet_train \
        --set EXP_DIR exp_dir
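The `--set EXP_DIR exp_dir` pair in the command above suggests that `--set` takes KEY VALUE pairs that override entries loaded from the `--cfg` file. The sketch below illustrates that idea under this assumption; it is not the repository's own parser. The function name `apply_overrides`, the dotted-key syntax, and the `TRAIN.BATCH_SIZE` key in the usage example are all hypothetical. It requires Python 3.

```python
import ast


def apply_overrides(config, pairs):
    """Apply a flat [KEY, VALUE, KEY, VALUE, ...] list to a nested dict.

    Dotted keys (an assumption of this sketch) address nested entries.
    """
    if len(pairs) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node[part]
        if leaf not in node:
            # Refuse unknown keys so typos do not silently add new options.
            raise KeyError("unknown config key: %s" % key)
        try:
            value = ast.literal_eval(raw)  # numbers, lists, booleans, ...
        except (ValueError, SyntaxError):
            value = raw  # keep plain strings such as directory names
        node[leaf] = value
    return config
```

With `cfg = {"EXP_DIR": "default", "TRAIN": {"BATCH_SIZE": 128}}`, calling `apply_overrides(cfg, ["EXP_DIR", "exp_dir", "TRAIN.BATCH_SIZE", "64"])` replaces the experiment directory with the string `exp_dir` and the batch size with the integer 64.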
Run profiling
    cd $TFFRCNN
    # install a visualization tool
    sudo apt-get install graphviz
    ./experiments/profiling/
    # generate an image ./experiments/profiling/profile.png
Download the KITTI detection dataset
Extract all of these tars into
and the directory structure looks like this:
    KITTI
    |-- training
        |-- image_2
            |-- [000000-007480].png
        |-- label_2
            |-- [000000-007480].txt
    |-- testing
        |-- image_2
            |-- [000000-007517].png
        |-- label_2
            |-- [000000-007517].txt
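Each `.txt` file under `label_2` holds one object per line. The sketch below reads the 2D part of such a line, assuming the field order described in the KITTI object development kit (type, truncated, occluded, alpha, then the box as left, top, right, bottom, followed by 3D fields that are ignored here). The names `KittiObject` and `parse_kitti_label_line` are hypothetical and not taken from this repository.

```python
from collections import namedtuple

# Only the fields needed for 2D detection are kept.
KittiObject = namedtuple("KittiObject", "cls truncated occluded alpha bbox")


def parse_kitti_label_line(line):
    """Parse one line of a KITTI label file into a KittiObject.

    Assumed field order: type, truncated, occluded, alpha,
    left, top, right, bottom, then 3D fields (ignored).
    """
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("expected at least 8 fields, got %d" % len(fields))
    left, top, right, bottom = (float(v) for v in fields[4:8])
    return KittiObject(
        cls=fields[0],
        truncated=float(fields[1]),
        occluded=int(fields[2]),
        alpha=float(fields[3]),
        bbox=(left, top, right, bottom),
    )
```

The returned `bbox` is in pixel coordinates of the matching image in `image_2`, which is what a Pascal VOC style annotation needs.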
Convert KITTI into Pascal VOC format
    cd $TFFRCNN
    ./experiments/scripts/ \
        --kitti $TFFRCNN/data/KITTI --out $TFFRCNN/data/KITTIVOC
The output directory looks like this:
    KITTIVOC
    |-- Annotations
        |-- [000000-007480].xml
    |-- ImageSets
        |-- Main
            |-- [train|val|trainval].txt
    |-- JPEGImages
        |-- [000000-007480].jpg
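To give a sense of what a file under `Annotations` contains, here is a minimal sketch that builds a Pascal VOC style annotation with the standard library. It is an illustration, not the conversion script shipped with this repository: the function name `voc_annotation` is hypothetical, only a subset of the usual VOC tags is written, and the image depth of 3 and `difficult` value of 0 are assumptions.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, objects):
    """Build a Pascal VOC style <annotation> element.

    objects is an iterable of (class_name, (xmin, ymin, xmax, ymax)).
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    # depth=3 assumes a colour image (assumption of this sketch).
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for name, box in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"  # assumed default
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(int(round(value)))
    return root


root = voc_annotation("000000.jpg", 1242, 375, [("car", (100.0, 120.0, 200.0, 220.0))])
print(ET.tostring(root, encoding="unicode"))
```

A real converter also has to decide how to map KITTI class names to the target label set and what to do with regions KITTI marks as ignorable; those choices are left out here.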
Training on
is just like on Pascal VOC 2007
    python ./faster_rcnn/ \
        --gpu 0 \
        --weights ./data/pretrain_model/VGG_imagenet.npy \
        --imdb kittivoc_train \
        --iters 160000 \
        --cfg ./experiments/cfgs/faster_rcnn_kitti.yml \
        --network VGGnet_train