import from upstream

ijkguo · Jul 13, 2018 · e005b38
1 parent a02850b
commit e005b38

Showing 23 changed files with 82 additions and 1,806 deletions.
diff --git a/README.md b/README.md
@@ -2,72 +2,46 @@
 
 ![example detections](https://cloud.githubusercontent.com/assets/13162287/22101032/92085dc0-de6c-11e6-9228-67e72606ddbc.png)
 
-Region Proposal Network solves object detection as a regression problem 
-from the objectness perspective. Bounding boxes are predicted by applying 
-learned bounding box deltas to base boxes, namely anchor boxes across 
-different positions in feature maps. Training process directly learns a 
-mapping from raw image intensities to bounding box transformation targets.
+### Set up environment
+* Requires the latest MXNet. Set the environment variable with `export MXNET_CUDNN_AUTOTUNE_DEFAULT=0`.
+* Install the Python package `mxnet` (CPU inference only) or `mxnet-cu90` (GPU training), then `cython`, then `opencv-python matplotlib pycocotools tqdm`.
 
-Fast R-CNN treats general object detection as a classification problem and
-bounding box prediction as a regression problem. Classifying cropped region
-feature maps and predicting bounding box displacements together yields
-detection results. Cropping feature maps instead of image input accelerates
-computation utilizing shared convolution maps. Bounding box displacements
-are simultaneously learned in the training process.
+### Out-of-box inference models
+Download any of the following models to the current directory and run `python3 demo.py --dataset $Dataset$ --network $Network$ --params $MODEL_FILE$ --image $YOUR_IMAGE$` to get single image inference.
+For example, `python3 demo.py --dataset voc --network vgg16 --params vgg16_voc0712.params --image myimage.jpg`. Optionally add `--gpu 0` to use a GPU.
+Each network has its own configuration and each dataset has its own object class names, so you must pass both explicitly as command line arguments.
 
-Faster R-CNN utilize an alternate optimization training process between RPN 
-and Fast R-CNN. Fast R-CNN weights are used to initiate RPN for training.
-The approximate joint training scheme does not backpropagate rcnn training
-error to rpn training.
+| Network | Dataset | Imageset | Reference | Result | Link  |
+| :------ | :------------ | :----------- | :-------: | :----: | :---: |
+| vgg16 | voc | 07/07 | 69.9 | 70.23 | [Dropbox](https://www.dropbox.com/s/gfxnf1qzzc0lzw2/vgg_voc07-0010.params?dl=0) |
+| vgg16 | voc | 07++12/07 | 73.2 | 75.97 | [Dropbox](https://www.dropbox.com/s/rvktx65s48cuyb9/vgg_voc0712-0010.params?dl=0) |
+| resnet101 | voc | 07++12/07 | 76.4 | 79.35 | [Dropbox](https://www.dropbox.com/s/ge2wl0tn47xezdf/resnet_voc0712-0010.params?dl=0) |
+| vgg16 | coco | train2017/val2017 | 21.2 | 22.8 | [Dropbox](https://www.dropbox.com/s/e0ivvrc4pku3vj7/vgg_coco-0010.params?dl=0) |
+| resnet101 | coco | train2017/val2017 | 27.2 | 26.1 | [Dropbox](https://www.dropbox.com/s/bfuy2uo1q1nwqjr/resnet_coco-0010.params?dl=0) |
 
-## Experiments
-
-| Indicator | py-faster-rcnn (caffe resp.) | mx-rcnn (this reproduction) |
-| :-------- | :--------------------------- | :-------------------------- |
-| Training speed [1] | 2.5 img/s training, 5 img/s testing | 3.8 img/s in training, 12.5 img/s testing |
-| Valset performance [2] | mAP 73.2               | mAP 75.97                   |
-| Memory usage [3]  | 11G for Fast R-CNN     | 4.6G for Fast R-CNN         |
-| Parallelization  [4] | None              | 3.8 img/s to 6 img/s for 2 GPUs |
-| Extensibility [5] | Old framework and base networks | ResNet           |
-
-[1] On Ubuntu 14.04.5 with device Titan X, cuDNN enabled.
-    The experiment is VGG-16 end-to-end training.  
-[2] VGG network. Trained end-to-end on VOC07trainval+12trainval, tested on VOC07 test.  
-[3] VGG network. Fast R-CNN is the most memory expensive process.  
-[4] VGG network (parallelization limited by bandwidth).
-    ResNet-101 speeds up from 2 img/s to 3.5 img/s.  
-[5] py-faster-rcnn does not support ResNet or recent caffe version.
-
-| Method | Network | Training Data | Testing Data | Reference | Result | Link  |
-| :----- | :------ | :------------ | :----------- | :-------: | :----: | :---: |
-| Faster R-CNN end-to-end | VGG16 | VOC07 | VOC07test | 69.9 | 70.23 | [Dropbox](https://www.dropbox.com/s/gfxnf1qzzc0lzw2/vgg_voc07-0010.params?dl=0) |
-| Faster R-CNN end-to-end | VGG16 | VOC07+12 | VOC07test | 73.2 | 75.97 | [Dropbox](https://www.dropbox.com/s/rvktx65s48cuyb9/vgg_voc0712-0010.params?dl=0) |
-| Faster R-CNN end-to-end | ResNet-101 | VOC07+12 | VOC07test | 76.4 | 79.35 | [Dropbox](https://www.dropbox.com/s/ge2wl0tn47xezdf/resnet_voc0712-0010.params?dl=0) |
-| Faster R-CNN end-to-end | VGG16 | COCO train | COCO val | 21.2 | 22.8 | [Dropbox](https://www.dropbox.com/s/e0ivvrc4pku3vj7/vgg_coco-0010.params?dl=0) |
-| Faster R-CNN end-to-end | ResNet-101 | COCO train | COCO val | 27.2 | 26.1 | [Dropbox](https://www.dropbox.com/s/bfuy2uo1q1nwqjr/resnet_coco-0010.params?dl=0) |
-
-The above experiments were conducted on [this version of repository](https://github.com/precedenceguo/mx-rcnn/tree/6a1ab0eec5035a10a1efb5fc8c9d6c54e101b4d0)
-with [a modified MXNet based on 0.9.1 nnvm pre-release](https://github.com/precedenceguo/mxnet/tree/simple).
-
-## Set up environment
-* Install Python package `mxnet` or `mxnet-cu90`, `cython` and `opencv-python matplotlib pycocotools tqdm`.
-
-## Download data and label
-Follow `py-faster-rcnn` for data preparation instructions.
+### Download data and label
+Make a directory `data` and follow `py-faster-rcnn` for data preparation instructions.
 * [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) should be in `data/VOCdevkit` containing `VOC2007`, `VOC2012` and `annotations`.
 * [MSCOCO](http://mscoco.org/dataset/) should be in `data/coco` containing `train2017`, `val2017` and `annotations/instances_train2017.json`, `annotations/instances_val2017.json`.
 
-## Download pretrained ImageNet models
+### Download pretrained ImageNet models
 * [VGG16](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) should be at `model/vgg16-0000.params` from [MXNet model zoo](http://data.dmlc.ml/models/imagenet/vgg/).
 * [ResNet](https://github.com/tornadomeet/ResNet) should be at `model/resnet-101-0000.params` from [MXNet model zoo](http://data.dmlc.ml/models/imagenet/resnet/).
 
+### Training and evaluation
+Use `python3 train.py --dataset $Dataset$ --network $Network$ --pretrained $IMAGENET_MODEL_FILE$ --gpus $GPUS$` to train,
+for example, `python3 train.py --dataset voc --network vgg16 --pretrained model/vgg16-0000.params --gpus 0,1`.
+Use `python3 test.py --dataset $Dataset$ --network $Network$ --params $MODEL_FILE$ --gpu $GPU$` to evaluate,
+for example, `python3 test.py --dataset voc --network vgg16 --params model/vgg16-0010.params --gpu 0`.
+
 ### History
 * May 25, 2016: We released Fast R-CNN implementation.
 * July 6, 2016: We released Faster R-CNN implementation.
 * July 23, 2016: We updated to MXNet module solver.
 * Oct 10, 2016: tornadomeet released approximate end-to-end training.
 * Oct 30, 2016: We updated to MXNet module inference.
 * Jan 19, 2017: We accelerated our pipeline and supported ResNet training.
+* Jun 22, 2018: We simplified code. 
 
 ### Disclaimer
 This repository used code from [MXNet](https://github.com/dmlc/mxnet),

diff --git a/sym_demo.py → demo.py b/sym_demo.py → demo.py
@@ -3,11 +3,12 @@
 import pprint
 
 import mxnet as mx
+from mxnet.module import Module
 
 from symdata.bbox import im_detect
 from symdata.loader import load_test, generate_batch
 from symdata.vis import vis_detection
-from symnet.model import get_net
+from symnet.model import load_param, check_shape
 
 
 def demo_net(sym, class_names, args):
@@ -27,14 +28,29 @@ def demo_net(sym, class_names, args):
     # generate data batch
     data_batch = generate_batch(im_tensor, im_info)
 
-    # assemble executor
-    predictor = get_net(sym, args.params, ctx, short=args.img_short_side, max_size=args.img_long_side)
+    # load params
+    arg_params, aux_params = load_param(args.params, ctx=ctx)
+
+    # produce shape max possible
+    data_names = ['data', 'im_info']
+    label_names = None
+    data_shapes = [('data', (1, 3, args.img_long_side, args.img_long_side)), ('im_info', (1, 3))]
+    label_shapes = None
+
+    # check shapes
+    check_shape(sym, data_shapes, arg_params, aux_params)
+
+    # create and bind module
+    mod = Module(sym, data_names, label_names, context=ctx)
+    mod.bind(data_shapes, label_shapes, for_training=False)
+    mod.init_params(arg_params=arg_params, aux_params=aux_params)
 
     # forward
-    output = predictor.predict(data_batch)
-    rois = output['rois_output'][:, 1:]
-    scores = output['cls_prob_reshape_output'][0]
-    bbox_deltas = output['bbox_pred_reshape_output'][0]
+    mod.forward(data_batch)
+    rois, scores, bbox_deltas = mod.get_outputs()
+    rois = rois[:, 1:]
+    scores = scores[0]
+    bbox_deltas = bbox_deltas[0]
     im_info = im_info[0]
 
     # decode detection
@@ -55,7 +71,7 @@ def demo_net(sym, class_names, args):
 def parse_args():
     parser = argparse.ArgumentParser(description='Demonstrate a Faster R-CNN network',
                                      formatter_class=argparse.ArgumentDefaultsHelpFormatter)
-    parser.add_argument('--network', type=str, default='resnet50', help='base network')
+    parser.add_argument('--network', type=str, default='vgg16', help='base network')
     parser.add_argument('--params', type=str, default='', help='path to trained model')
     parser.add_argument('--dataset', type=str, default='voc', help='training dataset')
     parser.add_argument('--image', type=str, default='', help='path to test image')
@@ -99,7 +115,6 @@ def get_voc_names(args):
 
 def get_coco_names(args):
     from symimdb.coco import coco
-    args.rpn_anchor_scales = (2, 4, 8, 16, 32)
     args.rcnn_num_classes = len(coco.classes)
     return coco.classes
 

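Note on the `data_shapes` bound in `demo.py` above: the module is bound once to a square input of `img_long_side` by `img_long_side`, which is presumably meant to be large enough for any resized image in either orientation. The sketch below illustrates why that bound would suffice. It is not code from this commit: `resized_size` is a hypothetical resize rule (short side scaled to a target, long side capped), and the values 600 and 1000 are assumed for illustration since the argument defaults are not visible in this diff. Only `max_data_shapes` mirrors the list that appears in the hunk.

```python
def max_data_shapes(img_long_side):
    """Input shapes used for binding, mirroring the data_shapes list in the diff."""
    return [('data', (1, 3, img_long_side, img_long_side)), ('im_info', (1, 3))]


def resized_size(height, width, short_side, long_side):
    """Hypothetical resize rule (an assumption, not taken from this commit):
    scale the shorter side to short_side, but cap the scale so that the
    longer side never exceeds long_side."""
    scale = short_side / min(height, width)
    if scale * max(height, width) > long_side:
        scale = long_side / max(height, width)
    return int(round(height * scale)), int(round(width * scale))


def fits(height, width, data_shapes):
    """Return True if an image of (height, width) fits inside the bound 'data' shape."""
    _, _, max_h, max_w = dict(data_shapes)['data']
    return height <= max_h and width <= max_w


# Assumed values for illustration only.
shapes = max_data_shapes(1000)
for h, w in [(375, 500), (500, 375), (200, 1000), (1000, 200)]:
    new_h, new_w = resized_size(h, w, short_side=600, long_side=1000)
    print((h, w), '->', (new_h, new_w), 'fits:', fits(new_h, new_w, shapes))
```

Under this assumed resize rule, both landscape and portrait inputs stay within the square bound, so a single bind at the maximum shape covers every image the demo might see.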
diff --git a/gluon_demo.py b/gluon_demo.py