- 8 NVIDIA V100 GPUs
- PyTorch version: 1.0.0a0+dd2c487
- CUDA 9.2
- CUDNN 7.1
- NCCL 2.2.13-1
All the baselines were trained using the exact same experimental setup as in Detectron. We initialize the detection models with ImageNet weights from Caffe2, the same as used by Detectron.
The pre-trained models are available through the links in the model id column; a sketch of loading one of them follows the table.
backbone | type | lr sched | im / gpu | train mem (GB) | train time (s/iter) | total train time (hr) | inference time (s/im) | box AP | mask AP | model id |
---|---|---|---|---|---|---|---|---|---|---|
R-50-C4 | Fast | 1x | 1 | 5.8 | 0.4036 | 20.2 | 0.17130 | 34.8 | - | 6358800 |
R-50-FPN | Fast | 1x | 2 | 4.4 | 0.3530 | 8.8 | 0.12580 | 36.8 | - | 6358793 |
R-101-FPN | Fast | 1x | 2 | 7.1 | 0.4591 | 11.5 | 0.143149 | 39.1 | - | 6358804 |
X-101-32x8d-FPN | Fast | 1x | 1 | 7.6 | 0.7007 | 35.0 | 0.209965 | 41.2 | - | 6358717 |
R-50-C4 | Mask | 1x | 1 | 5.8 | 0.4520 | 22.6 | 0.17796 + 0.028 | 35.6 | 31.5 | 6358801 |
R-50-FPN | Mask | 1x | 2 | 5.2 | 0.4536 | 11.3 | 0.12966 + 0.034 | 37.8 | 34.2 | 6358792 |
R-101-FPN | Mask | 1x | 2 | 7.9 | 0.5665 | 14.2 | 0.15384 + 0.034 | 40.1 | 36.1 | 6358805 |
X-101-32x8d-FPN | Mask | 1x | 1 | 7.8 | 0.7562 | 37.8 | 0.21739 + 0.034 | 42.2 | 37.8 | 6358718 |
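As an illustration, a checkpoint from the table above can be loaded through the repository's config and checkpointing utilities. The following is a minimal sketch, assuming the `e2e_mask_rcnn_R_50_FPN_1x.yaml` config shipped with maskrcnn_benchmark and a hypothetical local checkpoint path; it is not the only way to consume these models.

```python
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.modeling.detector import build_detection_model
from maskrcnn_benchmark.utils.checkpoint import DetectronCheckpointer

# Start from one of the shipped configs and point MODEL.WEIGHT at the downloaded
# checkpoint ("mask_rcnn_R_50_FPN_1x.pth" is a hypothetical local file name).
cfg.merge_from_file("configs/e2e_mask_rcnn_R_50_FPN_1x.yaml")
cfg.merge_from_list(["MODEL.WEIGHT", "mask_rcnn_R_50_FPN_1x.pth"])

model = build_detection_model(cfg)
model.to(cfg.MODEL.DEVICE)
model.eval()

# DetectronCheckpointer handles both Caffe2/Detectron and PyTorch checkpoint formats.
checkpointer = DetectronCheckpointer(cfg, model, save_dir="")
checkpointer.load(cfg.MODEL.WEIGHT)
```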
For person keypoint detection:
backbone | type | lr sched | im / gpu | train mem (GB) | train time (s/iter) | total train time (hr) | inference time (s/im) | box AP | keypoint AP | model id |
---|---|---|---|---|---|---|---|---|---|---|
R-50-FPN | Keypoint | 1x | 2 | 5.7 | 0.3771 | 9.4 | 0.10941 | 53.7 | 64.3 | 9981060 |
We provide pre-trained models for selected FBNet architectures.
- All the models are trained from scratch with BN, using the training schedules specified below.
- Evaluation is performed on a single NVIDIA V100 GPU with `MODEL.RPN.POST_NMS_TOP_N_TEST` set to 200.
The following inference times are reported:
- inference total batch=8: Total inference time, including data loading, model inference, and pre/post processing, using 8 images per batch.
- inference model batch=8: Model inference time only and using 8 images per batch.
- inference model batch=1: Model inference time only and using 1 image per batch.
- inference caffe2 batch=1: Model inference time for the model in Caffe2 format, using 1 image per batch. The Caffe2 models have BN fused into the preceding Conv (see the sketch after the table below) and run purely on C++/CUDA, using Caffe2 ops for RPN/detection post-processing.
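For reference, the per-image "inference model" timings above can be estimated with a loop along these lines. This is a minimal sketch, assuming a generic `model` and a list of already-preprocessed `images` on the GPU; it is not the exact script used to produce the numbers below.

```python
import time
import torch

@torch.no_grad()
def model_time_per_image(model, images, batch_size=8, warmup=10):
    """Average model-only inference time in s/im (excludes data loading and pre/post processing)."""
    model.eval()
    batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]

    for _ in range(warmup):                 # warm up CUDA kernels and the allocator
        model(batches[0])
    torch.cuda.synchronize()

    start = time.time()
    for batch in batches:
        model(batch)
    torch.cuda.synchronize()                # wait for all kernels before stopping the clock
    return (time.time() - start) / len(images)
```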
The pre-trained models are available through the links in the model id column.
backbone | type | resolution | lr sched | im / gpu | train mem (GB) | train time (s/iter) | total train time (hr) | inference total batch=8 (s/im) | inference model batch=8 (s/im) | inference model batch=1 (s/im) | inference caffe2 batch=1 (s/im) | box AP | mask AP | model id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
R-50-C4 (reference) | Fast | 800 | 1x | 1 | 5.8 | 0.4036 | 20.2 | 0.0875 | 0.0793 | 0.0831 | 0.0625 | 34.4 | - | f35857197 |
fbnet_chamv1a | Fast | 600 | 0.75x | 12 | 13.6 | 0.5444 | 20.5 | 0.0315 | 0.0260 | 0.0376 | 0.0188 | 33.5 | - | f100940543 |
fbnet_default | Fast | 600 | 0.5x | 16 | 11.1 | 0.4872 | 12.5 | 0.0316 | 0.0250 | 0.0297 | 0.0130 | 28.2 | - | f101086388 |
R-50-C4 (reference) | Mask | 800 | 1x | 1 | 5.8 | 0.452 | 22.6 | 0.0918 | 0.0848 | 0.0844 | - | 35.2 | 31.0 | f35858791 |
fbnet_xirb16d | Mask | 600 | 0.5x | 16 | 13.4 | 1.1732 | 29 | 0.0386 | 0.0319 | 0.0356 | - | 30.7 | 26.9 | f101086394 |
fbnet_default | Mask | 600 | 0.5x | 16 | 13.0 | 0.9036 | 23.0 | 0.0327 | 0.0269 | 0.0385 | - | 29.0 | 26.1 | f101086385 |
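As noted above, the Caffe2 export fuses each BatchNorm into the preceding Conv before inference. The sketch below shows the arithmetic of that folding for a PyTorch `Conv2d`/`BatchNorm2d` pair in eval mode; it only illustrates the idea and is not the actual export code.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm2d into the preceding Conv2d (valid for inference only)."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)

    # Per-output-channel scale: gamma / sqrt(running_var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)

    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused
```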
In the following section, we compare our implementation with Detectron and mmdetection. The same caveats from mmdetection about comparing numbers obtained on different hardware apply here as well.
Training speed is compared first. The numbers below are in seconds per iteration; the lower, the better.
type | Detectron (P100) | mmdetection (V100) | maskrcnn_benchmark (V100) |
---|---|---|---|
Faster R-CNN R-50 C4 | 0.566 | - | 0.4036 |
Faster R-CNN R-50 FPN | 0.544 | 0.554 | 0.3530 |
Faster R-CNN R-101 FPN | 0.647 | - | 0.4591 |
Faster R-CNN X-101-32x8d FPN | 0.799 | - | 0.7007 |
Mask R-CNN R-50 C4 | 0.620 | - | 0.4520 |
Mask R-CNN R-50 FPN | 0.889 | 0.690 | 0.4536 |
Mask R-CNN R-101 FPN | 1.008 | - | 0.5665 |
Mask R-CNN X-101-32x8d FPN | 0.961 | - | 0.7562 |
Training memory is compared next. The numbers below are in GB; the lower, the better.
type | Detectron (P100) | mmdetection (V100) | maskrcnn_benchmark (V100) |
---|---|---|---|
Faster R-CNN R-50 C4 | 6.3 | - | 5.8 |
Faster R-CNN R-50 FPN | 7.2 | 4.9 | 4.4 |
Faster R-CNN R-101 FPN | 8.9 | - | 7.1 |
Faster R-CNN X-101-32x8d FPN | 7.0 | - | 7.6 |
Mask R-CNN R-50 C4 | 6.6 | - | 5.8 |
Mask R-CNN R-50 FPN | 8.6 | 5.9 | 5.2 |
Mask R-CNN R-101 FPN | 10.2 | - | 7.9 |
Mask R-CNN X-101-32x8d FPN | 7.7 | - | 7.8 |
Finally, accuracy is compared: box AP for Faster R-CNN models, and box AP & mask AP for Mask R-CNN models. The higher, the better.
type | Detectron (P100) | mmdetection (V100) | maskrcnn_benchmark (V100) |
---|---|---|---|
Faster R-CNN R-50 C4 | 34.8 | - | 34.8 |
Faster R-CNN R-50 FPN | 36.7 | 36.7 | 36.8 |
Faster R-CNN R-101 FPN | 39.4 | - | 39.1 |
Faster R-CNN X-101-32x8d FPN | 41.3 | - | 41.2 |
Mask R-CNN R-50 C4 | 35.8 & 31.4 | - | 35.6 & 31.5 |
Mask R-CNN R-50 FPN | 37.7 & 33.9 | 37.5 & 34.4 | 37.8 & 34.2 |
Mask R-CNN R-101 FPN | 40.0 & 35.9 | - | 40.1 & 36.1 |
Mask R-CNN X-101-32x8d FPN | 42.1 & 37.3 | - | 42.2 & 37.8 |