This is a benchmark of tensorpack's Mask R-CNN implementation against the popular Matterport Mask R-CNN implementation.
- TensorFlow 1.14 (6e0893c79) + PR30893
- Python 3.7
- CUDA 10.0, CuDNN 7.6.2
- tensorpack 0.9.7.1 (a7f4094d)
- keras 2.2.5
- matterport/Mask_RCNN 3deaec5d
- horovod 0.18.0
- 8xV100s + 80xE5-2698 v4
- Use the standard hyperparameters used by Detectron, except that the total batch size is set to 8.
- `export TF_CUDNN_USE_AUTOTUNE=0` to avoid CuDNN warm-up time.
- Measure speed in "images per second", during the second or later epochs.
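Since every training step processes a total batch of 8 images, "images per second" and "seconds per step" are interchangeable. A minimal conversion sketch (the helper names are illustrative; only the batch size of 8 and the throughput figures quoted below come from this benchmark):

```python
# Convert between seconds-per-step and images-per-second for a fixed
# total batch size (8 images per training step in this benchmark).
def img_per_sec(sec_per_step, batch_size=8):
    return batch_size / sec_per_step

def sec_per_step(img_per_s, batch_size=8):
    return batch_size / img_per_s

print(sec_per_step(42))  # the ~42 img/s reported below is roughly 0.19 s/step
```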
Using `TRAINER=replicated`, the speed is about 42 img/s:
```
./train.py --config DATA.BASEDIR=~/data/coco DATA.NUM_WORKERS=20 MODE_FPN=True --load ImageNet-R50-AlignPadding.npz
```
Using `TRAINER=horovod`, the speed is about 50 img/s:
```
mpirun -np 8 ./train.py --config DATA.BASEDIR=~/data/coco MODE_FPN=True TRAINER=horovod --load ImageNet-R50-AlignPadding.npz
```
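The Horovod trainer follows the usual one-process-per-GPU data-parallel pattern that `mpirun -np 8` implies. Below is a minimal TF1-style sketch of that pattern, not tensorpack's actual trainer code; the toy variable, optimizer settings, and step count are placeholders:

```python
# Standard Horovod (TF1) data-parallel pattern: one process per GPU,
# gradients averaged across all 8 workers every step.
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Pin each of the 8 worker processes to its own GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

x = tf.Variable(1.0)                  # stand-in for the real Mask R-CNN graph
loss = tf.square(x)
opt = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
opt = hvd.DistributedOptimizer(opt)   # all-reduce gradients across workers
train_op = opt.minimize(loss)

# Broadcast the initial weights from rank 0 so all workers start identically.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]
with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
    for _ in range(10):
        sess.run(train_op)
```

Launched the same way as above, e.g. `mpirun -np 8 python sketch.py`.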
Apply maskrcnn.patch to matterport/Mask_RCNN so that it uses the same hyperparameters. Then run:
```
python coco.py train --dataset=~/data/coco/ --model=imagenet
```
It trains at 0.77 s/step, i.e. about 10 img/s. Using 2 images per GPU improves this to about 12 img/s.
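The quoted ~10 img/s follows directly from the step time and the total batch of 8 images:

```python
print(8 / 0.77)  # ~10.4 images per second at 0.77 s/step
```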
- Mask R-CNN is a complicated system and there can be many implementation differences. The above patch only makes the two implementations run roughly the same training.
- The per-iteration training time of an R-CNN typically decreases slowly as training progresses. This experiment only measures the first couple thousand iterations, so the numbers cannot be extrapolated to the total training time of the model.
- Tensorpack's Mask R-CNN is not only faster, but also more accurate.