TPU Object Detection and Segmentation Model Zoo
The model zoo provides a large collection of baselines and checkpoints for object detection, instance segmentation, and image classification.
Object Detection and Instance Segmentation
Common Settings and Notes
We provide models based on two detection frameworks, RetinaNet and Mask R-CNN, and three backbones: ResNet-FPN, ResNet-NAS-FPN, and SpineNet.
Models are all trained on COCO train2017 and evaluated on COCO val2017.
Training details:
Models fine-tuned from ImageNet pretrained checkpoints adopt the 36-epoch (~3x) schedule, where 1x is around 12 COCO epochs.
Most models trained from scratch adopt the 72- or 350-epoch schedule.
The default training data augmentation implements horizontal flipping and scale jittering with a random scale between [0.5, 2.0].
Unless noted, all models are trained with l2 weight regularization and ReLU activation.
We use a batch size of 256 and a stepwise learning rate that decays at the last 30 and the last 10 epochs.
We use square images as input: the long side of an image is resized to the target size, then the short side is padded with zeros.
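The stepwise learning rate and the square-input preprocessing described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the `base_lr` argument, and the 0.1 decay factor are hypothetical (the notes above give only the decay points and the resize rule), and the rounding of the resized side is likewise a guess.

```python
def stepwise_lr(epoch, total_epochs, base_lr, decay=0.1):
    """Learning rate for a 0-indexed epoch under a two-step schedule.

    The rate drops once 30 epochs before the end and again 10 epochs
    before the end. The decay factor of 0.1 is an assumption.
    """
    lr = base_lr
    if epoch >= total_epochs - 30:
        lr *= decay
    if epoch >= total_epochs - 10:
        lr *= decay
    return lr


def square_input_geometry(height, width, target):
    """Resized size and zero padding needed to reach target x target.

    The long side is scaled to `target` (aspect ratio preserved) and the
    short side is padded with zeros up to `target`.
    """
    scale = target / max(height, width)
    new_h = min(target, round(height * scale))
    new_w = min(target, round(width * scale))
    return (new_h, new_w), (target - new_h, target - new_w)
```

For a 480x640 image and a 640 target, `square_input_geometry(480, 640, 640)` returns a resized size of `(480, 640)` and a padding of `(160, 0)`, i.e. 160 rows of zeros are added to the short side.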
Inference latency:
Latency is measured on a V100 or P100 GPU from inputs to raw outputs, excluding image pre-processing and post-processing such as NMS.
TensorRT optimization is not applied in any of the tests.
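A latency figure of this kind is typically obtained by timing only the forward pass. The sketch below shows one way to do that; it is not the script used for the numbers in this document. `model_fn`, `warmup`, and `iters` are hypothetical names, and a batch size of one image per call is assumed so that the result reads as ms/im.

```python
import time


def measure_latency_ms(model_fn, inputs, warmup=10, iters=50):
    """Average wall-clock latency of model_fn(inputs) in milliseconds.

    model_fn is a hypothetical callable mapping preprocessed inputs to raw
    outputs, so pre-processing and post-processing such as NMS stay outside
    the timed region. On a GPU, model_fn must block until the result is
    ready, otherwise only the kernel launch is timed.
    """
    for _ in range(warmup):  # untimed runs absorb one-time setup costs
        model_fn(inputs)
    start = time.perf_counter()
    for _ in range(iters):
        model_fn(inputs)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / iters
```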
COCO Object Detection Baselines
RetinaNet (ImageNet pretrained)
Coming soon.
RetinaNet (Trained from scratch)
| model | resolution | epochs | FLOPs (B) | params (M) | V100 / P100 latency (ms/im) | box AP | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R50-FPN | 640x640 | 350 | 97.0 | 34.0 | 23 / 37 | 40.4 | ckpt, config |
| R101-FPN | 1024x1024 | 350 | 326.3 | 53.1 | 55 / 95 | 43.9 | ckpt, config |
| R152-FPN | 1280x1280 | 350 | 630.5 | 68.7 | 100 / 167 | 45.2 | ckpt, config |
| R50-NAS-FPN | 640x640 | 72 | 140.6 | 60.3 | 29 / 48 | 37.3 | N/A |
| R50-NAS-FPN | 640x640 | 350 | 140.6 | 60.3 | 29 / 48 | 42.4 | ckpt, config |
| SpineNet-49 | 640x640 | 72 | 85.4 | 28.5 | 24 / 38 | 37.7 | N/A |
| SpineNet-49 | 640x640 | 350 | 85.4 | 28.5 | 24 / 38 | 42.8 | ckpt, config |
| SpineNet-49S | 640x640 | 350 | 33.8 | 11.9 | 19 / 26 | 39.5 | ckpt, config |
| SpineNet-96 | 1024x1024 | 350 | 265.4 | 43.0 | 53 / 87 | 46.7 | ckpt, config |
| SpineNet-143 | 1280x1280 | 350 | 524.0 | 67.0 | 97 / 159 | 48.0 | ckpt, config |
SpineNet models trained with stochastic depth and swish activation for a longer schedule:
| model | resolution | epochs | FLOPs (B) | params (M) | box AP | download |
| --- | --- | --- | --- | --- | --- | --- |
| SpineNet-49S | 640x640 | 500 | 33.8 | 11.9 | 41.5 | ckpt, config |
| SpineNet-49 | 640x640 | 500 | 85.4 | 28.5 | 44.3 | ckpt, config |
| SpineNet-96 | 1024x1024 | 500 | 265.4 | 43.0 | 48.5 | ckpt, config |
| SpineNet-143 | 1280x1280 | 500 | 524.0 | 67.0 | 50.6 | ckpt, config |
| SpineNet-190 | 1280x1280 | 400 | 1885.0 | 163.6 | 52.0 | ckpt, config |
Mobile RetinaNet (Trained from scratch)
| model | resolution | epochs | FLOPs (B) | params (M) | box AP | download |
| --- | --- | --- | --- | --- | --- | --- |
| SpineNetMB-49 | 384x384 | 600 | 1.0 | 2.34 | 28.6 | ckpt, config |
Instance Segmentation Baselines
Mask R-CNN (ImageNet pretrained)
Coming soon.
Mask R-CNN (Trained from scratch)
| model | resolution | epochs | FLOPs (B) | params (M) | box AP | mask AP | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.8 | 37.8 | ckpt, config |
| SpineNet-96 | 1024x1024 | 350 | 314.6 | 55.2 | 46.8 | 41.2 | ckpt, config |
| SpineNet-143 | 1280x1280 | 350 | 498.4 | 79.2 | 48.7 | 42.6 | ckpt, config |
SpineNet-190 trained with stochastic depth and swish activation for a longer schedule:
| model | resolution | epochs | FLOPs (B) | params (M) | box AP | mask AP | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SpineNet-190 | 1536x1536 | 400 | 1685.7 | 168.2 | 52.0 | 45.9 | ckpt, config |
Image Classification
Common Settings and Notes
We provide ImageNet and iNaturalist-2017 pretrained checkpoints for ResNet and SpineNet models at various scales.
Training details:
All models are trained from scratch for 200 epochs with cosine learning rate decay and batch size 4096.
Unless noted, all models are trained with l2 weight regularization and ReLU activation.
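The cosine learning rate decay mentioned above follows a half-cosine curve from the base rate down to a final value. The sketch below is a minimal version under assumptions not stated in this document: no warmup, a final rate of zero, and a placeholder `base_lr`. The function name is hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs=200, base_lr=1.0):
    """Cosine-decayed learning rate at a (possibly fractional) epoch."""
    # Clamp progress to [0, 1] so out-of-range epochs stay well defined.
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With these defaults the rate equals `base_lr` at epoch 0, half of `base_lr` at epoch 100, and reaches zero at epoch 200.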
ImageNet Baselines
| model | resolution | epochs | FLOPs (B) | params (M) | Top-1 | Top-5 | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-34 | 224x224 | 200 | 3.7 | 21.8 | 74.4 | 92.0 | ckpt, config |
| ResNet-50 | 224x224 | 200 | 4.1 | 25.6 | 77.1 | 93.6 | ckpt, config |
| ResNet-101 | 224x224 | 200 | 7.8 | 44.6 | 78.2 | 94.2 | ckpt, config |
| ResNet-152 | 224x224 | 200 | 11.5 | 60.2 | 78.4 | 94.2 | ckpt, config |
| SpineNet-49 | 224x224 | 200 | 3.5 | 22.1 | 77.0 | 93.3 | ckpt, config |
| SpineNet-96 | 224x224 | 200 | 5.7 | 36.5 | 78.2 | 94.0 | ckpt, config |
| SpineNet-143 | 224x224 | 200 | 9.1 | 60.5 | 79.0 | 94.4 | ckpt, config |
SpineNet models trained with stochastic depth, swish activation, and label smoothing:
| model | resolution | epochs | FLOPs (B) | params (M) | Top-1 | Top-5 | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SpineNet-49 | 224x224 | 200 | 3.5 | 22.1 | 78.1 | 94.0 | ckpt, config |
| SpineNet-96 | 224x224 | 200 | 5.7 | 36.5 | 79.4 | 94.6 | ckpt, config |
| SpineNet-143 | 224x224 | 200 | 9.1 | 60.5 | 80.1 | 95.0 | ckpt, config |
| SpineNet-190 | 224x224 | 200 | 19.1 | 127.1 | 80.8 | 95.3 | ckpt, config |
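Label smoothing, one of the training changes used for the models in the table above, replaces the one-hot target with a slightly softened distribution. The sketch below shows the standard formulation; the smoothing strength `epsilon=0.1` and the function name are assumptions, since this document does not state the value used.

```python
def smooth_labels(label, num_classes, epsilon=0.1):
    """Smoothed target distribution for an integer class label.

    The true class receives 1 - epsilon plus its share of the uniform mass,
    and every class receives epsilon / num_classes.
    """
    off_value = epsilon / num_classes
    on_value = 1.0 - epsilon + off_value
    return [on_value if i == label else off_value for i in range(num_classes)]
```

The returned list sums to one, so it can be used directly as the target of a cross-entropy loss.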
iNaturalist-2017 Baselines
| model | resolution | epochs | FLOPs (B) | params (M) | Top-1 | Top-5 |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet-34 | 224x224 | 200 | 3.7 | 23.9 | 54.1 | 76.7 |
| ResNet-50 | 224x224 | 200 | 4.1 | 33.9 | 54.6 | 77.2 |
| ResNet-101 | 224x224 | 200 | 7.8 | 52.9 | 57.0 | 79.3 |
| ResNet-152 | 224x224 | 200 | 11.5 | 68.6 | 58.4 | 80.2 |
| SpineNet-49 | 224x224 | 200 | 3.5 | 23.1 | 59.3 | 81.9 |
| SpineNet-96 | 224x224 | 200 | 5.7 | 37.6 | 61.7 | 83.4 |
| SpineNet-143 | 224x224 | 200 | 9.1 | 61.6 | 63.6 | 84.8 |
SpineNet models trained with stochastic depth, swish activation, and label smoothing:
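| model | resolution | epochs | FLOPs (B) | params (M) | Top-1 | Top-5 |
| --- | --- | --- | --- | --- | --- | --- |
| SpineNet-49 | 224x224 | 200 | 3.5 | 23.1 | 63.3 | 85.1 |
| SpineNet-96 | 224x224 | 200 | 5.7 | 37.6 | 64.7 | 85.9 |
| SpineNet-143 | 224x224 | 200 | 9.1 | 61.6 | 66.7 | 87.1 |
| SpineNet-190 | 224x224 | 200 | 19.1 | 129.2 | 67.6 | 87.4 |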