Popular Computer Vision Model Benchmarks
Batch Size = 8, Image = 3 x 224 x 224 (used when nothing else is specified, and for all CPU runs)
Batch Size = 4, Image = 3 x 224 x 224
GPU USED --- Titan 1080Ti 12 GB

| Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
|---|---|---|---|---|---|
| VGG16 | PyTorch 0.4.1 | 0.0245 s | 0.0606 s | 0.0852 s | 0.0234 s |
| VGG16 | Flux 0.6.8+ | 0.0287 s | 0.0760 s | 0.1047 s | 0.0288 s |
| VGG16 BN | PyTorch 0.4.1 | 0.0271 s | 0.0672 s | 0.0943 s | 0.0273 s |
| VGG16 BN | Flux 0.6.8+ | 0.0333 s | 0.0818 s | 0.1151 s | 0.0327 s |
| VGG19 | PyTorch 0.4.1 | 0.0281 s | 0.0741 s | 0.1021 s | 0.0280 s |
| VGG19 | Flux 0.6.8+ | 0.0355 s | 0.0923 s | 0.1278 s | 0.0356 s |
| VGG19 BN | PyTorch 0.4.1 | 0.0321 s | 0.0812 s | 0.1134 s | 0.0325 s |
| VGG19 BN | Flux 0.6.8+ | 0.0377 s | 0.0965 s | 0.1342 s | 0.0371 s |
| Resnet18 | PyTorch 0.4.1 | 0.0064 s | 0.0125 s | 0.0190 s | 0.0050 s |
| Resnet18 | Flux 0.6.8+ | 0.0079 s | 0.0218 s | 0.0297 s | 0.0079 s |
| Resnet34 | PyTorch 0.4.1 | 0.0092 s | 0.0216 s | 0.0307 s | 0.0092 s |
| Resnet34 | Flux 0.6.8+ | 0.0137 s | 0.0313 s | 0.0450 s | 0.0151 s |
| Resnet50 | PyTorch 0.4.1 | 0.0155 s | 0.0351 s | 0.0506 s | 0.0152 s |
| Resnet50 | Flux 0.6.8+ | 0.0205 s | 0.1795 s | 0.2000 s | - |
| Resnet101 | PyTorch 0.4.1 | 0.0297 s | 0.0379 s | 0.0676 s | 0.0298 s |
| Resnet101 | Flux 0.6.8+ | 0.0215 s | 0.0616 s | 0.0831 s | 0.0208 s |
| Resnet152 | PyTorch 0.4.1 | 0.0431 s | 0.05337 s | 0.0965 s | 0.0429 s |
| Resnet152 | Flux 0.6.8+ | 0.0308 s | 0.0807 s | 0.1115 s | 0.0298 s |

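
To compare the two frameworks at a glance, the GPU total times above can be reduced to a single ratio per model. The following is a minimal sketch, not part of the original benchmark code: the dictionary name `GPU_TOTALS` and the function `flux_to_pytorch_ratio` are illustrative, and the numbers are copied from the "Total Time" column.

```python
# Total time (forward + backward) in seconds on the GPU, copied from the table.
# Each entry is (PyTorch 0.4.1, Flux 0.6.8+). Names here are illustrative only.
GPU_TOTALS = {
    "VGG16": (0.0852, 0.1047),
    "VGG16 BN": (0.0943, 0.1151),
    "VGG19": (0.1021, 0.1278),
    "VGG19 BN": (0.1134, 0.1342),
    "Resnet18": (0.0190, 0.0297),
    "Resnet34": (0.0307, 0.0450),
    "Resnet50": (0.0506, 0.2000),
    "Resnet101": (0.0676, 0.0831),
    "Resnet152": (0.0965, 0.1115),
}


def flux_to_pytorch_ratio(totals):
    """Return {model: flux_total / pytorch_total}; values above 1 mean Flux is slower."""
    return {model: flux / pytorch for model, (pytorch, flux) in totals.items()}


ratios = flux_to_pytorch_ratio(GPU_TOTALS)
for model, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{model:10s} {ratio:.2f}x")
```

With these numbers the ratio is above 1 for every model, and Resnet50 is the clear outlier at roughly 3.95x, driven by its 0.1795 s Flux backward pass.
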
CPU USED --- Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz

| Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
|---|---|---|---|---|---|
| VGG16 | PyTorch 0.4.1 | 6.6024 s | 9.4336 s | 16.036 s | 6.4216 s |
| VGG16 | Flux 0.6.8+ | 10.458 s | 10.245 s | 20.703 s | 10.111 s |
| VGG16 BN | PyTorch 0.4.1 | 7.0793 s | 9.0536 s | 16.132 s | 6.7909 s |
| VGG16 BN | Flux 0.6.8+ | 29.633 s | 18.649 s | 49.282 s | 24.047 s |
| VGG19 | PyTorch 0.4.1 | 8.3075 s | 10.899 s | 19.207 s | 8.0593 s |
| VGG19 | Flux 0.6.8+ | 12.226 s | 12.457 s | 24.683 s | 12.029 s |
| VGG19 BN | PyTorch 0.4.1 | 8.7794 s | 12.739 s | 21.519 s | 8.4044 s |
| VGG19 BN | Flux 0.6.8+ | 28.518 s | 21.464 s | 49.982 s | 22.649 s |

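
In the model tables the "Total Time" column appears to be the sum of the forward and backward passes. A quick way to sanity-check transcribed numbers is to recompute that sum. The sketch below does this for the CPU rows; the names `CPU_TIMINGS`, `TOLERANCE` and `inconsistent_rows` are illustrative, and the 0.005 s tolerance is an assumption chosen to absorb rounding in the last printed digit.

```python
# (model, framework) -> (forward, backward, total) in seconds, copied from the CPU table.
CPU_TIMINGS = {
    ("VGG16", "PyTorch 0.4.1"): (6.6024, 9.4336, 16.036),
    ("VGG16", "Flux 0.6.8+"): (10.458, 10.245, 20.703),
    ("VGG16 BN", "PyTorch 0.4.1"): (7.0793, 9.0536, 16.132),
    ("VGG16 BN", "Flux 0.6.8+"): (29.633, 18.649, 49.282),
    ("VGG19", "PyTorch 0.4.1"): (8.3075, 10.899, 19.207),
    ("VGG19", "Flux 0.6.8+"): (12.226, 12.457, 24.683),
    ("VGG19 BN", "PyTorch 0.4.1"): (8.7794, 12.739, 21.519),
    ("VGG19 BN", "Flux 0.6.8+"): (28.518, 21.464, 49.982),
}

TOLERANCE = 0.005  # seconds; an assumed allowance for rounding in the printed values


def inconsistent_rows(timings, tol=TOLERANCE):
    """Return [(key, gap)] for rows where |forward + backward - total| exceeds tol."""
    flagged = []
    for key, (forward, backward, total) in timings.items():
        gap = abs(forward + backward - total)
        if gap > tol:
            flagged.append((key, round(gap, 3)))
    return flagged


flagged = inconsistent_rows(CPU_TIMINGS)
print(flagged)
```

With the values as listed here, only the Flux VGG16 BN row is flagged: 29.633 + 18.649 = 48.282, while the listed total is 49.282. The check cannot tell which of the three numbers in that row is off.
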
Individual Layer Benchmarks
Conv3x3/1 = Conv2d, 3x3 Kernel, 1x1 Padding, 1x1 Stride
Conv5x5/1 = Conv2d, 5x5 Kernel, 2x2 Padding, 1x1 Stride
Conv3x3/2 = Conv2d, 3x3 Kernel, 1x1 Padding, 2x2 Stride
Conv5x5/2 = Conv2d, 5x5 Kernel, 2x2 Padding, 2x2 Stride
Dense = 1024 => 512
BatchNorm = BatchNorm2d
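
The kernel, padding and stride in each definition above determine the spatial size of the layer's output through the standard convolution formula `floor((n + 2p - k) / s) + 1`. The sketch below applies it to the four convolution configurations. The input size used for the layer benchmarks is not stated in this document, so the 224-pixel input here is an assumption borrowed from the model benchmarks, and the helper name `conv_output_size` is illustrative.

```python
def conv_output_size(n, kernel, padding, stride):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


# name -> (kernel, padding, stride), taken from the layer definitions above
CONV_LAYERS = {
    "Conv3x3/1": (3, 1, 1),
    "Conv5x5/1": (5, 2, 1),
    "Conv3x3/2": (3, 1, 2),
    "Conv5x5/2": (5, 2, 2),
}

INPUT_SIZE = 224  # assumption: the layer benchmarks' input size is not given

output_sizes = {
    name: conv_output_size(INPUT_SIZE, k, p, s)
    for name, (k, p, s) in CONV_LAYERS.items()
}
for name, size in output_sizes.items():
    print(f"{name}: {INPUT_SIZE}x{INPUT_SIZE} -> {size}x{size}")
```

Under that assumption the stride-1 layers preserve the 224 x 224 resolution, while the stride-2 layers halve it to 112 x 112 and so compute a quarter as many output positions.
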
GPU USED --- Titan 1080Ti 12 GB

| Layer | Framework | Forward Pass | Backward Pass | Total Time |
|---|---|---|---|---|
| Conv3x3/1 | PyTorch 0.4.1 | 0.2312 ms | 0.5359 ms | 0.7736 ms |
| Conv3x3/1 | Flux 0.6.8+ | 0.1984 ms | 0.7640 ms | 0.9624 ms |
| Conv5x5/1 | PyTorch 0.4.1 | 0.2667 ms | 0.5345 ms | 0.8299 ms |
| Conv5x5/1 | Flux 0.6.8+ | 0.2065 ms | 0.8075 ms | 1.014 ms |
| Conv3x3/2 | PyTorch 0.4.1 | 0.1170 ms | 0.2203 ms | 0.3376 ms |
| Conv3x3/2 | Flux 0.6.8+ | 0.0927 ms | 0.5988 ms | 0.6915 ms |
| Conv5x5/2 | PyTorch 0.4.1 | 0.1233 ms | 0.2162 ms | 0.3407 ms |
| Conv5x5/2 | Flux 0.6.8+ | 0.0941 ms | 0.6515 ms | 0.7456 ms |
| Dense | PyTorch 0.4.1 | 0.0887 ms | 0.1523 ms | 0.2411 ms |
| Dense | Flux 0.6.8+ | 0.0432 ms | 0.2044 ms | 0.2476 ms |
| BatchNorm | PyTorch 0.4.1 | 0.1096 ms | 0.1999 ms | 0.3095 ms |
| BatchNorm | Flux 0.6.8+ | 0.2211 ms | 0.2849 ms | 0.5060 ms |

To reproduce the benchmarks, check out Flux 0.6.8+ (avik-pal/cudnn_batchnorm) and CuArrays (master).
BatchNorm on the GPU is broken on Flux 0.6.8+ master, so the benchmarks cannot be run against it.
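
This document does not describe how the timings were collected. As a rough illustration of one common approach, and not the harness actually used for these numbers, the sketch below times a callable with a few warm-up calls followed by repeated measurements and reports the median. The names `time_call` and `dummy_forward` are hypothetical, and the workload is a stand-in for a real forward or backward pass.

```python
import statistics
import time


def time_call(fn, warmup=3, repeats=10):
    """Return the median wall-clock time of fn() in seconds.

    Warm-up calls run first so one-time costs (compilation, cache fills)
    are not counted. When timing GPU work, fn must block until the device
    has finished (for example by calling torch.cuda.synchronize() in
    PyTorch), otherwise only the kernel launch is measured.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def dummy_forward():
    # Hypothetical stand-in for a model's forward pass.
    return sum(i * i for i in range(10_000))


elapsed = time_call(dummy_forward)
print(f"median time per call: {elapsed:.6f} s")
```

Reporting the median rather than the mean keeps a single slow run from skewing the result. Whether the figures in the tables are means, medians or minimums is not stated.
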