This mini-benchmark compares the speed of several deep learning frameworks on the VGG-16 network architecture.
Contributions welcome!
The table contains training times in milliseconds per minibatch for the same VGG-16 network in each framework (lower is faster). The minibatch size is 16. Each time covers a complete SGD step, including the parameter update, not just the forward and backward passes.
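Here is a rough sketch of how a per-minibatch figure of this kind can be measured. The Python below times a caller-supplied `step` function. The name `ms_per_step`, the warm-up count and the use of the median are illustrative assumptions, not this repo's harness. GPU frameworks typically launch work asynchronously, so `step` must block until the update has actually finished (for example by fetching the loss to the host). Otherwise the timer undercounts.

```python
import statistics
import time
from typing import Callable

def ms_per_step(step: Callable[[], object], warmup: int = 5, iters: int = 20) -> float:
    """Median wall-clock time of one call to `step`, in milliseconds.

    `step` is assumed to run one complete SGD step (forward, backward and
    parameter update) and to return only once that work has finished.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    for _ in range(warmup):
        step()  # untimed: absorbs one-off costs such as graph building or autotuning
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With TensorFlow 1.x, for instance, `step` could be `lambda: sess.run(train_op)`. Here `sess` and `train_op` are hypothetical names for an existing session and training op. `Session.run` returns only after the fetched op has executed.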
Framework | V100 | GTX 1080 | Maxwell Titan X | K80 | K520 |
---|---|---|---|---|---|
MXNet | 81.22 | N/A | 324.63 | 1247.47 | OOM |
TensorFlow | 88.03 | N/A | 332.15 | 1057.28 | 2290.58 |
TensorFlow (slim) | 176.14 | N/A | 370.89 | 1126.70 | 2488.51 |
Keras (TensorFlow) | 101.82 | 287.85 | 359.89 | 1020.97 | OOM |
Keras (Theano) | N/A | 409.95 | 317.30 | 1141.79 | 2445.22 |
Neon | N/A | 164.53 | 207.41 | N/A | N/A |
Caffe | N/A | 244.44 | 311.06 | 787.81 | OOM |
Torch (1) | N/A | 232.55 | 273.54 | N/A | N/A |
N/A: test not run
OOM: test ran but failed with an out-of-memory error (on the K520, which has only 4 GB of memory)
(1) The Torch benchmark is from https://github.com/jcjohnson/cnn-benchmarks (it is not included in this repo).
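Every cell is the time for one minibatch of 16 images, so the numbers convert directly to training throughput: images/s = 16 / (ms / 1000). For example, MXNet's 81.22 ms on the V100 corresponds to about 197 images/s. Below is a small Python helper (the name `images_per_second` is illustrative) applied to part of the V100 column:

```python
BATCH_SIZE = 16  # minibatch size used for every cell in the table

def images_per_second(ms_per_minibatch: float, batch_size: int = BATCH_SIZE) -> float:
    """Training throughput implied by one full SGD step taking `ms_per_minibatch` ms."""
    if ms_per_minibatch <= 0:
        raise ValueError("time per minibatch must be positive")
    return batch_size / (ms_per_minibatch / 1000.0)

# V100 timings copied from the table (ms per minibatch).
v100_ms = {
    "MXNet": 81.22,
    "TensorFlow": 88.03,
    "TensorFlow (slim)": 176.14,
    "Keras (TensorFlow)": 101.82,
}

for framework, ms in v100_ms.items():
    print(f"{framework:<20} {images_per_second(ms):7.1f} images/s")
```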
To run the benchmarks: `bash run.sh`
Note: this backs up your `~/.keras/keras.json` and restores it afterwards.
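Multi-backend Keras reads its default backend from the `"backend"` key of `~/.keras/keras.json`. That is presumably why the script touches this file when switching between the TensorFlow and Theano runs. The following Python is a minimal sketch of the same back-up, modify, restore pattern. `preserved_file` and `set_keras_backend` are hypothetical helpers, not part of `run.sh` or of Keras.

```python
import contextlib
import json
import os
import shutil
import tempfile
from pathlib import Path

@contextlib.contextmanager
def preserved_file(path):
    """Back up `path` on entry and put the original back on exit, even on error.

    If the file did not exist beforehand, any copy created inside the block is removed.
    """
    path = Path(path)
    backup = None
    if path.exists():
        fd, tmp_name = tempfile.mkstemp(suffix=".bak")
        os.close(fd)
        backup = Path(tmp_name)
        shutil.copy2(path, backup)
    try:
        yield path
    finally:
        if backup is not None:
            shutil.copy2(backup, path)  # overwrite whatever the benchmark left behind
            backup.unlink()
        elif path.exists():
            path.unlink()

def set_keras_backend(config_path, backend):
    """Rewrite only the "backend" key of a keras.json-style file, keeping other settings."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["backend"] = backend
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4))
```

You can wrap each backend switch in `with preserved_file(Path.home() / ".keras" / "keras.json"):`. The `try`/`finally` then restores the original file even if a benchmark raises.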
Caffe should be built in `caffe/` under the current directory (or `caffe/` can be a symlink to an existing build).
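If Caffe is already built elsewhere, `caffe/` can simply be a symlink to it. Here is a small Python sketch of that setup step. It assumes the benchmark only needs `./caffe` to resolve to the build directory. `ensure_caffe_link` and the path you pass to it are hypothetical.

```python
import os
from pathlib import Path

def ensure_caffe_link(existing_build: str, link_name: str = "caffe") -> Path:
    """Ensure `link_name` is a directory: keep it if present, else symlink it to `existing_build`."""
    link = Path(link_name)
    if link.is_symlink() and not link.exists():
        raise FileNotFoundError(f"{link} is a dangling symlink")
    if link.exists():
        if not link.is_dir():
            raise NotADirectoryError(f"{link} exists but is not a directory")
        return link  # an in-place build or an existing symlink: leave it alone
    target = Path(existing_build).expanduser().resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"no Caffe build directory at {target}")
    os.symlink(target, link, target_is_directory=True)
    return link
```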
Neon can be built anywhere; for the Neon test only, activate the Neon virtualenv you built.
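Here is a quick Python 3 sketch for checking the Neon prerequisite before launching the Neon test. It reports whether the interpreter is running inside a virtualenv and whether a package named `neon` is importable from it. `in_virtualenv` and `neon_ready` are hypothetical helpers. Finding *a* `neon` module does not prove it is the build you compiled.

```python
import importlib.util
import sys

def in_virtualenv() -> bool:
    """True inside a venv (sys.base_prefix differs) or a legacy virtualenv (sys.real_prefix set)."""
    return hasattr(sys, "real_prefix") or getattr(sys, "base_prefix", sys.prefix) != sys.prefix

def neon_ready() -> bool:
    """True if we are in a virtualenv and some module named `neon` is importable from it."""
    return in_virtualenv() and importlib.util.find_spec("neon") is not None

if not neon_ready():
    print("Activate the Neon virtualenv before running the Neon test.", file=sys.stderr)
```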