Use Keras to define a model and train it with efficient tensorpack trainers.
Keras alone has various overheads. In particular, it is inefficient with large models. The article "Towards Efficient Multi-GPU Training in Keras with TensorFlow" discusses some of them.
Even on a single GPU, tensorpack can run 1.2x to 2x faster than the equivalent Keras code. The gap becomes larger when you scale to multiple GPUs. Tensorpack and horovod are the only two tools I know of that can scale the training of a large Keras model.
There are two ways to use a Keras model inside tensorpack:
- Write the tower function as in a standard tensorpack program, but mix some Keras layers into it. See mnist-keras.py for how to do this. This approach does not support all tensorpack trainers, and can be brittle due to incompatibilities between Keras and tensorpack.
- Make the entire model to train a Keras model (so there is no `ModelDesc`, etc.). See mnist-keras-v2.py.
imagenet-resnet-keras.py: reproduces exactly the same setting as the tensorpack ResNet example on ImageNet. It has:
- ResNet-50 model modified from keras.applications. (We put the stride on the 3x3 conv in each bottleneck, which differs from certain other implementations.)
- Multi-GPU data-parallel training and validation that scales:
  - Finished 100 epochs in 19 hours on 8 V100s, with >90% GPU utilization.
  - Still slightly slower than native tensorpack examples.
- Good accuracy (same as the tensorpack ResNet example).
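
To put the "100 epochs in 19 hours on 8 V100s" figure in perspective, here is a rough back-of-the-envelope calculation of the average throughput it implies. It is only a sketch: it assumes one epoch is a full pass over the ILSVRC-2012 training set (1,281,167 images), which this README does not state, and the `throughput` helper is a hypothetical name, not part of tensorpack or Keras.

```python
# Back-of-the-envelope throughput implied by "100 epochs in 19 hours on 8 V100s".
# Assumption (not stated in this README): one epoch is a full pass over the
# ILSVRC-2012 training set, which has 1,281,167 images.
IMAGES_PER_EPOCH = 1_281_167
EPOCHS = 100
HOURS = 19
NUM_GPUS = 8


def throughput(images_per_epoch, epochs, hours, num_gpus):
    """Return (aggregate images/sec, images/sec per GPU), averaged over the run."""
    total_images = images_per_epoch * epochs
    seconds = hours * 3600
    total_rate = total_images / seconds
    return total_rate, total_rate / num_gpus


total_rate, per_gpu_rate = throughput(IMAGES_PER_EPOCH, EPOCHS, HOURS, NUM_GPUS)
print(f"aggregate: {total_rate:.0f} images/sec")
print(f"per GPU:   {per_gpu_rate:.0f} images/sec")
```

Under that assumption the run averages roughly 1,870 images/sec in aggregate, or about 234 images/sec per GPU. If the 19 hours also covers validation, the training-only throughput would be somewhat higher.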
Keras does not respect variable scopes or variable collections, which conflicts with tensorpack trainers. Therefore, Keras support is experimental and unofficial.
These simple examples run smoothly within tensorpack, but note that a complicated model or a future version of Keras may break them.