The system launches a deep learning job in one or more Docker containers, so a Docker image must be built in advance. The system provides a base Docker image with HDFS, CUDA, and cuDNN support, on top of which users can build their own custom Docker images.
To build the base Docker image from, for example, Dockerfile.build.base, run:
docker build -f Dockerfiles/Dockerfile.build.base -t pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04 Dockerfiles/
A custom Docker image can then be built on top of it by using the following line as the base-image instruction in its Dockerfile:
FROM pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04
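A minimal sketch of such a custom Dockerfile, generated from a shell script, might look like the following. The file name Dockerfile.run.custom, the image tag pai.run.custom, and the installed package are hypothetical and chosen only for illustration; the script writes the file and prints the build command rather than running it.

```shell
#!/bin/sh
# Sketch only: the file name, tag, and package below are assumptions,
# not part of the repository.
set -eu

BASE_IMAGE="pai.build.base:hadoop2.7.2-cuda8.0-cudnn6-devel-ubuntu16.04"
DOCKERFILE="Dockerfile.run.custom"   # hypothetical file name

# Write a Dockerfile whose first instruction is the FROM line
# pointing at the base image built above.
cat > "$DOCKERFILE" <<EOF
FROM $BASE_IMAGE

# Assumption: the job needs pip for Python 3; replace with your own dependencies.
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip && rm -rf /var/lib/apt/lists/*
EOF

# Print the build command instead of executing it.
echo "Wrote $DOCKERFILE; build it with:"
echo "  docker build -f $DOCKERFILE -t pai.run.custom ."
```

The base image must already exist on the machine that runs the build, since the FROM instruction refers to it by its local tag.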
As an example, we customize a TensorFlow Docker image using Dockerfile.run.tensorflow:
docker build -f Dockerfiles/Dockerfile.run.tensorflow -t pai.run.tensorflow Dockerfiles/
Next, push the built image to a Docker registry so that every node in the system can pull it:
docker tag pai.run.tensorflow your_docker_registry/pai.run.tensorflow
docker push your_docker_registry/pai.run.tensorflow
The image is now ready to use. Note that the commands above assume the Docker registry is deployed locally; the actual commands may vary depending on how the Docker registry is configured.
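Because the registry address depends on the deployment, the tag and push steps can be parameterized. The sketch below assumes a registry reachable at localhost:5000; REGISTRY and DRY_RUN are hypothetical variables introduced for this example, and by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: tag and push an image to a configurable registry.
# REGISTRY and DRY_RUN are hypothetical variables introduced for this example.
set -eu

REGISTRY="${REGISTRY:-localhost:5000}"   # assumed address of a locally deployed registry
IMAGE="pai.run.tensorflow"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands

# Build the full image reference: <registry>/<image>
image_ref() {
    printf '%s/%s\n' "$1" "$2"
}

# Print the command, and execute it unless DRY_RUN is 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

TARGET="$(image_ref "$REGISTRY" "$IMAGE")"
run docker tag "$IMAGE" "$TARGET"
run docker push "$TARGET"
```

Setting DRY_RUN=0 and REGISTRY to the real registry address executes the commands. A remote registry that requires authentication would typically also need a docker login beforehand.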