This repo is adapted from https://github.com/microsoft/DeepSpeed/blob/master/docker/Dockerfile, with reference to https://github.com/jeffra/deepspeed-kdd20.

Build the image:

    git clone https://github.com/hughpu/deepspeed-docker.git \
    && cd deepspeed-docker \
    && docker build -t deepspeed:latest -f Dockerfile .

If you are building from within China, set the `IN_CHINA` build argument:

    git clone https://github.com/hughpu/deepspeed-docker.git \
    && cd deepspeed-docker \
    && docker build --build-arg IN_CHINA="1" -t deepspeed:latest -f Dockerfile .

To build behind a proxy (here a local proxy at http://127.0.0.1:7890), also pass the proxy settings and use the host network:

    git clone https://github.com/hughpu/deepspeed-docker.git \
    && cd deepspeed-docker \
    && docker build --network host --build-arg https_proxy=http://127.0.0.1:7890 --build-arg IN_CHINA="1" --build-arg no_proxy="127.0.0.1,localhost,mirrors.aliyun.com" -f Dockerfile -t deepspeed:latest .

After building the image `deepspeed:latest` with one of the commands above, start the container with:

    bash ./start_deepspeed_container.sh

The script `start_deepspeed_container.sh` sets up two volume mappings:

- data path: the local directory `$HOME/deepspeed` is mounted at `/data` inside the container. It serves as the workspace for training and inference scripts, datasets, output checkpoints, and logs.
- ssh path: the local directory `$HOME/deepspeed_ssh` is mounted at `/home/deepspeed/.ssh` inside the container. Add the public key `id_rsa.pub` of every node to the `authorized_keys` file under this path on each node. This enables passwordless authentication between nodes, which DeepSpeed requires. Also configure the connections to all nodes in the `config` file under this path; the host names defined there can then be listed in the `hostfile` that DeepSpeed needs.
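
The two volume mappings above correspond to two `-v` options on `docker run`. The following is only a minimal sketch of that idea, not the contents of `start_deepspeed_container.sh`: the function name `build_run_cmd` is hypothetical, and the real script probably passes further options (GPU access, networking, container name) that are not shown here. The sketch prints the command instead of running it so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical sketch; the real start_deepspeed_container.sh may differ.

DATA_DIR="${HOME}/deepspeed"       # host workspace, mounted at /data
SSH_DIR="${HOME}/deepspeed_ssh"    # host SSH directory, mounted at /home/deepspeed/.ssh

# Assemble a docker run command containing only the two volume mappings.
build_run_cmd() {
    printf 'docker run -it --rm -v %s:/data -v %s:/home/deepspeed/.ssh deepspeed:latest\n' \
        "$DATA_DIR" "$SSH_DIR"
}

# Print the command for inspection rather than executing it.
build_run_cmd
```

Anything written to `/data` inside the container ends up in `$HOME/deepspeed` on the host, so checkpoints and logs survive after the container is removed.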
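
The SSH setup described above can be sketched as follows. Everything specific in this example is an assumption for illustration: the host aliases `worker-1` and `worker-2`, the IP addresses, the user name `deepspeed`, the `./pubkeys` directory holding the collected `id_rsa.pub` files, and `slots=4` (four GPUs per node). By default the sketch writes into temporary directories; set `SSH_DIR` and `WORK_DIR` to `$HOME/deepspeed_ssh` and `$HOME/deepspeed` to write into the mapped paths.

```shell
#!/bin/sh
set -eu

# Default to throwaway directories so the sketch does not touch real files.
SSH_DIR="${SSH_DIR:-$(mktemp -d)}"     # e.g. $HOME/deepspeed_ssh
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"   # e.g. $HOME/deepspeed
mkdir -p "$SSH_DIR" "$WORK_DIR"
chmod 700 "$SSH_DIR"

# SSH client config: one entry per node (hypothetical aliases and addresses).
cat > "$SSH_DIR/config" <<'EOF'
Host worker-1
    HostName 192.168.1.11
    User deepspeed
Host worker-2
    HostName 192.168.1.12
    User deepspeed
EOF
chmod 600 "$SSH_DIR/config"

# Append every collected public key to authorized_keys.
# ./pubkeys/*.pub is a hypothetical location for the id_rsa.pub files of all nodes.
touch "$SSH_DIR/authorized_keys"
for key in ./pubkeys/*.pub; do
    if [ -f "$key" ]; then
        cat "$key" >> "$SSH_DIR/authorized_keys"
    fi
done
chmod 600 "$SSH_DIR/authorized_keys"

# DeepSpeed hostfile: one line per node, using the aliases from the SSH config.
cat > "$WORK_DIR/hostfile" <<'EOF'
worker-1 slots=4
worker-2 slots=4
EOF

echo "SSH files written to $SSH_DIR, hostfile written to $WORK_DIR"
```

With the hostfile placed in the data path, a multi-node run inside the container could then be launched with something like `deepspeed --hostfile=/data/hostfile train.py`, where `train.py` stands for your own training script.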