This repo contains the PyTorch Distributed Deep Learning workshop contents, intended to run on the NVIDIA DLI platform. It simulates a two-host environment with 2 GPUs per host.
Use the Dockerfile to build a Docker image:
docker build --no-cache -t ptgtc .
Create a private Docker network so the two hosts can reach each other:
docker network create -d bridge --subnet 192.168.0.0/24 --gateway 192.168.0.1 backend
Launch two Docker containers (each simulating a host) using the image built above.
Node 1:
docker run -d --name node1 --network=backend -p 8000:8888 --shm-size=1g -e NVIDIA_VISIBLE_DEVICES=0,1 --runtime=nvidia ptgtc
Node 2:
docker run -d --name node2 --network=backend -p 9000:8888 --shm-size=1g -e NVIDIA_VISIBLE_DEVICES=2,3 --runtime=nvidia ptgtc
Once the containers are running, visit the content in your browser at localhost:8000 (node1) and localhost:9000 (node2).
Open a terminal window inside the JupyterLab browser window above and verify that the following commands run successfully:
ping node2
lsof -i -P -n (to list all open ports)
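Each container should also see exactly the two GPUs assigned to it through NVIDIA_VISIBLE_DEVICES; a quick way to confirm this from a Python prompt inside either container:

```python
import torch

# NVIDIA_VISIBLE_DEVICES restricts each container to two GPUs,
# so this should print 2 on both node1 and node2.
print(torch.cuda.device_count())
```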
To test distributed data parallel (DDP) training across the two hosts, follow the steps below:
- On node1, open two terminal windows
- In the first terminal window, run
export NCCL_DEBUG=info
- Then, in the same (first) terminal window, run
python ddp_tutorial.py 0 0
- In the second terminal window, run
python ddp_tutorial.py 1 1
- On node2, open two terminal windows
- In the first terminal window, run
export NCCL_DEBUG=info
- Then, in the same (first) terminal window, run
python ddp_tutorial.py 2 0
- In the second terminal window, run
python ddp_tutorial.py 3 1
The DDP training job should run to completion.
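For reference, the two arguments passed to ddp_tutorial.py above appear to be the global rank and the local GPU index (ranks 0 and 1 on node1, ranks 2 and 3 on node2). Below is a minimal sketch of what such a script could look like, assuming node1 acts as the rendezvous host, an arbitrarily chosen port (12355), and a toy model; the actual ddp_tutorial.py shipped with the workshop may differ.

```python
import os
import sys

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Assumed argument order based on the run commands above:
    # argv[1] = global rank (0-3), argv[2] = local GPU index on this node (0-1).
    rank = int(sys.argv[1])
    local_rank = int(sys.argv[2])
    world_size = 4  # 2 nodes x 2 GPUs per node

    # Assumption: node1 is the rendezvous host; 12355 is an arbitrary free port.
    os.environ.setdefault("MASTER_ADDR", "node1")
    os.environ.setdefault("MASTER_PORT", "12355")

    # Bind this process to its GPU before initializing NCCL.
    torch.cuda.set_device(local_rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    # Toy model and training loop, just to exercise DDP gradient all-reduce.
    model = nn.Linear(10, 10).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(5):
        optimizer.zero_grad()
        outputs = ddp_model(torch.randn(20, 10, device=local_rank))
        labels = torch.randn(20, 10, device=local_rank)
        loss_fn(outputs, labels).backward()
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With NCCL_DEBUG=info set, each process prints NCCL initialization details, which is useful for confirming that all four ranks found each other over the backend network.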