🐛 Bug
Running DDP on a devgpu with 4 GPUs with --nprocs_per_node=2 and --nnodes=2 does not work when the script uses LOCAL_RANK to set the CUDA device:

torchx run dist.ddp -j 2x2
Module (check all that apply):

- torchx.spec
- torchx.component
- torchx.apps
- torchx.runtime
- torchx.cli
- torchx.schedulers
- torchx.pipelines
- torchx.aws
- torchx.examples
- other
To Reproduce
See the description above; this easily repros with a training script:

```python
import os
import torch

if __name__ == "__main__":
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
```

Try running the above with:

torchx run dist.ddp -j 2x2 main.py
Expected behavior
The TorchX local scheduler should set CUDA_VISIBLE_DEVICES=0,1 on the first two workers, and CUDA_VISIBLE_DEVICES=2,3 on the next two workers.
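For illustration, a minimal sketch of the expected partitioning, assuming the local scheduler knows the host's total GPU count and each node-replica's index; the helper name assign_visible_devices is hypothetical and not part of the TorchX API:

```python
import os

def assign_visible_devices(replica_id: int, nproc_per_node: int, total_gpus: int) -> str:
    """Hypothetical helper: give each node-replica a contiguous slice of GPUs,
    e.g. replica 0 -> "0,1" and replica 1 -> "2,3" for -j 2x2 on a 4-GPU host."""
    start = replica_id * nproc_per_node
    end = start + nproc_per_node
    assert end <= total_gpus, "not enough GPUs for this replica layout"
    return ",".join(str(gpu) for gpu in range(start, end))

# With CUDA_VISIBLE_DEVICES scoped per replica, LOCAL_RANK (0..nproc_per_node-1)
# indexes into the replica's own slice, so torch.cuda.set_device(LOCAL_RANK) works.
os.environ["CUDA_VISIBLE_DEVICES"] = assign_visible_devices(replica_id=1, nproc_per_node=2, total_gpus=4)
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints "2,3"
```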
Environment
- torchx version (e.g. 0.1.0rc1):
- Python version:
- OS (e.g., Linux):
- How you installed torchx (conda, pip, source, docker):
- Docker image and tag (if using docker):
- Git commit (if installed from source):
- Execution environment (on-prem, AWS, GCP, Azure etc):
- Any other relevant information:
Additional context