Distributed Modes Speed Benchmark #3436
Unanswered · briankosw asked this question in DDP / multi-GPU / multi-node
Is there a benchmark comparing the training speed of the distributed modes? It would be really useful to see how each mode performs, especially DDP, DDP_spawn, DDP2, and Horovod.
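To make the ask concrete, here is a rough, unofficial timing harness for what I have in mind. It assumes a Lightning version where `distributed_backend` selects the mode (newer releases use `strategy=` instead); `ToyModel`, the step count, and the GPU count are placeholders, not a real benchmark setup:

```python
# Rough timing harness: measures trainer steps/second for one distributed
# mode per script invocation. All models and numbers are placeholders.
import sys
import time

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def make_loader():
    x = torch.randn(4096, 32)
    y = torch.randint(0, 2, (4096,))
    return DataLoader(TensorDataset(x, y), batch_size=64)


def steps_per_second(backend, steps=200):
    trainer = pl.Trainer(
        gpus=2,
        distributed_backend=backend,  # "ddp", "ddp_spawn", "ddp2", "horovod"
        max_steps=steps,
        logger=False,
        checkpoint_callback=False,
    )
    start = time.perf_counter()
    trainer.fit(ToyModel(), make_loader())
    return steps / (time.perf_counter() - start)


if __name__ == "__main__":
    # DDP relaunches this whole script once per GPU, so time one mode per
    # invocation: python bench.py ddp / ddp_spawn / ddp2 / horovod
    backend = sys.argv[1] if len(sys.argv) > 1 else "ddp"
    print(f"{backend}: {steps_per_second(backend):.2f} steps/s")
```

Wall-clock time over a fixed step count is only a crude proxy, of course; a proper benchmark would also control for data loading, warmup, and model size.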
Another question:
Why does DDP perform better than DDP_spawn? It seems that DDP internally relaunches the same script with different environment variables (analogous to `torch.distributed.launch`), while DDP_spawn spawns a set of subprocesses (analogous to `torch.multiprocessing.spawn`). I'm having a hard time understanding why DDP is advantageous compared to DDP_spawn, and I'd like more explanation of the limitations of DDP_spawn listed in the Multi-GPU Training docs.
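For reference, a minimal sketch of the spawn side of that contrast; the `worker` function and its arguments are mine for illustration, not Lightning internals:

```python
# Sketch of the two launch mechanisms contrasted above.
#
# DDP style: every process re-runs the training script itself, e.g.
#   $ python -m torch.distributed.launch --nproc_per_node=2 train.py
# so each rank rebuilds its state from scratch and nothing is pickled.
#
# DDP_spawn style: a single parent process spawns the workers, and every
# argument handed to a worker must therefore be picklable:
import torch.multiprocessing as mp


def worker(rank, world_size):
    # A real worker would initialize the process group here, e.g.:
    # torch.distributed.init_process_group("nccl", rank=rank, world_size=world_size)
    print(f"spawned worker {rank} of {world_size}")


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```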