Distributed Modes Speed Benchmark #3436
Unanswered · briankosw asked this question in DDP / multi-GPU / multi-node
Is there a benchmark comparing the training speed of the distributed modes? It would be really useful to see how each mode performs, especially DDP, DDP_spawn, DDP2, and Horovod.
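To make the ask concrete, here is a rough, unofficial timing harness for what I have in mind. It assumes a Lightning version where `distributed_backend` selects the mode (newer releases use `strategy=` instead); `ToyModel`, the step count, and the GPU count are placeholders, not a real benchmark setup:

```python
# Rough timing harness: measures trainer steps/second for one distributed
# mode per script invocation. All models and numbers are placeholders.
import sys
import time

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def make_loader():
    x = torch.randn(4096, 32)
    y = torch.randint(0, 2, (4096,))
    return DataLoader(TensorDataset(x, y), batch_size=64)


def steps_per_second(backend, steps=200):
    trainer = pl.Trainer(
        gpus=2,
        distributed_backend=backend,  # "ddp", "ddp_spawn", "ddp2", "horovod"
        max_steps=steps,
        logger=False,
        checkpoint_callback=False,
    )
    start = time.perf_counter()
    trainer.fit(ToyModel(), make_loader())
    return steps / (time.perf_counter() - start)


if __name__ == "__main__":
    # DDP relaunches this whole script once per GPU, so time one mode per
    # invocation: python bench.py ddp / ddp_spawn / ddp2 / horovod
    backend = sys.argv[1] if len(sys.argv) > 1 else "ddp"
    print(f"{backend}: {steps_per_second(backend):.2f} steps/s")
```

Wall-clock time over a fixed step count is only a crude proxy, of course; a proper benchmark would also control for data loading, warmup, and model size.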
Another question:
Why does DDP perform better than DDP_spawn? It seems that DDP internally relaunches the same script with different environment variables (analogous to `torch.distributed.launch`), while DDP_spawn spawns a set of subprocesses (analogous to `torch.multiprocessing.spawn`). I'm having a hard time understanding why DDP is advantageous compared to DDP_spawn, and I'd like more explanation of the limitations of DDP_spawn listed in the Multi-GPU Training docs.
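For reference, a minimal sketch of the spawn side of that contrast; the `worker` function and its arguments are mine for illustration, not Lightning internals:

```python
# Sketch of the two launch mechanisms contrasted above.
#
# DDP style: every process re-runs the training script itself, e.g.
#   $ python -m torch.distributed.launch --nproc_per_node=2 train.py
# so each rank rebuilds its state from scratch and nothing is pickled.
#
# DDP_spawn style: a single parent process spawns the workers, and every
# argument handed to a worker must therefore be picklable:
import torch.multiprocessing as mp


def worker(rank, world_size):
    # A real worker would initialize the process group here, e.g.:
    # torch.distributed.init_process_group("nccl", rank=rank, world_size=world_size)
    print(f"spawned worker {rank} of {world_size}")


if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```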