Why are tree algorithms specifically targeted at All-Reduce? #1473
Comments
Implementing a tree algorithm for allgather/reduce-scatter is very hard, contrary to allreduce. That being said, in 2.23 we introduced the PAT algorithms, which are a variation of Bruck's algorithm that reorders the schedule to keep memory needs limited. They improve the performance of allgather/reduce-scatter at scale, but they are still a work in progress and not yet as low-latency as the tree allreduce. They also only support one GPU per node at the moment, i.e. the intra-node part is not yet implemented.
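The PAT algorithm mentioned above is described as a variation of Bruck's algorithm. As a rough illustration of the starting point (this is the classic textbook Bruck allgather, not NCCL's actual implementation), here is a minimal simulation: with `p` ranks, each exchange step doubles the number of blocks every rank holds, so the whole allgather completes in ceil(log2 p) steps instead of the ring's p−1, at the cost of each rank handling increasingly large messages and a final local reordering.

```python
def bruck_allgather(blocks):
    """Simulate Bruck's allgather over p ranks.

    blocks[r] is rank r's input block. Returns (result, steps) where
    result[r] is rank r's final buffer in global block order.
    """
    p = len(blocks)
    # Each rank's buffer starts with its own block.
    buf = [[blocks[r]] for r in range(p)]
    dist, steps = 1, 0
    while dist < p:
        # Step k: rank r receives rank (r + 2^k) % p's current buffer,
        # truncated so no rank ever holds more than p blocks.
        recv = [buf[(r + dist) % p][: p - len(buf[r])] for r in range(p)]
        for r in range(p):
            buf[r] += recv[r]
        dist *= 2
        steps += 1
    # Rank r's buffer starts at block r; rotate it into global order.
    return [buf[r][-r:] + buf[r][:-r] if r else buf[r] for r in range(p)], steps
```

For example, `bruck_allgather(list("abcde"))` finishes in 3 steps (ceil(log2 5)) with every rank holding all five blocks. The memory pressure visible here (buffers grow, and the final rotation needs the whole result staged) is the kind of cost the reordered PAT schedule is said to keep limited.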
I'm running the nccl-test `all-reduce` benchmark between two nodes, and I've found that the tree algorithm performs much better than the ring algorithm. However, from reading the NCCL source code, it seems that only the all-reduce operation has a tree algorithm implementation. As far as I know, the FSDP (Fully Sharded Data Parallelism) strategy achieves a global all-reduce through a combination of `all-gather` and `reduce-scatter`; I confirmed this by setting `NCCL_DEBUG_SUBSYS=COLL`. So why does it appear that only all-reduce has a tree algorithm implementation? Would using the ring algorithm when training models with the FSDP strategy result in underutilization of bandwidth?
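On the bandwidth question: a minimal simulation (illustrative only, not NCCL's implementation) shows that on a ring, reduce-scatter followed by allgather costs exactly the same 2(p−1) communication steps as a fused ring allreduce, so FSDP's decomposition is not inherently bandwidth-wasteful on a ring. The tree's advantage over the ring is latency at scale (log-depth instead of linear), which is why its absence matters for small messages rather than for bandwidth.

```python
def ring_allreduce(data):
    """Simulate ring allreduce as reduce-scatter + allgather.

    data[r] is rank r's contribution: a list of p chunk values.
    Returns (buffers, steps); after completion every rank's buffer
    holds the elementwise sum of all contributions.
    """
    p = len(data)
    buf = [list(data[r]) for r in range(p)]
    steps = 0
    # Reduce-scatter phase: p-1 steps. At step s, rank r sends chunk
    # (r - s) % p to rank (r + 1) % p, which accumulates it.
    for s in range(p - 1):
        sends = [(r, (r - s) % p, buf[r][(r - s) % p]) for r in range(p)]
        for r, c, v in sends:
            buf[(r + 1) % p][c] += v
        steps += 1
    # Now rank r owns the fully reduced chunk (r + 1) % p.
    # Allgather phase: p-1 more steps, circulating the reduced chunks.
    for s in range(p - 1):
        sends = [(r, (r + 1 - s) % p, buf[r][(r + 1 - s) % p]) for r in range(p)]
        for r, c, v in sends:
            buf[(r + 1) % p][c] = v
        steps += 1
    return buf, steps
```

With 3 ranks contributing `[1,2,3]`, `[4,5,6]`, `[7,8,9]`, every rank ends with `[12, 15, 18]` after 4 steps, i.e. 2(p−1). Running the two phases as separate collectives (as FSDP does) keeps the same per-step message sizes; what it cannot recover is the tree allreduce's logarithmic latency.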