Distributed Computing Overview
Albert Zeyer edited this page Nov 29, 2023
This page provides an overview of different distributed-computing techniques. Some of them are already supported, some partially supported, and some not yet implemented (though all could be done).
- Distributed PyTorch
- RETURNN multi-GPU training (using Horovod)
- Distributed TensorFlow
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding: paper
- GSPMD: General and Scalable Parallelization for Neural Networks: blog, paper.
- Mesh TensorFlow
- TensorFlow DTensor
- Pathways: Asynchronous Distributed Dataflow for ML: blog, paper. Used by PaLM; closed source
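To make the first item above concrete, here is a minimal sketch of distributed data-parallel training in PyTorch with `DistributedDataParallel`. It is not the RETURNN integration, just an illustration of the underlying API: it sets up a single-process `gloo` group (`world_size=1`) so it runs on any CPU machine; in real multi-GPU training you would launch one process per GPU (e.g. via `torchrun`) and use the `nccl` backend.

```python
# Hedged sketch: single-process DistributedDataParallel, for illustration only.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def train_step():
    # Rendezvous info; torchrun would normally set these environment variables.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # world_size=1 here; in practice, one process per GPU with nccl backend.
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = torch.nn.Linear(4, 2)
    # DDP all-reduces (averages) gradients across ranks during backward().
    ddp_model = DDP(model)
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

    x = torch.randn(8, 4)
    y = torch.randn(8, 2)
    loss = torch.nn.functional.mse_loss(ddp_model(x), y)
    loss.backward()  # gradient synchronization happens here
    opt.step()

    dist.destroy_process_group()
    return loss.item()


if __name__ == "__main__":
    print("loss:", train_step())
```

With more than one rank, each process would feed a different data shard (e.g. via `DistributedSampler`), while DDP keeps the replicated parameters in sync by averaging gradients in `backward()`.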