Home
Stella Biderman edited this page Jan 30, 2021
Welcome to the gpt-neox wiki!
The purpose of this wiki is to organize information about the terminology and ideas that appear throughout the DeepSpeed papers: how they connect to each other, what benefits they provide, and why we care about them.
Each item on the list below should have its own page.
- ZeRO
- ZeRO Stage 1 vs 2 vs 3
- ZeRO Offload
- Pipeline Parallelism
- Kernel Optimization
- Gradient Clipping
- Progressive Layer Dropping
- Sparse Attention
- Model Checkpointing
- Activation Checkpointing
- Adam
- 1-bit Adam
- TCP
- InfiniBand
- PCIe
- NVLink
- MPI
- NCCL