Merged

Changes from all commits
81 commits
d190f1c
test sparse self_attn fix
Mar 11, 2021
18a26f3
[WarmupDecayLR] fix log(0) & 1/log(1) bugs (#772)
stas00 Mar 12, 2021
35fd7cc
bump to v0.3.12
jeffra Mar 12, 2021
458ff02
Bug fix: Remove client optimizer param_group list item that does not …
cli99 Mar 12, 2021
73d762c
[doc] pipeline doc typos/improvements (#659)
stas00 Mar 14, 2021
4601885
Samyamr/inference hook fix (#851)
samyam Mar 15, 2021
a75d971
ZeRO Stage 2: Clear reduced gradients (#856)
tjruwase Mar 15, 2021
24335d4
[runner/launch] propagate the error (#854)
stas00 Mar 16, 2021
547d1c5
docs: minor spelling tweaks (#858)
brettkoonce Mar 16, 2021
871f304
Allow args to be optional in deepspeed.initialize (#825)
jeffra Mar 16, 2021
fa87a73
Fix ZeRO3 save_checkpoint (#857)
tjruwase Mar 16, 2021
7bcd72a
Make config objects json serializable (#862)
tjruwase Mar 16, 2021
12a53b4
bump version 0.3.13
jeffra Mar 16, 2021
68c8481
1-bit Adam v2 (#817)
conglongli Mar 16, 2021
10c0bea
consistent checkpoint filenaming (#865)
stas00 Mar 18, 2021
9e9f8cb
[doc] launcher (#868)
stas00 Mar 18, 2021
22d5a1f
[doc] pipeline (#888)
stas00 Mar 24, 2021
7f03282
[debug utils] see_memory_usage fixes (#890)
stas00 Mar 25, 2021
7531c6b
full fp32 weights reconstruction for zero 2+3 (#892)
stas00 Mar 26, 2021
39013dd
save_fp16_model consolidated for zero3 (#893)
stas00 Mar 27, 2021
7fcc891
Fix zero stage2 cpu_offload when some model trainable parameters skip…
ghosthamlet Mar 27, 2021
b4ac3b6
mlperf attn initial commit
Mar 29, 2021
af2d8fc
update kramdown (#901)
jeffra Mar 30, 2021
23ff6cb
update backward api doc (#903)
jeffra Mar 30, 2021
c042264
Bump kramdown from 2.3.0 to 2.3.1 in /docs (#905)
dependabot[bot] Mar 30, 2021
8c9e16e
We're hiring! + integration posts
jeffra Mar 31, 2021
c6b497d
[website] We're hiring! + integration posts
jeffra Mar 31, 2021
c814abd
[website] we're hiring!
jeffra Mar 31, 2021
5d721e0
zero.Init() clarification (#880)
stas00 Apr 1, 2021
8db4fdf
disable pipe test (#915)
jeffra Apr 2, 2021
ab5534f
Add link to AML examples. (#916)
awan-10 Apr 2, 2021
c334c85
add inference_batch fn
Apr 6, 2021
ce14cf1
Add space in help string (#926)
tma15 Apr 7, 2021
b5f56b2
Fix for fragmented linear inputs in ZeRO 3 Linear layers where reshap…
samyam Apr 7, 2021
6d94afb
[zero3] GatheredParameters can now handle a list of params (#884)
stas00 Apr 7, 2021
c79184e
fix cpu_adam memory leak on deepspeed re-use in the same process (#896)
stas00 Apr 7, 2021
a128f34
[benchmarks] flatten/unflatten benchmarks (#919)
stas00 Apr 7, 2021
5ca86ae
improved readability + typos (#895)
stas00 Apr 7, 2021
f19cf67
[zero doc] fix misspelled param (#878)
stas00 Apr 7, 2021
7b46d11
Samyamr/stage 3 skip modules without parameters (#867)
samyam Apr 7, 2021
3169929
docs (#909)
stas00 Apr 7, 2021
e721cb6
Supporting different hidden dimensions for transformer kernels-v2 (#934)
RezaYazdaniAminabadi Apr 8, 2021
dba52bc
Pull changes from DeepSpeed
Apr 8, 2021
d9641fd
Pull changes from DeepSpeed
Apr 8, 2021
dbc3b13
Pull changes from DeepSpeed
Apr 8, 2021
0f5faf9
Pull changes from DeepSpeed
Apr 8, 2021
f6fc1af
Pull changes from DeepSpeed
Apr 8, 2021
03371ea
Pull changes from DeepSpeed
Apr 8, 2021
f90dc47
cleanup, reinstantiate sending of logits / layer_past
Apr 8, 2021
23d0f78
cleanup, reinstantiate sending of logits / layer_past
Apr 8, 2021
0cd9abf
bump to 0.3.14
jeffra Apr 8, 2021
112ebff
add pypi badge
jeffra Apr 8, 2021
e6999eb
Delete check of pdsh (#941)
tma15 Apr 13, 2021
adac058
fix double linear override; spelling (#954)
stas00 Apr 14, 2021
c87118b
[config] turn exponential notation back on for config dump (#955)
stas00 Apr 14, 2021
7003d44
document how to override ~/.cache/torch_extensions (#959)
stas00 Apr 14, 2021
8b8ed2a
[zero] faster flatten/unflatten (cpp version) (#910)
stas00 Apr 14, 2021
c83e49f
update lr scheduler doc for doing per step or epoch update (#913)
cli99 Apr 14, 2021
2805c39
Fix ZeRO-3 UnboundLocalError (#968)
tjruwase Apr 16, 2021
0d4a54a
ZeRO-Infinity (#976)
jeffra Apr 19, 2021
72a30c1
revert zero-inf change to launcher
jeffra Apr 19, 2021
598e50f
[docs] zero-inf updates
jeffra Apr 19, 2021
2c2a7f3
bump to 0.3.15
jeffra Apr 19, 2021
3c47d09
ZeRO-Infinity tutorial additions (#978)
Apr 19, 2021
1a74195
[docs] add ZeRO-Inf news items
jeffra Apr 19, 2021
5016e93
refactor
Apr 19, 2021
11279ae
ZeRO-Infinity docs (#979)
Apr 19, 2021
5f570bb
[docs] zero-inf updates
jeffra Apr 19, 2021
fbece50
assert no Z2/Z3 with pipeline and fix some docs links (#980)
Apr 19, 2021
9e0dab4
add option to force multi-node launcher mode (#977)
jeffra Apr 20, 2021
3525102
[ZeRO Infinity] Allow Init to take a dict for the deepspeed config (…
Apr 20, 2021
835b4c8
make bold+italic work without escaping _ (#775)
stas00 Apr 20, 2021
eecef30
remove debug prints: (#986)
stas00 Apr 21, 2021
67a48aa
1-bit LAMB optimizer (#970)
conglongli Apr 21, 2021
894f21d
Use odd shape tensor to represent parameter data in partitioned state…
cli99 Apr 21, 2021
0b80ad0
Make reduce scatter optional for ZeRO-1 as workaround (#971)
tjruwase Apr 21, 2021
669028f
Fix all Pipeline Module Parameters being sent to cuda:0 (#687)
sdtblck Apr 21, 2021
bf5487b
Merge branch 'master' of git://github.com/microsoft/DeepSpeed into sa…
Apr 22, 2021
7c023f2
remove communicate overflow (already in utils.CheckOverflow)
Apr 22, 2021
8cbd6aa
Merge branch 'main' into sampling
Apr 22, 2021
aab5226
Merge branch 'main' into sampling
sdtblck Apr 22, 2021
1 change: 1 addition & 0 deletions DeepSpeedExamples
Submodule DeepSpeedExamples added at 78d69c
15 changes: 8 additions & 7 deletions README.md
@@ -2,7 +2,7 @@
[![PyPI version](https://badge.fury.io/py/deepspeed.svg)](https://pypi.org/project/deepspeed/)
[![Documentation Status](https://readthedocs.org/projects/deepspeed/badge/?version=latest)](https://deepspeed.readthedocs.io/en/latest/?badge=latest)
[![License MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/Microsoft/DeepSpeed/blob/master/LICENSE)
[![Docker Pulls](https://img.shields.io/docker/pulls/deepspeed/deepspeed)](https://hub.docker.com/r/deepspeed/deepspeed)
[![Downloads](https://pepy.tech/badge/deepspeed/month)](https://pepy.tech/project/deepspeed)

### 03/2021: DeepSpeed is hiring! Come join us: [SDE 2](https://careers.microsoft.com/us/en/job/1013160/Software-Engineer-2), [Sr. SDE](https://careers.microsoft.com/us/en/job/1017151/Senior-Software-Engineer), [Sr. Researcher](https://careers.microsoft.com/us/en/job/1016440/Senior-Researcher)

@@ -17,7 +17,7 @@ DeepSpeed delivers extreme-scale model training for everyone, from data scientis
* Extreme scale: Using the current generation of GPU clusters with hundreds of devices, 3D parallelism of DeepSpeed can efficiently train deep learning models with trillions of parameters.
* Extremely memory efficient: With just a single GPU, ZeRO-Offload of DeepSpeed can train models with over 10B parameters, 10x bigger than the state of the art, democratizing multi-billion-parameter model training such that many deep learning scientists can explore bigger and better models.
* Extremely long sequence length: Sparse attention of DeepSpeed powers an order-of-magnitude longer input sequence and obtains up to 6x faster execution compared with dense transformers.
* Extremely communication efficient: 3D parallelism improves communication efficiency and allows users to train multi-billion-parameter models 2–7x faster on clusters with limited network bandwidth. 1-bit Adam reduces communication volume by up to 5x while achieving similar convergence efficiency to Adam, allowing for scaling to different types of GPU clusters and networks.
* Extremely communication efficient: 3D parallelism improves communication efficiency and allows users to train multi-billion-parameter models 2–7x faster on clusters with limited network bandwidth. 1-bit Adam/1-bit LAMB reduce communication volume by up to 5x while achieving similar convergence efficiency to Adam/LAMB, allowing for scaling to different types of GPU clusters and networks.

Early adopters of DeepSpeed have already produced
a language model (LM) with over 17B parameters called
@@ -33,6 +33,9 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)


# News
* [2021/04/20] [1-bit LAMB: up to 4.6x less communication and 2.8x faster training, together with LAMB's convergence speed at large batch sizes](https://www.deepspeed.ai/tutorials/onebit-lamb/)
* [2021/04/19] [ZeRO-Infinity unlocks unprecedented model scale for deep learning training](https://www.microsoft.com/en-us/research/blog/zero-infinity-and-deepspeed-unlocking-unprecedented-model-scale-for-deep-learning-training/)
* [Tutorial on how to use different stages of ZeRO](https://www.deepspeed.ai/tutorials/zero/)
* [2021/04/01] [[DeepSpeed on AzureML] Transformers and CIFAR examples are now available on AzureML GitHub](https://github.com/Azure/azureml-examples/tree/main/workflows/train/deepspeed)
* [2021/03/30] [[PyTorch Lightning Blog] Accessible Multi-Billion Parameter Model Training with PyTorch Lightning + DeepSpeed](https://medium.com/pytorch-lightning/accessible-multi-billion-parameter-model-training-with-pytorch-lightning-deepspeed-c9333ac3bb59)
* [2021/03/16] [1-bit Adam v2: NCCL-based implementation and more](https://www.deepspeed.ai/tutorials/onebit-adam/)
@@ -41,10 +44,6 @@ information [here](https://innovation.microsoft.com/en-us/exploring-ai-at-scale)
* [2020/11/12] [Simplified install, JIT compiled ops, PyPI releases, and reduced dependencies](#installation)
* [2020/11/10] [Efficient and robust compressed training through progressive layer dropping](https://www.deepspeed.ai/news/2020/10/28/progressive-layer-dropping-news.html)
* [2020/09/10] [DeepSpeed v0.3: Extreme-scale model training for everyone](https://www.microsoft.com/en-us/research/blog/deepspeed-extreme-scale-model-training-for-everyone/)
* [Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention](https://www.deepspeed.ai/news/2020/09/08/sparse-attention-news.html)
* [Training a trillion parameters with pipeline parallelism](https://www.deepspeed.ai/news/2020/09/08/pipeline-parallelism.html)
* [Up to 5x less communication and 3.4x faster training through 1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-news.html)
* [10x bigger model training on a single GPU with ZeRO-Offload](https://www.deepspeed.ai/news/2020/09/08/ZeRO-Offload.html)


# Table of Contents
@@ -121,7 +120,7 @@ overview](https://www.deepspeed.ai/features/) for descriptions and usage.
* Memory- and compute-efficient sparse kernels
* Support 10x longer sequences than dense
* Flexible support for different sparse structures
* [1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-blog-post.html)
* [1-bit Adam](https://www.deepspeed.ai/news/2020/09/08/onebit-adam-blog-post.html) and [1-bit LAMB](https://www.deepspeed.ai/tutorials/onebit-lamb/)
* Custom communication collective
* Up to 5x communication volume saving
* [Additional Memory and Bandwidth Optimizations](https://www.deepspeed.ai/features/#additional-memory-and-bandwidth-optimizations)
@@ -193,6 +192,8 @@ Conduct](https://opensource.microsoft.com/codeofconduct/). For more information
3. Minjia Zhang, Yuxiong He. (2020) Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. [arXiv:2010.13369](https://arxiv.org/abs/2010.13369) and [NeurIPS 2020](https://proceedings.neurips.cc/paper/2020/hash/a1140a3d0df1c81e24ae954d935e8926-Abstract.html).
4. Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He. (2021) ZeRO-Offload: Democratizing Billion-Scale Model Training. [arXiv:2101.06840](https://arxiv.org/abs/2101.06840).
5. Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He. (2021) 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. [arXiv:2102.02888](https://arxiv.org/abs/2102.02888).
6. Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He. (2021) ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. [arXiv:2104.07857](https://arxiv.org/abs/2104.07857).
7. Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He. (2021) 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. [arXiv:2104.06069](https://arxiv.org/abs/2104.06069).

# Videos
1. DeepSpeed KDD 2020 Tutorial
8 changes: 8 additions & 0 deletions csrc/adam/cpu_adam.cpp
@@ -672,11 +672,19 @@ int ds_adam_step_plus_copy(int optimizer_id,
    return 0;
}

int destroy_adam_optimizer(int optimizer_id)
{
    s_optimizers.erase(optimizer_id);

    return 0;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m)
{
    m.def("adam_update", &ds_adam_step, "DeepSpeed CPU Adam update (C++)");
    m.def("adam_update_copy",
          &ds_adam_step_plus_copy,
          "DeepSpeed CPU Adam update and param copy (C++)");
    m.def("create_adam", &create_adam_optimizer, "DeepSpeed CPU Adam (C++)");
    m.def("destroy_adam", &destroy_adam_optimizer, "DeepSpeed CPU Adam destroy (C++)");
}
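
The new destroy_adam binding gives callers a way to release the per-optimizer state that create_adam registers, which is what closes the memory leak when DeepSpeed is re-used in the same process (#896). Below is a minimal, self-contained C++ sketch of that registry pattern; the FakeAdamState struct, the parameter count, and the main driver are illustrative assumptions and not DeepSpeed's actual types — only the create_adam_optimizer/destroy_adam_optimizer names and the s_optimizers.erase call come from the diff above.

// Illustrative sketch of a per-optimizer CPU state registry keyed by optimizer id.
#include <cstdio>
#include <memory>
#include <unordered_map>
#include <vector>

struct FakeAdamState {                       // stand-in for the real optimizer state
    std::vector<float> exp_avg, exp_avg_sq;  // first/second moment buffers kept on CPU
    explicit FakeAdamState(size_t n) : exp_avg(n, 0.f), exp_avg_sq(n, 0.f) {}
};

static std::unordered_map<int, std::shared_ptr<FakeAdamState>> s_optimizers;

int create_adam_optimizer(int optimizer_id, size_t param_count)
{
    // The real binding takes more arguments (learning rate, betas, etc.);
    // this sketch keeps only what the lifecycle story needs.
    s_optimizers[optimizer_id] = std::make_shared<FakeAdamState>(param_count);
    return 0;
}

int destroy_adam_optimizer(int optimizer_id)
{
    // Erasing the entry drops the reference to the state; without a destroy
    // entry point the map only ever grows within a process.
    s_optimizers.erase(optimizer_id);
    return 0;
}

int main()
{
    for (int run = 0; run < 3; ++run) {
        int id = run;                        // each new optimizer instance gets a fresh id
        create_adam_optimizer(id, 1000000);  // ~8 MB of moment buffers per optimizer
        // ... training steps would go through the adam_update bindings here ...
        destroy_adam_optimizer(id);          // pairs every create with a destroy
    }
    std::printf("live optimizers: %zu\n", s_optimizers.size());  // prints 0
    return 0;
}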