MP ZeRO++ by HeyangQin · Pull Request #3954 · deepspeedai/DeepSpeed

HeyangQin · 2023-07-13T19:34:40Z

As a follow-up to and extension of the ZeRO++ release, this mixed-precision ZeRO++ PR gives users the option to keep non-trainable weights permanently quantized, which is very useful for LoRA, where the base model weights are frozen. Compared with the standard weight quantization in ZeRO++, it allows for lower memory usage and even better throughput. Many thanks to Sam for helping with this implementation.
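To make the idea concrete, here is a minimal, library-free sketch of keeping frozen weights in a quantized form while trainable weights stay in full precision. It is an illustration only and not DeepSpeed's implementation: the symmetric per-block int8 scheme, the block size, the `min_size` threshold, and every function name below are assumptions made for this example.

```python
from array import array

BLOCK = 4  # hypothetical block size; real kernels use much larger groups


def quantize_int8(values, block=BLOCK):
    """Symmetric per-block int8 quantization. Returns (codes, scales)."""
    codes = array("b")  # signed 8-bit storage
    scales = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        max_abs = max(abs(v) for v in chunk)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for v in chunk:
            q = int(round(v / scale))
            codes.append(max(-127, min(127, q)))  # clamp against rounding drift
    return codes, scales


def dequantize_int8(codes, scales, block=BLOCK):
    """Rebuild approximate float values from int8 codes and per-block scales."""
    return [codes[i] * scales[i // block] for i in range(len(codes))]


def store_parameter(values, trainable, min_size=8):
    """Quantize once if the parameter is frozen and large enough.

    Trainable parameters (e.g. LoRA adapters) and tiny parameters are kept
    in full precision; min_size is a hypothetical threshold.
    """
    if trainable or len(values) < min_size:
        return {"kind": "full", "data": list(values)}
    codes, scales = quantize_int8(values)
    return {"kind": "int8", "codes": codes, "scales": scales}


def load_parameter(stored):
    """Return usable float values, dequantizing only when needed."""
    if stored["kind"] == "full":
        return stored["data"]
    return dequantize_int8(stored["codes"], stored["scales"])


# Hypothetical data: a frozen base weight and a small trainable adapter.
frozen_base = [0.5, -1.0, 0.25, 0.75, 2.0, -0.5, 1.5, -2.0]
lora_a = [0.01, -0.02]

base = store_parameter(frozen_base, trainable=False)
adapter = store_parameter(lora_a, trainable=True)

restored = load_parameter(base)
max_error = max(abs(a - b) for a, b in zip(frozen_base, restored))
```

In this sketch the frozen weight is quantized a single time and only dequantized when read, so its stored form stays at one byte per element plus one scale per block, while the adapter is never quantized and can be updated normally.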

* fix conv_flops_compute when padding is a str when stride=1 * fix error * change type of paddings to tuple * fix padding calculation * apply formatting check --------- Co-authored-by: Cheng Li <pistasable@gmail.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Co-authored-by: Stephen Youn <styoun@microsoft.com> Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org> Co-authored-by: Cheng Li <pistasable@gmail.com> Co-authored-by: Ethan Doe <yidoe@microsoft.com> Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Co-authored-by: HeyangQin <heyangqin@microsoft.com> Co-authored-by: GuanhuaWang <alexwgh333@gmail.com> Co-authored-by: cmikeh2 <connorholmes@microsoft.com> Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

* zeropp chinese blog * try better quality images * make title larger * even larger... * various fix * center captions * more fixes * fix format * add ZeRO++ Japanese blog * add links --------- Co-authored-by: HeyangQin <heyangqin@microsoft.com> Co-authored-by: Conglong Li <conglong.li@gmail.com>

HeyangQin and others added 30 commits June 21, 2023 11:51

zero++ tutorial PR (#3783)

df1859d

fix interpolate flops compute (#3782)

a8c182a

use Flops Profiler to test model.generate() (#2515)

c4c442f

* Update profiler.py * pre-commit run --all-files * Delete .DS_Store * Delete .DS_Store * Delete .DS_Store --------- Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Cheng Li <pistasable@gmail.com>

revert PR #3611 (#3786)

fc9e1ee

bump to 0.9.6

40045dc

ZeRO++ chinese blog (#3793)

49a0a1b

* zeropp chinese blog * try better quality images * make title larger * even larger... * various fix * center captions * more fixes * fix format

remove staging trigger (#3792)

2c62cb4

adding zero++ to navigation panel of deepspeed.ai (#3796)

01b843a

Bug Fixes for autotuner and flops profiler (#1880)

b4a2c0a

* fix autotuner when backward is not called * fix format --------- Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Missing strided copy for gated MLP (#3788)

b7e1010

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

Requires grad checking. (#3789)

e5b1ead

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

bump to 0.10.0

9c756cf

Fix Bug in transform.cu (#3534)

a204edc

* Bug fix * Fixed formatting error --------- Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

bug fix: triton importing error (#3799)

f6e2e38

Co-authored-by: Stephen Youn <styoun@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Merge branch 'master' of github.com:microsoft/DeepSpeed

c1a7d3c

Merge branch 'master' of github.com:microsoft/DeepSpeed

65ed548

Merge branch 'master' of github.com:microsoft/DeepSpeed

d7ac329

Merge branch 'master' of github.com:microsoft/DeepSpeed

83f1102

Merge branch 'master' of github.com:microsoft/DeepSpeed

16555b2

Merge branch 'master' of github.com:microsoft/DeepSpeed

9d7b654

init commit for mixed precision lora

2efb73d

fix format

1147885

patch _allgather_params & minor fixes

1bec51f

make sure initial quantization are finished

5b3c460

make sure dequantization is finished

ec1f154

skip quantization for small parameters

9d53168

HeyangQin added 10 commits July 13, 2023 19:42

remove unused async_op

b3ad425

Merge branch 'HeyangQin/mixed_precision_lora' of https://github.com/microsoft/DeepSpeed into HeyangQin/mixed_precision_lora

7b2b6a4

lazy load of quantizer kernels

a06c564

add mixed precision lora tutorial

94cf3c4

Merge branch 'master' into HeyangQin/mixed_precision_lora

ce96d9a

cleanup mics

b1cb597

cleanup mics

3470949

Merge branch 'HeyangQin/mixed_precision_lora' of https://github.com/microsoft/DeepSpeed into HeyangQin/mixed_precision_lora

e0e8cf4

replace get_accelerator().current_device()

c25cf6b

Merge remote-tracking branch 'origin/master' into HeyangQin/mixed_precision_lora

aa4f28a

HeyangQin mentioned this pull request Aug 16, 2023

Mixed Precision ZeRO++ deepspeedai/DeepSpeedExamples#689

Merged

HeyangQin added 3 commits August 17, 2023 04:05

Merge remote-tracking branch 'origin/master' into HeyangQin/mixed_precision_lora

f7cb549

add kwargs to mics

d501309

fix format

b5a41fa

HeyangQin changed the title ~~Mixed precision LoRA release~~ Mixed precision ZeRO++ release Aug 17, 2023

HeyangQin changed the title ~~Mixed precision ZeRO++ release~~ MP ZeRO++ Aug 17, 2023

HeyangQin added 2 commits August 17, 2023 18:55

seperate code and tutorial

74c2760

Merge branch 'master' into HeyangQin/mixed_precision_lora

9f68cda

awan-10 approved these changes Aug 18, 2023

HeyangQin enabled auto-merge August 18, 2023 18:54

awan-10 and others added 4 commits August 18, 2023 16:39

Merge branch 'master' into HeyangQin/mixed_precision_lora

f802011

Merge branch 'master' into HeyangQin/mixed_precision_lora

a6bd454

Merge branch 'master' into HeyangQin/mixed_precision_lora

3d527b2

fix _all_gather in zero3

9e277ba

HeyangQin added this pull request to the merge queue Aug 20, 2023

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 21, 2023

HeyangQin added this pull request to the merge queue Aug 21, 2023

Merged via the queue into master with commit 7711bdb Aug 21, 2023

jeffra deleted the HeyangQin/mixed_precision_lora branch August 31, 2023 16:15

MP ZeRO++ #3954
HeyangQin merged 51 commits into master from HeyangQin/mixed_precision_lora

11 participants