Update deeperspeed final #46

Merged
merged 914 commits into from
Mar 9, 2023

Quentin-Anthony
Member

No description provided.

mrwyattii and others added 30 commits September 14, 2022 01:11
* add quant unit test

* add codeowner

* format fix

* fix undefined symbol: curandSetPseudoRandomGeneratorSeed

* modify ref fn name and add comment

* add comments

* add 4bit quant 16groups

* fix

* modify groups in ref code

* parameterize tensor shape

* single param

* detach tensor

* remove -lcurand flag

* add back -lcurand flag

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
MOE residual matmul unit tests

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
* Fix formatting

* Remove redundant variable
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
* mem access for quantize kernel

* format

* format fp32

* modify quant kernel

* modify quant kernel2

* modify format

* format

* fix comments in pytest

* fix comments in pytest

* format

* rerun

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Unify macro definitions and constants in a single file

* Conversion utility implementation.

* Fix reversion from formatting

* Bugfixes after testing with correct DeepSpeed

* Inline markers are available on both HIP + CUDA
Co-authored-by: Saeyeol Lee <sylee@si-anlaytics.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
…2358)

Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…t#2356)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Collect error messages in results.csv

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* batch of refactored tests

* more test refactoring

* fp16 test refactor

* more refactors

* added DistributedFixture class

* applied DistributedFixture to first batch of tests as a trial

* added DistributedFixture test and documentation

* last tests

* fixes for refactored tests

* remove subdirs in workflow files

* fix pytest syntax error

* fix another syntax error

* update imports

* use DistFixture with elastic checkpoint test

* missing import

* update to shared class tmpdir for elastic test

* moved test files

* avoid duplicate test file name

* last refactor and moving test files

* formatting

* fix broken import

* testing forked AMD tests

* update abstract method

* use blob storage for accelerate and transformers tests

* upgrade torch for accelerate CI

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
molly-smith and others added 27 commits February 21, 2023 11:52
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* data efficiency library update

* data efficiency library update

* data efficiency update

* data efficiency update
* Make z3 respect comm dtype

* Support fp32 comm dtype

* Remove obsolete assert

* Code cleanup
* Modify table for compatible web format

* Add tutorial links to navigation

* Add news bit to main readme

* Update docs/_tutorials/automatic-tensor-parallelism.md

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Check device count before running dist tests

* fixing format for "Check device count before running dist tests"

* Check device count against max world size

* Check GPU count before launching dist tests

* double-check GPU actually exists

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* Remove deprecated `torch._six` imports

Closes microsoft#2845.

* Support older versions of PyTorch as well.

---------

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Enable tensor fragments for zero 2

* Update deepspeed/utils/tensor_fragment.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Support offload

* Support multi-gpu

* Cleanup

* WIP

* Update deepspeed/runtime/zero/stage3.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Support padding

* Update deepspeed/runtime/zero/stage3.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* z3 optimizer state support; aligned api

* Support frozen z3 params

* Unit tests

* Check NVMe offload capability

* Formatting

* Docs

* More docs

* More docs

* Update docs/code-docs/source/zero3.rst

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* More docs

* Update docs/code-docs/source/zero3.rst

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* More docs

* More docs

* Update docs/code-docs/source/zero3.rst

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* More docs

* Support unsharded fp32 grad

* Remove debug prints

* Fix off-by-one detection of empty grads

* Update deepspeed/utils/tensor_fragment.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/runtime/zero/stage3.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix off-by-one error

* Skip ranks with no gradient data

* Formatting

* Add license

* Fix license

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
This PR updates the replace_fn function when loading inference checkpoints. The container will now be passed to the load_model_with_checkpoint() so we can call load_params() from there. load_params() is also updated to access the variables in the policy.
* microsoft#1213: Fix CPUAdam for when `vendor_id_raw` is not provided

* formatting (yapf) fix

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Updates `deepspeed/monitor/monitor.py`
to instantiate objects with correct configs

Relevant issue:
microsoft#2853

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* MPICH support

* MPICH changes

* MPICH changes

* MPICH changes

* MPICH changes

* accelerator runtime modifications

* Accelerator runtime changes

* Accelerator runtime modifications

* Remove redundant print from single node

* Move hostfile to tmp

* Code cleanup for MPICH class

* Code cleanup, rm whitespace

* Removing mpiexec environment check details

* Temporary hostfile not needed since the hostfile is passed directly

* Remove debugging comments

* rm print statement

* Revert comm changes as the workaround is not needed

* Use MPICHRunner name for class

* Use MPICHRunner as class name

* No need to use args.force_multi and args.launcher.

This should be set in the DeepSpeedExamples gpt-3.6b.sh script as:
launcher=MPICH
run_cmd=" deepspeed  --hostfile=${hostfile_ds}  --num_nodes ${NUM_WORKERS} --num_gpus ${NUM_GPUS_PER_WORKER} --launcher=${launcher} --force_multi pretrain_gpt2.py $@ ${gpt_options}"

* Adhere to code pattern

* Rm empty lines in MPICHRunner class

* Uncomment check for num nodes and workers when using hostfile_deepspeed in gpt-3.6b.sh

* pass MPICH hostfile through launcher_args in gpt-3.6b.sh

* Clean code and remove args hostfile

* fix merge

* fix merge

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>

* clean up and fix format

* add ut

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
* check kernel injection supported models

* Clarify why user should use kernel injection
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
…icrosoft#2221)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Rajhans Samdani <rajhans@gmail.com>
…f op_builder (microsoft#2963)

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Quentin-Anthony merged commit fdfb825 into main Mar 9, 2023