
set the default to use set_to_none for clearing gradients in BF16 optimizer.#5434

Merged
loadams merged 8 commits into deepspeedai:master from inkcherry:fix_5175_
Apr 23, 2024
Conversation

@inkcherry
Contributor

As discussed in #5175, this sets the default to use `set_to_none` for clearing gradients in the BF16 optimizer.
Additionally, for the zero-clearing case, it uses `foreach_zero`.
Correctness was verified with mega-ds llama 7B training.

FYI @loadams
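A minimal sketch of the control flow described above, using plain-Python stand-ins rather than the real `BF16_Optimizer` internals (the `Param` class and list-based gradient buffers are illustrative assumptions; the actual code operates on `torch.Tensor` grads and batches the in-place zeroing with `torch._foreach_zero_`):

```python
class Param:
    """Illustrative stand-in for a parameter holding a gradient buffer."""
    def __init__(self, grad):
        self.grad = grad


def clear_lp_grads(params, graph_harvesting=False):
    # set_to_none is now the default: dropping the reference frees memory.
    # Under graph harvesting (CUDA graph replay), gradients must keep a
    # fixed memory address, so they are zeroed in place instead.
    set_to_none = False if graph_harvesting else True
    if set_to_none:
        for p in params:
            p.grad = None
    else:
        for p in params:
            if p.grad is not None:
                # in-place zeroing preserves the buffer's identity
                p.grad[:] = [0.0] * len(p.grad)
```

With `graph_harvesting=True` the gradient object survives (same address) but is zeroed; otherwise the reference is dropped entirely.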

```python
def clear_lp_grads(self):

    # using zero_() fixed memory address for graph replay
    set_to_none = set_to_none = False if self.graph_harvesting else True
```
Contributor

Suggested change
```diff
-set_to_none = set_to_none = False if self.graph_harvesting else True
+set_to_none = False if self.graph_harvesting else True
```

@inkcherry (Contributor, Author)

Fixed and reverified.

@loadams enabled auto-merge Apr 22, 2024 22:29
@loadams added this pull request to the merge queue Apr 22, 2024
Merged via the queue into deepspeedai:master with commit c66bc42 Apr 23, 2024
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
set the default to use set_to_none for clearing gradients in BF16 optimizer. (deepspeedai#5434)


Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024

3 participants