
set the default to use set_to_none for clearing gradients in BF16 optimizer.#5434

Merged
loadams merged 8 commits into deepspeedai:master from inkcherry:fix_5175_
Apr 23, 2024
Conversation

@inkcherry
Contributor

As discussed in #5175, this sets the default to use `set_to_none` for clearing gradients in the BF16 optimizer.
Additionally, for the zero-clearing case, it uses `foreach_zero`.
Correctness was verified with mega-ds llama 7B training.

FYI @loadams
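A minimal sketch of the control flow described above, using plain-Python stand-ins rather than the real `BF16_Optimizer` internals (the `Param` class and list-based gradient buffers are illustrative assumptions; the actual code operates on `torch.Tensor` grads and batches the in-place zeroing with `torch._foreach_zero_`):

```python
class Param:
    """Illustrative stand-in for a parameter holding a gradient buffer."""
    def __init__(self, grad):
        self.grad = grad


def clear_lp_grads(params, graph_harvesting=False):
    # set_to_none is now the default: dropping the reference frees memory.
    # Under graph harvesting (CUDA graph replay), gradients must keep a
    # fixed memory address, so they are zeroed in place instead.
    set_to_none = False if graph_harvesting else True
    if set_to_none:
        for p in params:
            p.grad = None
    else:
        for p in params:
            if p.grad is not None:
                # in-place zeroing preserves the buffer's identity
                p.grad[:] = [0.0] * len(p.grad)
```

With `graph_harvesting=True` the gradient object survives (same address) but is zeroed; otherwise the reference is dropped entirely.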

```python
def clear_lp_grads(self):

    # using zero_() fixed memory address for graph replay
    set_to_none = set_to_none = False if self.graph_harvesting else True
```
Contributor

Suggested change
```diff
-set_to_none = set_to_none = False if self.graph_harvesting else True
+set_to_none = False if self.graph_harvesting else True
```

@inkcherry (Contributor, Author)

Fixed and reverified.

@loadams enabled auto-merge Apr 22, 2024 22:29
@loadams added this pull request to the merge queue Apr 22, 2024
Merged via the queue into deepspeedai:master with commit c66bc42 Apr 23, 2024
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024
set the default to use set_to_none for clearing gradients in BF16 optimizer. (deepspeedai#5434)


Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
umchand pushed a commit to umchand/DeepSpeed that referenced this pull request May 20, 2024
dbyoung18 pushed a commit to dbyoung18/DeepSpeed that referenced this pull request Jun 11, 2024

3 participants