[`core`] Fix DeepSpeed zero-3 issue #182

younesbelkada · 2023-02-28T16:43:25Z

What does this PR do?

This is an attempt to fix #171

For now the sentiment script hangs, so I need to investigate further.

HuggingFaceDocBuilderDev · 2023-02-28T16:46:34Z

The documentation is not available anymore as the PR was closed or merged.

pacman100

Thank you @younesbelkada for fixing the TRL+DS integration. I left a comment. The sentiment-pipeline-related changes have been shared offline.

pacman100 · 2023-03-27T09:58:03Z

trl/trainer/ppo_trainer.py

-        ) = self.accelerator.prepare(
-            self.model, self.ref_model, self.optimizer, self.data_collator, self.dataloader, self.lr_scheduler
+        # Safety checkers for DS integration
+        is_deepspeed_zero_3 = (


This change is independent of the DS stage; it should be applied for all DS stages.
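
A minimal sketch of what a stage-independent check could look like, for illustration only. The helper name `is_deepspeed_used` and the `make_fake_accelerator` stand-in are hypothetical and not taken from this PR; the sketch assumes the accelerator exposes `distributed_type` and `state.deepspeed_plugin`, and it uses plain strings and `SimpleNamespace` objects so it runs without `accelerate` or DeepSpeed installed.

```python
from types import SimpleNamespace


def is_deepspeed_used(accelerator) -> bool:
    """Hypothetical helper: True for any DeepSpeed stage, not only ZeRO-3."""
    state = getattr(accelerator, "state", None)
    return (
        getattr(accelerator, "distributed_type", None) == "DEEPSPEED"
        and getattr(state, "deepspeed_plugin", None) is not None
    )


def make_fake_accelerator(distributed_type, zero_stage=None):
    """Stand-in object that mimics only the attributes read above (assumption)."""
    plugin = SimpleNamespace(zero_stage=zero_stage) if zero_stage is not None else None
    return SimpleNamespace(
        distributed_type=distributed_type,
        state=SimpleNamespace(deepspeed_plugin=plugin),
    )


if __name__ == "__main__":
    for stage in (1, 2, 3):
        acc = make_fake_accelerator("DEEPSPEED", zero_stage=stage)
        print(stage, is_deepspeed_used(acc))
    print("no DS", is_deepspeed_used(make_fake_accelerator("MULTI_GPU")))
```

The point of the sketch is that the guard does not inspect `zero_stage` at all, so the same branch is taken for every DeepSpeed stage.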

trl/trainer/ppo_trainer.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

younesbelkada · 2023-03-27T15:21:42Z

trl/trainer/ppo_trainer.py

+            if self.accelerator.state.deepspeed_plugin.zero_stage == 3:
+                self.model.train()


Based on the offline discussion I had with @pacman100, I confirm this hack is needed to make DS3 work.
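
The guard in the diff above can be exercised in isolation with a rough sketch like the following. `FakeModel` and `apply_zero3_workaround` are hypothetical names introduced only for this illustration; the sketch assumes `accelerator.state.deepspeed_plugin.zero_stage` is available as in the diff, and it replaces the real accelerator and model with small stand-ins so it runs without DeepSpeed.

```python
from types import SimpleNamespace


class FakeModel:
    """Stand-in for a torch module; tracks only the training flag (assumption)."""

    def __init__(self):
        self.training = False

    def train(self):
        self.training = True
        return self


def apply_zero3_workaround(accelerator, model):
    """Hypothetical wrapper: call model.train() only under DeepSpeed ZeRO stage 3."""
    plugin = getattr(accelerator.state, "deepspeed_plugin", None)
    if plugin is not None and plugin.zero_stage == 3:
        model.train()
    return model


def fake_accelerator(zero_stage=None):
    """Build a stand-in exposing only state.deepspeed_plugin.zero_stage."""
    plugin = SimpleNamespace(zero_stage=zero_stage) if zero_stage is not None else None
    return SimpleNamespace(state=SimpleNamespace(deepspeed_plugin=plugin))


if __name__ == "__main__":
    print(apply_zero3_workaround(fake_accelerator(3), FakeModel()).training)
    print(apply_zero3_workaround(fake_accelerator(2), FakeModel()).training)
```

Only the stage 3 case flips the training flag; other stages and non-DeepSpeed runs leave the model untouched.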

lvwerra

One small comment, otherwise looks good!

trl/trainer/ppo_trainer.py

fix zero-3 issue

1909aa7

pacman100 reviewed Mar 27, 2023

younesbelkada and others added 6 commits March 27, 2023 10:12

Merge remote-tracking branch 'origin/main' into HEAD

6ac2bce

Update trl/trainer/ppo_trainer.py

fbbe9eb

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

adapt

bd06f78

make style

bb13034

fix

71424b4

add docs

f87dd85

younesbelkada commented Mar 27, 2023

younesbelkada requested review from pacman100 and lvwerra and removed request for pacman100 March 27, 2023 15:21

lvwerra approved these changes Mar 28, 2023

fix

59e6d5d

younesbelkada merged commit 2672a94 into huggingface:main Mar 28, 2023

younesbelkada deleted the ds-fix-issue branch March 28, 2023 11:44

younesbelkada mentioned this pull request May 10, 2023

BUG: Deepspeed doesn't work with PEFT integration #349

Closed
