
Make clearer about zero_init requirements #29879

Merged
muellerzr merged 8 commits into main from muellerzr-deepspeed-description
Apr 3, 2024

Conversation

muellerzr
Contributor

@muellerzr muellerzr commented Mar 26, 2024

What does this PR do?

@abhishekkrthakur pointed out that he was running into confusing issues when trying to use zero-init with model initialization: he was initializing the model before the TrainingArguments were created, so zero-init couldn't be set up.

There's no reliable way to catch this at runtime, so instead the docstring now flags upfront that the TrainingArguments need to be instantiated before the model. The exception is when a user launches with a configured accelerate launch, which this PR also checks for and handles.
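For illustration, a minimal sketch of the ordering the docstring now calls out (the model name and DeepSpeed config path here are placeholders, not from this PR):

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Create TrainingArguments *before* the model: parsing the DeepSpeed ZeRO-3
# config here is what arms zero-init for any later weight loading.
training_args = TrainingArguments(
    output_dir="out",
    deepspeed="ds_config_zero3.json",  # placeholder path to a ZeRO-3 config
)

# Only now load the model; with zero-init armed above, the weights are
# partitioned across ranks as they load instead of materialized whole.
model = AutoModelForCausalLM.from_pretrained("gpt2")

trainer = Trainer(model=model, args=training_args)
```

Creating the model first would load it fully on every rank, which is exactly the confusion reported above.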

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@pacman100 @amyeroberts

@abhishekkrthakur
Member

Thank you for adding this. It will hopefully save a lot of people a lot of time :)

@muellerzr
Contributor Author

muellerzr commented Mar 26, 2024

@pacman100 the other option I'm considering:

Could we realistically check for this from accelerate launch via the environment variables it sets for zero-init? That way the catch still works for people not using accelerate launch, and we can handle it ourselves when they are (see the sketch below).

(Whether we raise an explicit error or do it behind the scenes I'm iffy on; I'd rather raise an error.)
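A rough sketch of the environment check being proposed (the helper name is hypothetical, and it assumes accelerate launch exports ACCELERATE_USE_DEEPSPEED and ACCELERATE_DEEPSPEED_ZERO3_INIT for a DeepSpeed-configured launch):

```python
import os

def _zero3_init_requested_via_accelerate_env() -> bool:
    """Hypothetical helper: infer from the environment whether a configured
    `accelerate launch` asked for DeepSpeed ZeRO-3 with zero-init enabled."""
    return (
        os.environ.get("ACCELERATE_USE_DEEPSPEED", "false").lower() == "true"
        and os.environ.get("ACCELERATE_DEEPSPEED_ZERO3_INIT", "false").lower() == "true"
    )
```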

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr muellerzr changed the title Docstring to note about zero init Make clearer about zero_init requirements and add an early check Mar 26, 2024
Collaborator

@amyeroberts amyeroberts left a comment


Thanks for adding! Having these error messages will definitely be useful!

Just a comment on the conditional tuple return in is_deepspeed_zero3_enabled

Review comment on src/transformers/integrations/deepspeed.py (outdated, resolved)
@muellerzr
Contributor Author

Thanks @amyeroberts, I've tweaked it so it now lives in its own new function, and we call both in order since it's only needed in two places.

@muellerzr muellerzr requested a review from amyeroberts March 28, 2024 16:53
@muellerzr muellerzr requested a review from amyeroberts March 28, 2024 17:07
Contributor

@pacman100 pacman100 left a comment


Thank you @muellerzr for the fixes but I have left some overall comments.

Two review comments on src/transformers/integrations/deepspeed.py (outdated, resolved)
@muellerzr muellerzr requested a review from pacman100 April 2, 2024 13:11
@muellerzr
Contributor Author

@amyeroberts @pacman100 should be good for a last review. After discussing offline, it's best to just point users to the docs (RTFM), since we can't reliably check beforehand and we currently rely on having numerous examples instead.

Collaborator

@amyeroberts amyeroberts left a comment


Thanks!

Review comment on src/transformers/modeling_utils.py (outdated, resolved)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@muellerzr muellerzr changed the title Make clearer about zero_init requirements and add an early check Make clearer about zero_init requirements Apr 3, 2024
@muellerzr muellerzr merged commit 863e256 into main Apr 3, 2024
8 checks passed
@muellerzr muellerzr deleted the muellerzr-deepspeed-description branch April 3, 2024 17:37
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
* Docstring to note about zero init

* Check for accelerate

* Change conditional return

* Tweak

* Add new accelerate-specific zero3 check

* Fix import

* Revert to RTFM

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
itazap pushed a commit that referenced this pull request May 14, 2024