
Make clearer about zero_init requirements #29879

Merged
muellerzr merged 8 commits into main from muellerzr-deepspeed-description
Apr 3, 2024

Conversation

muellerzr
Contributor

@muellerzr muellerzr commented Mar 26, 2024

What does this PR do?

@abhishekkrthakur pointed out that he was running into confusing issues when trying to use zero-init with model initialization: he was initializing the model before the TrainingArguments were created, so zero-init couldn't be set up.

There's no reliable way to catch this at runtime, so instead the docstring now flags upfront that the TrainingArguments need to be instantiated before the model. The exception is when a user launches with a configured accelerate launch, which this PR also checks for and handles.
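For illustration, a minimal sketch of the ordering the docstring now calls out (the model name and DeepSpeed config path here are placeholders, not from this PR):

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Create TrainingArguments *before* the model: parsing the DeepSpeed ZeRO-3
# config here is what arms zero-init for any later weight loading.
training_args = TrainingArguments(
    output_dir="out",
    deepspeed="ds_config_zero3.json",  # placeholder path to a ZeRO-3 config
)

# Only now load the model; with zero-init armed above, the weights are
# partitioned across ranks as they load instead of materialized whole.
model = AutoModelForCausalLM.from_pretrained("gpt2")

trainer = Trainer(model=model, args=training_args)
```

Creating the model first would load it fully on every rank, which is exactly the confusion reported above.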

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@pacman100 @amyeroberts

@abhishekkrthakur
Member

Thank you for adding this. It will hopefully save a lot of people a lot of time :)

@muellerzr
Contributor Author

muellerzr commented Mar 26, 2024

@pacman100 the other option I'm considering:

Could we realistically check for this from accelerate launch via the environment variables it sets for zero-init? That way the catch still works for people not using accelerate launch, and we can handle it ourselves when they are (see the sketch below).

(Whether we raise an explicit error or do it behind the scenes I'm iffy on; I'd rather raise an error.)
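A rough sketch of the environment check being proposed (the helper name is hypothetical, and it assumes accelerate launch exports ACCELERATE_USE_DEEPSPEED and ACCELERATE_DEEPSPEED_ZERO3_INIT for a DeepSpeed-configured launch):

```python
import os

def _zero3_init_requested_via_accelerate_env() -> bool:
    """Hypothetical helper: infer from the environment whether a configured
    `accelerate launch` asked for DeepSpeed ZeRO-3 with zero-init enabled."""
    return (
        os.environ.get("ACCELERATE_USE_DEEPSPEED", "false").lower() == "true"
        and os.environ.get("ACCELERATE_DEEPSPEED_ZERO3_INIT", "false").lower() == "true"
    )
```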

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr muellerzr changed the title Docstring to note about zero init Make clearer about zero_init requirements and add an early check Mar 26, 2024
Collaborator

@amyeroberts amyeroberts left a comment


Thanks for adding! Having these error messages will definitely be useful!

Just a comment on the conditional tuple return in is_deepspeed_zero3_enabled

Review comment on src/transformers/integrations/deepspeed.py (outdated, resolved)
@muellerzr
Contributor Author

Thanks @amyeroberts, I've tweaked it so it now lives in its own new function, and we call both in order since it's only needed in two places.

@muellerzr muellerzr requested a review from amyeroberts March 28, 2024 16:53
@muellerzr muellerzr requested a review from amyeroberts March 28, 2024 17:07
Contributor

@pacman100 pacman100 left a comment


Thank you @muellerzr for the fixes but I have left some overall comments.

Two review comments on src/transformers/integrations/deepspeed.py (outdated, resolved)
@muellerzr muellerzr requested a review from pacman100 April 2, 2024 13:11
@muellerzr
Contributor Author

@amyeroberts @pacman100 should be good for a last review. After discussing offline, it's best to just point users to the docs (RTFM), since we can't reliably check beforehand and we currently rely on having numerous examples instead.

Collaborator

@amyeroberts amyeroberts left a comment


Thanks!

Review comment on src/transformers/modeling_utils.py (outdated, resolved)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@muellerzr muellerzr changed the title Make clearer about zero_init requirements and add an early check Make clearer about zero_init requirements Apr 3, 2024
@muellerzr muellerzr merged commit 863e256 into main Apr 3, 2024
8 checks passed
@muellerzr muellerzr deleted the muellerzr-deepspeed-description branch April 3, 2024 17:37
ArthurZucker pushed a commit that referenced this pull request Apr 22, 2024
* Docstring to note about zero init

* Check for accelerate

* Change conditional return

* Tweak

* Add new accelerate-specific zero3 check

* Fix import

* Revert to RTFM

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
itazap pushed a commit that referenced this pull request May 14, 2024