Arde/fsdp activation checkpointing #25771
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Cc @pacman100 if you think this is relevant 🤗
Thank you @arde171 for adding this. Please raise an error if both `activation_checkpointing` in the FSDP config and the training arg `gradient_checkpointing` are set to `True`. The error should mention that both can't be set to `True` and to use FSDP's checkpointing logic when using FSDP.
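A minimal sketch of the check requested above, not the code merged in this PR. The helper names `_as_bool` and `check_checkpointing_options` are hypothetical, and the sketch assumes the FSDP config is available as a plain dict whose flag may be either a boolean or a string such as `"True"`.

```python
def _as_bool(value):
    """Interpret JSON-style flags such as "True", "true", or True as a bool."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def check_checkpointing_options(fsdp_config, gradient_checkpointing):
    """Raise if both checkpointing mechanisms are enabled at once.

    `fsdp_config` is assumed to be a dict loaded from the FSDP config file;
    `gradient_checkpointing` mirrors the training argument of the same name.
    """
    activation_checkpointing = _as_bool(
        fsdp_config.get("activation_checkpointing", False)
    )
    if activation_checkpointing and gradient_checkpointing:
        raise ValueError(
            "`activation_checkpointing` in the FSDP config and "
            "`gradient_checkpointing` in the training args cannot both be "
            "set to True. When using FSDP, use FSDP's activation "
            "checkpointing logic and leave `gradient_checkpointing` off."
        )
    return activation_checkpointing
```

Calling `check_checkpointing_options({"activation_checkpointing": "True"}, gradient_checkpointing=True)` raises a `ValueError`, while enabling only one of the two options passes through.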
Thank you @arde171!
* add FSDP config option to enable activation-checkpointing
* update docs
* add checks and remove redundant code
* fix formatting error
What does this PR do?
Currently, the HF Trainer does not support FSDP activation checkpointing. This PR adds support for it.
Please see the details about the FSDP activation checkpointing here.
I saw improved training performance for large LLMs (e.g., LLAMA 70B) with FSDP activation checkpointing compared to the existing `gradient_checkpointing` option. FSDP activation checkpointing is easy to enable: just add `"activation_checkpointing": "True"` to the `fsdp_config.json` file.
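The example file from the original description is not reproduced here, so the following is only a sketch of what a minimal `fsdp_config.json` could look like. Only the `"activation_checkpointing": "True"` entry comes from this PR; the helper name `build_fsdp_config_json` is hypothetical, and any other FSDP settings a real config would carry are deliberately omitted.

```python
import json


def build_fsdp_config_json():
    """Return the JSON text for a minimal FSDP config (illustrative only)."""
    fsdp_config = {
        # The option added by this PR; the string form follows the PR text.
        "activation_checkpointing": "True",
        # Other FSDP options (wrapping policy, prefetch, ...) would go here.
    }
    return json.dumps(fsdp_config, indent=4)


if __name__ == "__main__":
    # Write the sketch to disk under the file name used in the description.
    with open("fsdp_config.json", "w") as f:
        f.write(build_fsdp_config_json())
```

The resulting file would then be supplied to the Trainer through the `fsdp_config` training argument, with `gradient_checkpointing` left disabled.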
Please see the below PR for more details about FSDP activation checkpointing in accelerate repo:
PR: huggingface/accelerate#1891
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.