Add support for init_meta_context, materialize_module #9920
Conversation
This is incredible work: users don't have to change their model definition at all, and hopefully in the future FSDP will support sharding directly from meta-device modules. (cc @myleott @blefaudeux @anj-s, who may have thoughts on the API/future integration!) Should we wait for the code to be merged into PyTorch and available in the nightly build? EDIT: looping in @jeffra and @tjruwase from the DeepSpeed team as well :)
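For context, a meta-device module carries only shapes and dtypes, with no storage allocated. A minimal sketch in plain PyTorch (independent of this PR; the layer size is arbitrary) illustrates the idea:

```python
import torch
from torch import nn

# Construct a layer on the meta device: its parameters have shapes and
# dtypes but no backing storage, so even huge models are cheap to create.
layer = nn.Linear(1024, 1024, device="meta")
print(layer.weight.device)  # meta
print(layer.weight.shape)   # torch.Size([1024, 1024])

# to_empty() moves the module to a real device with *uninitialized* storage;
# the weights must then be re-initialized or loaded from a checkpoint.
layer = layer.to_empty(device="cpu")
```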
Adding @cbalioglu to the conversation. IMO, we definitely want this for …

Best,
Let's chase up on @zou3519's comment before proceeding further with this PR! We shouldn't merge until we understand the edge cases that are causing instability. Also, I think it would be beneficial to show a use case that demonstrates the benefit (which I think @tchaton has with DeepSpeed, but it might need more testing based on instantiation times vs. …
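For the instantiation-time comparison mentioned above, a quick benchmark along these lines could help (a sketch in plain PyTorch, not code from this PR; the layer count and sizes are arbitrary):

```python
import time
import torch.nn as nn

def build(device: str) -> float:
    """Time the construction of a deliberately large stack of layers."""
    t0 = time.perf_counter()
    nn.Sequential(*[nn.Linear(4096, 4096, device=device) for _ in range(32)])
    return time.perf_counter() - t0

print(f"cpu:  {build('cpu'):.3f}s")   # allocates and initializes ~2 GB of weights
print(f"meta: {build('meta'):.3f}s")  # shape-only construction, near-instant
```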
Looks good to me as an experimental API. Please consider me your point of contact for any customer feedback or issues specific to the parts you copied over from the PyTorch PR #66317.
This reverts commit 454e93b.
What does this PR do?
Fixes #9375
This PR builds on top of pytorch/pytorch#66317; the copied code section will be dropped once that PR is merged into PyTorch.
The goal is to require no code changes from end users.
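For illustration, intended usage might look like the sketch below. The import path and the `MyModel` class are assumptions made for this example; only the `init_meta_context` and `materialize_module` names come from the PR title.

```python
import torch.nn as nn

# Assumed import path for this example; not confirmed in the thread.
from pytorch_lightning.utilities.meta import init_meta_context, materialize_module

# An ordinary, unchanged user model (hypothetical example class).
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)
        )

    def forward(self, x):
        return self.net(x)

# Inside the context, parameters are created on the meta device, so even
# very large models "instantiate" without allocating real memory.
with init_meta_context():
    model = MyModel()

# Later, e.g. once a strategy such as DeepSpeed is ready to shard or load
# the weights, the meta tensors are swapped for real, initialized ones.
materialize_module(model)
```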
TODO:
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃