[WIP] Adding model parallelism for T5 (should work for other models as well) #3578
Conversation
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hello! Do you have plans to merge this feature into the master branch?

Hey @exelents, at the moment I don't think anybody is working on it, and I'm not sure how important this PR currently is. Feel free to take over the PR and try to make it work. I would be more than happy to help you if you open a PR :-)

This is very much related: #7526

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.

This feature would be awesome! I think it would be a major improvement to the transformers package!
This PR adds:

- a `get_block_list()` utility method which returns a list of the blocks in a Transformers model (currently only added on T5). Blocks can be Modules or lists/tuples of Modules (if a single transformer block is spread across several ModuleLists, as in XLM).
- a `spread_on_devices(devices: Optional[List] = None)` method to spread a model on several devices by placing the transformer blocks (roughly) evenly on the provided device list, or on all visible CUDA devices if no device list is given. The first device additionally hosts the remaining non-block modules (usually the embeddings).

Currently, the code is in the T5 model but should be generic enough to be applied to other models if needed.
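For intuition, here is a minimal sketch of how such a spreading method could assign blocks to devices. This is not the PR's actual implementation: `get_block_list()` is assumed to behave as described above, and the even-split heuristic is only illustrative.

```python
# Hypothetical sketch of the block-spreading logic described above.
from typing import List, Optional

import torch
from torch import nn


def spread_on_devices(model: nn.Module, devices: Optional[List[str]] = None) -> None:
    """Spread the model's transformer blocks (roughly) evenly over `devices`."""
    if devices is None:
        # Default to all visible CUDA devices.
        devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    if len(devices) < 2:
        return  # Nothing to spread on a single device.

    blocks = model.get_block_list()  # Assumed utility from this PR.
    blocks_per_device = (len(blocks) + len(devices) - 1) // len(devices)

    # The first device also hosts the non-block modules (embeddings, etc.).
    model.to(devices[0])

    for i, block in enumerate(blocks):
        device = devices[min(i // blocks_per_device, len(devices) - 1)]
        # A "block" may be a single Module or a list/tuple of Modules (e.g. XLM).
        modules = block if isinstance(block, (list, tuple)) else [block]
        for module in modules:
            module.to(device)
```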
To use:
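(The original usage snippet was not preserved; the following is a plausible reconstruction, assuming the `spread_on_devices()` method described above.)

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
# Spread the transformer blocks over all visible CUDA devices by default:
model.spread_on_devices()
```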
TODO:
cc @patrickvonplaten @craffel