
[WIP] Adding model parallelism for T5 (should work for other models as well) #3578

Closed
wants to merge 3 commits

Conversation


@thomwolf thomwolf commented Apr 1, 2020

This PR adds:

  • a get_block_list() utility method which returns a list of the blocks in a Transformers model (currently only added on T5). Blocks can be Modules or lists/tuples of Modules (when a single transformer block is spread across several ModuleLists, as in XLM).
  • a spread_on_devices(devices: Optional[List] = None) method to spread a model over several devices by distributing the transformer blocks (roughly) evenly over the provided device list, or over all visible CUDA devices if no device list is given. The first device additionally hosts the remaining non-block modules (usually the embeddings). A rough sketch of the assignment logic follows below.

Currently, the code is in the T5 model but should be generic enough to be applied to other models if needed.
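
For intuition, here is a rough sketch of the block-to-device assignment described above. This is not the code in this PR: the helper name is hypothetical, it assumes the block list comes from something like get_block_list(), and it omits the device-to-device movement of activations that the actual forward pass needs.

from typing import List, Optional

import torch
from torch import nn

def spread_blocks_on_devices(blocks: List, devices: Optional[List[str]] = None):
    # Default to all visible CUDA devices, as described above.
    if devices is None:
        devices = ['cuda:%d' % i for i in range(torch.cuda.device_count())]
    # Roughly even split: ceil(len(blocks) / len(devices)) blocks per device.
    blocks_per_device = -(-len(blocks) // len(devices))
    for i, block in enumerate(blocks):
        target = devices[i // blocks_per_device]
        # A "block" can also be a list/tuple of Modules (e.g. XLM); handle both.
        modules = block if isinstance(block, (list, tuple)) else [block]
        for module in modules:
            module.to(target)  # move the module's parameters/buffers to its device
    return devices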

To use:

model = T5ForConditionalGeneration.from_pretrained('...')
model.spread_on_devices()  # Will spread on all visible CUDA devices by default
input = torch.tensor([...]).to('cuda:0')  # Inputs and outputs are on the first device
model(input)  # you should probably use only positional arguments for the forward pass (see spread_on_devices's docstring)

TODO:

  • try it
  • add tests if possible (on a dummy device list like ['cpu', 'cpu']? see the test sketch below)

cc @patrickvonplaten @craffel
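
As a starting point for the test TODO, a minimal sketch along those lines, assuming the spread_on_devices() method from this branch is available on the model (it is not part of released transformers):

from transformers import T5ForConditionalGeneration, T5Tokenizer

def test_spread_on_dummy_cpu_devices():
    model = T5ForConditionalGeneration.from_pretrained('t5-small')
    model.spread_on_devices(['cpu', 'cpu'])  # method from this PR branch; dummy device list, no GPU needed
    tokenizer = T5Tokenizer.from_pretrained('t5-small')
    input_ids = tokenizer('translate English to German: Hello', return_tensors='pt').input_ids
    outputs = model(input_ids, decoder_input_ids=input_ids)
    assert outputs[0] is not None  # logits should come back on the first device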

@thomwolf thomwolf changed the title [WIP] Adding model parrallelism for T5 (should work for other models as well) [WIP] Adding model parallelism for T5 (should work for other models as well) Apr 1, 2020
stale bot commented Aug 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 2, 2020
This was referenced Aug 8, 2020
@stale stale bot closed this Aug 9, 2020
@stale stale bot removed the wontfix label Aug 21, 2020
@exelents

Hello! Do you have plans to merge this feature into the master branch?
I tried it locally in a cloned repo, but I got an error when I tried to use it:
<ipython-input-22-5591bd8e45c0> in main()
143 cache_dir=model_args.cache_dir,
144 )
--> 145 model = model.spread_on_devices(['cpu', 'cpu'])
146
147 # Get datasets

/usr/local/lib/python3.6/dist-packages/transformers/modeling_t5.py in spread_on_devices(self, devices)
936 return
937
--> 938 modules_to_move = set(self.modules)
939
940 # Evenly spread the blocks on devices

TypeError: 'method' object is not iterable
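
The traceback points at line 938, where the bound method self.modules is passed to set() without being called; nn.Module.modules is a method and only returns an iterator over sub-modules when invoked. A minimal sketch of the likely fix in the PR branch:

# modeling_t5.py, inside spread_on_devices (sketch, not merged code):
modules_to_move = set(self.modules())  # call modules() to iterate over sub-modules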

@patrickvonplaten (Contributor)

Hey @exelents,

At the moment I don't think anybody is working on it, and I'm not sure how important this PR still is. Feel free to take over the PR and try to make it work. I would be more than happy to help if you open a PR :-)

@patrickvonplaten (Contributor)

This is very much related: #7526

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Apr 26, 2021
@LysandreJik LysandreJik deleted the model-parallelism branch April 27, 2022 15:52
@JoaoLages (Contributor)

This feature would be awesome! I think it would be a major improvement to the transformers package!
