[WIP] Adding model parallelism for T5 (should work for other models as well) #3578

thomwolf · 2020-04-01T21:27:55Z

This PR adds:

a get_block_list() utility method which returns a list of the blocks in a Transformers model (currently only added on T5). Block can be Modules or list/tuple of Modules (if a single transformer block is spread in several ModuleList like in XLM).
a spread_on_devices(devices: Optional[List] = None) method to spread a model on several devices by spreading the transformers blocks (roughly) evenly on the provided device list or all visible CUDA devices if no device list is given. The first device will host the remaining non-block modules in addition (the embeddings usually).

Currently, the code is in the T5 model but should be generic enough to be applied to other models if needed.

To use:

model = T5ForConditionalGeneration.from_pretrained('...')
model.spread_on_devices()  # Will spread on all visible CUDA devices by default
input = torch.tensor([...]).to('cuda:0')  # Inputs and outputs are on the first device
model(input)  # you should probably use only positional arguments for the forward pass (see spread_on_devices's docstring)

TODO:

try it
add tests if possible (on a dummy device list like ['cpu', 'cpu']?)

cc @patrickvonplaten @craffel

Style and quality

stale · 2020-08-02T16:12:00Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

exelents · 2020-09-10T09:34:45Z

Hello! Do you have plans to merge this feature to master branch?
I tried to make it locally in clonned repo but I got an error while tried to use it:
<ipython

-input-22-5591bd8e45c0> in main()
143 cache_dir=model_args.cache_dir,
144 )
--> 145 model = model.spread_on_devices(['cpu', 'cpu'])
146
147 # Get datasets

/usr/local/lib/python3.6/dist-packages/transformers/modeling_t5.py in spread_on_devices(self, devices)
936 return
937
--> 938 modules_to_move = set(self.modules)
939
940 # Evenly spread the blocks on devices

TypeError: 'method' object is not iterable

patrickvonplaten · 2020-09-20T18:52:16Z

Hey @exelents,

At the moment I don't think anybody is working on it and I'm not sure what the importance of this PR is at the moment. Feel free to take over the PR and try to make it work. I would be more than happy to help you if you open a PR :-)

patrickvonplaten · 2020-10-06T21:14:42Z

This is very much related: #7526

github-actions · 2021-04-16T15:04:17Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

JoaoLages · 2022-05-10T17:31:09Z

This feature was awesome! I think this would be a major improvement to the transformers package!

thomwolf added 2 commits April 1, 2020 23:21

Adding spread_on_devices

426b510

Update modeling_t5.py

ffed6a8

Style and quality

thomwolf changed the title ~~[WIP] Adding model parrallelism for T5 (should work for other models as well)~~ [WIP] Adding model parallelism for T5 (should work for other models as well) Apr 1, 2020

This comment has been minimized.

Sign in to view

Less general but avoid hook issues

3bfeebe

patrickvonplaten self-assigned this Apr 2, 2020

patrickvonplaten mentioned this pull request Jun 3, 2020

Out of memory error while training GPT2-large on 8x32GB Nvidia Volta #3616

Closed

patrickvonplaten added the Distributed Training / Models label Jun 3, 2020

stale bot added the wontfix label Aug 2, 2020

This was referenced Aug 8, 2020

Model parallelism support? #1836

Closed

Bert Mesh Tensorflow #6266

Closed

stale bot closed this Aug 9, 2020

patrickvonplaten mentioned this pull request Aug 17, 2020

Can't load t5-11b from pre-trained #6517

Closed

patrickvonplaten reopened this Aug 21, 2020

stale bot removed the wontfix label Aug 21, 2020

exelents mentioned this pull request Sep 10, 2020

T5-11b model parallelism #7047

Closed

github-actions bot closed this Apr 26, 2021

LysandreJik deleted the model-parallelism branch April 27, 2022 15:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Adding model parallelism for T5 (should work for other models as well) #3578

[WIP] Adding model parallelism for T5 (should work for other models as well) #3578

thomwolf commented Apr 1, 2020 •

edited

Loading

This comment has been minimized.

stale bot commented Aug 2, 2020

exelents commented Sep 10, 2020

patrickvonplaten commented Sep 20, 2020

patrickvonplaten commented Oct 6, 2020

github-actions bot commented Apr 16, 2021

JoaoLages commented May 10, 2022

[WIP] Adding model parallelism for T5 (should work for other models as well) #3578

[WIP] Adding model parallelism for T5 (should work for other models as well) #3578

Conversation

thomwolf commented Apr 1, 2020 • edited Loading

This comment has been minimized.

stale bot commented Aug 2, 2020

exelents commented Sep 10, 2020

patrickvonplaten commented Sep 20, 2020

patrickvonplaten commented Oct 6, 2020

github-actions bot commented Apr 16, 2021

JoaoLages commented May 10, 2022

thomwolf commented Apr 1, 2020 •

edited

Loading