Pipe rpc #156
Conversation
Looks good. Mostly nits.
If possible it would be nice if this could be broken up; it's hard to review all at once. It would also be useful to have a description of the design in one of the files as a comment or docstring.
fairscale/nn/pipe/messages.py
Outdated
from .types import MESSAGE_GENERATION_START, InputDevice, PipeMessage, Tensors, TransportConfig

# FIXME Why is 256 ok for training but not for tests?
MESSAGE_TENSOR_SIZE = 1024  # 256
Why does this need to be fixed-size?
Because send/recv have to agree on the size, the two options are to send the size first or to pre-allocate a buffer large enough for all cases; I chose the latter.
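For illustration only, a minimal sketch of the pre-allocation approach with torch.distributed; the helper names and the pickle-based padding are assumptions, not the PR's actual code:

import pickle

import torch
import torch.distributed as dist

MESSAGE_TENSOR_SIZE = 1024  # sender and receiver must agree on this size

def send_message(obj, dst: int) -> None:
    # Pickle the message and pad it into a fixed-size uint8 tensor.
    data = pickle.dumps(obj)
    assert len(data) <= MESSAGE_TENSOR_SIZE, "message too large for the fixed buffer"
    buffer = torch.zeros(MESSAGE_TENSOR_SIZE, dtype=torch.uint8)
    buffer[: len(data)] = torch.tensor(list(data), dtype=torch.uint8)
    dist.send(buffer, dst=dst)

def recv_message(src: int):
    # The receiver can pre-allocate the buffer without knowing the payload length.
    buffer = torch.zeros(MESSAGE_TENSOR_SIZE, dtype=torch.uint8)
    dist.recv(buffer, src=src)
    # pickle stops at the end of the serialized object, so trailing zero padding is ignored.
    return pickle.loads(buffer.numpy().tobytes())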
Added @blefaudeux to review the fairscale.utils changes.
fairscale/optim/utils.py
Outdated
data = bytearray(buffer.getbuffer())
length_tensor = torch.LongTensor([len(data)]).to(dist_device)
data_send_tensor = torch.ByteTensor(data).to(dist_device)
data_send_tensor = pyobject_to_tensor(obj).to(dist_device)
Ah interesting, I didn't know of this way! Could you elaborate on the benefits vs. BytesIO? cc @mannatsingh, because this code comes from Classy initially. Otherwise, if it works better in any way that's fine by me; the code is a little cleaner, I think.
Ah yeah, this looks like a function within fairscale which does something similar to what we do in Classy, but through pickle and numpy:
fairscale/fairscale/nn/pipe/pipeline.py
Line 108 in 63f7796
def pyobject_to_tensor(obj: Any) -> Tensor:
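For reference, a minimal sketch of what such a pickle/numpy helper pair could look like; this is an assumption based on the description above, not necessarily the actual code at that line:

import pickle
from typing import Any

import numpy as np
import torch
from torch import Tensor

def pyobject_to_tensor(obj: Any) -> Tensor:
    # Serialize an arbitrary Python object with pickle, then view the bytes as a uint8 tensor.
    data = pickle.dumps(obj)
    return torch.from_numpy(np.frombuffer(data, dtype=np.uint8).copy())

def tensor_to_pyobject(tensor: Tensor) -> Any:
    # Reverse step: move the tensor to host memory and unpickle its bytes.
    return pickle.loads(tensor.cpu().numpy().tobytes())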
Ok, I'm going to revert my changes to fairscale.optim
Just mentioned this to @blefaudeux over chat - I think the only issue with pickling tensors is the device handling bit. But I'm not too familiar with this, I've just seen recommendations in a few places for using torch.load and torch.save. See https://discuss.pytorch.org/t/save-and-load-model/6206/28.
So I guess my statement isn't exactly true - "I don't think we're supposed to pickle tensors". If the code works with both GPUs and CPUs, there's no need to get rid of that. Just something I haven't tried :)
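For context, the torch.save / torch.load approach mentioned in that thread usually addresses the device issue through map_location; a small illustrative sketch (not from this PR):

import io
import torch

# Save a tensor to an in-memory buffer with torch.save.
t = torch.randn(4, device="cuda" if torch.cuda.is_available() else "cpu")
buffer = io.BytesIO()
torch.save(t, buffer)

# map_location controls where the tensor is restored, so something saved on a GPU
# can still be loaded on a CPU-only machine.
buffer.seek(0)
restored = torch.load(buffer, map_location="cpu")
print(restored.device)  # cpu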
Thanks for the PR. Looks great. The modules for async (from this PR) are already being used for ampnet/xpipe.
What does this PR do?
Adds support for:
Also lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive.
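As a purely hypothetical illustration of that event-loop idea (the names below, including this toy PipeMessage, are stand-ins rather than the PR's actual classes), each rank could drain an inbox and dispatch on the message type:

import queue
from dataclasses import dataclass
from typing import Any

@dataclass
class PipeMessage:
    # Toy message container: kind is "activation", "gradient", or "stop".
    kind: str
    payload: Any

def handle_activation(payload: Any) -> None:
    print("forward work on", payload)

def handle_gradient(payload: Any) -> None:
    print("backward work on", payload)

def event_loop(inbox: "queue.Queue[PipeMessage]") -> None:
    # Each rank/worker processes whichever message arrives next, forward or backward.
    while True:
        message = inbox.get()
        if message.kind == "activation":
            handle_activation(message.payload)
        elif message.kind == "gradient":
            handle_gradient(message.payload)
        elif message.kind == "stop":
            break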
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃