
Pipe rpc #156

Merged: froody merged 8 commits into master from pipe-rpc on Nov 10, 2020
Conversation

@froody (Contributor) commented Oct 21, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Adds support for:

  • Reused layers (e.g. for weight sharing)
  • Lazily-constructed layers
  • Single-process control via PipeRPCWrapper

Also lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive.
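The per-rank event loop described above can be sketched roughly as follows. This is a minimal illustration, not fairscale's actual implementation: `Message`, the queue, and the stubbed forward/backward handlers are simplified stand-ins for the real `PipeMessage` machinery.

```python
import queue
from dataclasses import dataclass
from typing import Any, List

# Hypothetical message type; the real code uses PipeMessage carrying tensors.
@dataclass
class Message:
    kind: str  # "activation", "gradient", or "stop"
    payload: Any = None

def event_loop(inbox: "queue.Queue[Message]") -> List[str]:
    """Process activations and gradients in whatever order they arrive."""
    log: List[str] = []
    while True:
        msg = inbox.get()
        if msg.kind == "stop":
            break
        elif msg.kind == "activation":
            # Run the forward pass for this micro-batch (stubbed here).
            log.append(f"forward({msg.payload})")
        elif msg.kind == "gradient":
            # Run the corresponding backward pass (stubbed here).
            log.append(f"backward({msg.payload})")
    return log

inbox: "queue.Queue[Message]" = queue.Queue()
for m in [Message("activation", 0), Message("gradient", 0), Message("stop")]:
    inbox.put(m)
print(event_loop(inbox))  # ['forward(0)', 'backward(0)']
```

The key property is that the worker does not assume a fixed forward-then-backward schedule; it reacts to whichever message type arrives next.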

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 21, 2020
@froody froody requested a review from msbaines October 21, 2020 00:08
@froody froody marked this pull request as draft October 21, 2020 00:08
@msbaines (Contributor) left a comment:

Looks good. Mostly nits.

Resolved review threads: benchmarks/pipe.py, tests/nn/model_parallel/commons.py, tests/nn/model_parallel/test_layers.py, tests/nn/moe/test_moe_layer.py
@msbaines (Contributor) left a comment:

If possible it would be nice if this could be broken up; it's hard to review all at once. It would also be useful to have a description of the design in one of the files as a comment or docstring.

Resolved review threads: benchmarks/pipe.py, fairscale/nn/model_parallel/mappings.py, fairscale/nn/model_parallel/initialize.py
from .types import MESSAGE_GENERATION_START, InputDevice, PipeMessage, Tensors, TransportConfig

# FIXME Why is 256 ok for training but not for tests?
MESSAGE_TENSOR_SIZE = 1024 # 256
Contributor comment:

Why does this need to be fixed-size?

@froody (Contributor, Author) replied:

Because send/recv have to agree on the size, the two options are to send the size first or to pre-allocate a buffer large enough for all cases; I chose the latter.
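The trade-off can be sketched without torch.distributed: with a fixed-size buffer, both peers agree on the transfer size up front, at the cost of padding every message. The `pack`/`unpack` helpers and the padding scheme below are illustrative, not fairscale's actual code.

```python
import pickle
from typing import Any

MESSAGE_TENSOR_SIZE = 1024  # fixed size both peers agree on (illustrative)

def pack(obj: Any) -> bytes:
    """Serialize obj and pad to the fixed size, so the receiver can
    pre-allocate its buffer without a separate length exchange."""
    data = pickle.dumps(obj)
    if len(data) > MESSAGE_TENSOR_SIZE:
        raise ValueError(f"message of {len(data)} bytes exceeds fixed buffer")
    return data + b"\x00" * (MESSAGE_TENSOR_SIZE - len(data))

def unpack(buf: bytes) -> Any:
    # pickle knows where the payload ends, so trailing padding is ignored.
    return pickle.loads(buf)
```

The alternative (sending the length first) costs an extra round of communication per message, which is why a single oversized buffer can be the simpler choice.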

Resolved review threads: fairscale/nn/pipe/messages.py (three threads)
@msbaines msbaines requested a review from blefaudeux October 22, 2020 19:13
@msbaines (Contributor):

Added @blefaudeux to review fairscale.utils changes.

data = bytearray(buffer.getbuffer())
length_tensor = torch.LongTensor([len(data)]).to(dist_device)
data_send_tensor = torch.ByteTensor(data).to(dist_device)
data_send_tensor = pyobject_to_tensor(obj).to(dist_device)
Contributor comment:

Ah interesting, I didn't know of this way! Could you elaborate on the benefits vs. BytesIO? cc @mannatsingh, because this code comes from Classy initially. Otherwise, if it works better in any way that's fine by me; the code is a little cleaner, I think.

Ah yeah, this looks like a function within fairscale which does something similar to what we do in Classy, but through pickle and numpy:

def pyobject_to_tensor(obj: Any) -> Tensor:

Although I don't think we're supposed to pickle tensors, so if obj contains tensors this might become problematic?
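For context, a pickle-and-numpy conversion along the lines being discussed might look like this. This is a sketch, not fairscale's actual implementation, and it assumes torch and numpy are available.

```python
import pickle
from typing import Any

import numpy as np
import torch

def pyobject_to_tensor(obj: Any) -> torch.Tensor:
    """Serialize an arbitrary Python object into a uint8 tensor
    (sketch of the pickle-and-numpy round trip, not the real code)."""
    data = np.frombuffer(pickle.dumps(obj), dtype=np.uint8)
    # copy() because frombuffer returns a read-only view,
    # and torch.from_numpy requires a writable array.
    return torch.from_numpy(data.copy())

def tensor_to_pyobject(t: torch.Tensor) -> Any:
    """Inverse: recover the object from the byte tensor."""
    return pickle.loads(t.numpy().tobytes())
```

The concern raised above applies here too: if `obj` itself contains tensors, pickling them directly sidesteps torch's own device-aware serialization.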

@froody (Contributor, Author):

Ok, I'm going to revert my changes to fairscale.optim

@mannatsingh commented Oct 26, 2020:

Just mentioned this to @blefaudeux over chat. I think the only issue with pickling tensors is the device-handling bit, but I'm not too familiar with this; I've just seen recommendations in a few places for using torch.load and torch.save. See https://discuss.pytorch.org/t/save-and-load-model/6206/28.

So I guess my statement "I don't think we're supposed to pickle tensors" isn't exactly true. If the code works with both GPUs and CPUs, there's no need to get rid of that. Just something I haven't tried :)
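The torch.save/torch.load route mentioned here handles tensor storages and device placement explicitly, which is its advantage over raw pickle. A minimal sketch using an in-memory buffer (assuming torch is available; function names are illustrative):

```python
import io

import torch

def tensor_to_bytes(t: torch.Tensor) -> bytes:
    # torch.save manages tensor storage and device metadata itself,
    # rather than relying on pickle's generic object protocol.
    buf = io.BytesIO()
    torch.save(t, buf)
    return buf.getvalue()

def bytes_to_tensor(data: bytes, device: str = "cpu") -> torch.Tensor:
    # map_location lets the receiver choose the target device, which
    # addresses the device-handling concern raised in this thread.
    return torch.load(io.BytesIO(data), map_location=device)
```

On a CPU-only receiver, `map_location="cpu"` recovers a tensor that was saved on a GPU, which plain pickling of a CUDA tensor would not allow.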

@froody froody force-pushed the pipe-rpc branch 3 times, most recently from e117516 to 4618a37 Compare November 6, 2020 18:47
@froody froody force-pushed the pipe-rpc branch 2 times, most recently from 39202a6 to 7dba4e5 Compare November 9, 2020 23:14
@sidgoyal78 (Contributor) left a comment:

Thanks for the PR. Looks great. The modules for async (from this PR) are already being used for ampnet/xpipe.

@froody froody merged commit 5d4f50f into master Nov 10, 2020
@froody froody deleted the pipe-rpc branch November 10, 2020 23:53