Pipe rpc #156
Conversation
Looks good. Mostly nits.
If possible it would be nice if this could be broken up; it's hard to review all at once. It would also be useful to have a description of the design in one of the files as a comment or docstring.
fairscale/nn/pipe/messages.py
Outdated
from .types import MESSAGE_GENERATION_START, InputDevice, PipeMessage, Tensors, TransportConfig

# FIXME Why is 256 ok for training but not for tests?
MESSAGE_TENSOR_SIZE = 1024  # 256
Why does this need to be fixed-size?
Because send/recv have to agree on the size, the two options are to send the size first or to pre-allocate a buffer large enough for all cases; I chose the latter.
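For illustration only, a minimal sketch of the pre-allocation approach with torch.distributed; the helper names and the pickle-based padding are assumptions, not the PR's actual code:

import pickle

import torch
import torch.distributed as dist

MESSAGE_TENSOR_SIZE = 1024  # sender and receiver must agree on this size

def send_message(obj, dst: int) -> None:
    # Pickle the message and pad it into a fixed-size uint8 tensor.
    data = pickle.dumps(obj)
    assert len(data) <= MESSAGE_TENSOR_SIZE, "message too large for the fixed buffer"
    buffer = torch.zeros(MESSAGE_TENSOR_SIZE, dtype=torch.uint8)
    buffer[: len(data)] = torch.tensor(list(data), dtype=torch.uint8)
    dist.send(buffer, dst=dst)

def recv_message(src: int):
    # The receiver can pre-allocate the buffer without knowing the payload length.
    buffer = torch.zeros(MESSAGE_TENSOR_SIZE, dtype=torch.uint8)
    dist.recv(buffer, src=src)
    # pickle stops at the end of the serialized object, so trailing zero padding is ignored.
    return pickle.loads(buffer.numpy().tobytes())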
Added @blefaudeux to review the fairscale.utils changes.
fairscale/optim/utils.py
Outdated
data = bytearray(buffer.getbuffer())
length_tensor = torch.LongTensor([len(data)]).to(dist_device)
data_send_tensor = torch.ByteTensor(data).to(dist_device)
data_send_tensor = pyobject_to_tensor(obj).to(dist_device)
Ah interesting, I didn't know of this way! Could you elaborate on the benefits vs. BytesIO? cc @mannatsingh, because this code comes from Classy initially. Otherwise, if it works better in any way that's fine by me; the code is a little cleaner, I think.
Ah yeah, this looks like a function within fairscale which does something similar to what we do in Classy, but through pickle and numpy:
fairscale/fairscale/nn/pipe/pipeline.py
Line 108 in 63f7796
def pyobject_to_tensor(obj: Any) -> Tensor:
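For reference, a minimal sketch of what such a pickle/numpy helper pair could look like; this is an assumption based on the description above, not necessarily the actual code at that line:

import pickle
from typing import Any

import numpy as np
import torch
from torch import Tensor

def pyobject_to_tensor(obj: Any) -> Tensor:
    # Serialize an arbitrary Python object with pickle, then view the bytes as a uint8 tensor.
    data = pickle.dumps(obj)
    return torch.from_numpy(np.frombuffer(data, dtype=np.uint8).copy())

def tensor_to_pyobject(tensor: Tensor) -> Any:
    # Reverse step: move the tensor to host memory and unpickle its bytes.
    return pickle.loads(tensor.cpu().numpy().tobytes())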
Ok, I'm going to revert my changes to fairscale.optim
Just mentioned this to @blefaudeux over chat - I think the only issue with pickling tensors is the device handling bit. But I'm not too familiar with this, I've just seen recommendations in a few places for using torch.load and torch.save. See https://discuss.pytorch.org/t/save-and-load-model/6206/28.
So I guess my statement isn't exactly true - "I don't think we're supposed to pickle tensors". If the code works with both GPUs and CPUs, there's no need to get rid of that. Just something I haven't tried :)
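For context, the torch.save / torch.load approach mentioned in that thread usually addresses the device issue through map_location; a small illustrative sketch (not from this PR):

import io
import torch

# Save a tensor to an in-memory buffer with torch.save.
t = torch.randn(4, device="cuda" if torch.cuda.is_available() else "cpu")
buffer = io.BytesIO()
torch.save(t, buffer)

# map_location controls where the tensor is restored, so something saved on a GPU
# can still be loaded on a CPU-only machine.
buffer.seek(0)
restored = torch.load(buffer, map_location="cpu")
print(restored.device)  # cpu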
Thanks for the PR. Looks great. The modules for async (from this PR) are already being used for ampnet/xpipe.
What does this PR do?
Adds support for:
Also lays the foundation for asynchronous pipeline work by introducing an event loop for each rank/worker to process either activations or gradients as they arrive.
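As a purely hypothetical illustration of that event-loop idea (the names below, including this toy PipeMessage, are stand-ins rather than the PR's actual classes), each rank could drain an inbox and dispatch on the message type:

import queue
from dataclasses import dataclass
from typing import Any

@dataclass
class PipeMessage:
    # Toy message container: kind is "activation", "gradient", or "stop".
    kind: str
    payload: Any

def handle_activation(payload: Any) -> None:
    print("forward work on", payload)

def handle_gradient(payload: Any) -> None:
    print("backward work on", payload)

def event_loop(inbox: "queue.Queue[PipeMessage]") -> None:
    # Each rank/worker processes whichever message arrives next, forward or backward.
    while True:
        message = inbox.get()
        if message.kind == "activation":
            handle_activation(message.payload)
        elif message.kind == "gradient":
            handle_gradient(message.payload)
        elif message.kind == "stop":
            break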
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃