How to integrate 2D Parallelism: PP + ZeRO-DP? #351
Ignore this code for now; it is experimental. You should be able to use Pipe as described in Readme.md in addition to ZeRO. Our default Pipe is restricted to a single server with multiple GPUs. You can use it with ZeRO by running a single PyTorch process per multi-GPU server and then running ZeRO/DDP across servers.
Thank you, @msbaines!
You mean "single process with multiple GPUs", correct? Otherwise I'm not sure what you mean by "multi-GPU server"; they are all multi-GPU servers to start with ;)
Yes. A single process handling the multiple GPUs of the pipeline. We use separate processes for each DDP/ZeRO instance. So in your example, dp0 is one process and dp1 is a second process.
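Concretely, that ownership scheme can be sketched as follows. This is an illustrative helper only, not fairscale API; `pipeline_gpus` is a made-up name, and the numbers assume the 4-GPU formation discussed below (two 2-GPU pipelines):

```python
def pipeline_gpus(dp_rank, gpus_per_pipeline=2):
    """Return the GPU ids owned by one data-parallel process.

    Each process drives a whole pipeline, so on a 4-GPU server
    dp0 owns gpus [0, 1] and dp1 owns gpus [2, 3].
    """
    start = dp_rank * gpus_per_pipeline
    return list(range(start, start + gpus_per_pipeline))

print(pipeline_gpus(0))  # [0, 1]
print(pipeline_gpus(1))  # [2, 3]
```

Each process would then build its Pipe over its own GPU list and join the others through DDP/ZeRO.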
Closing this since the question has been answered.
❓ Questions and Help
I can't find any documents/examples that show how one can combine PP + ZeRO-DP (sharded) in fairscale.
Would you kindly share any working examples?
The very first task could be a 4-GPU setup that runs PP + ZeRO-DP, for example in this formation: two pipelines (gpus 0-1 and 2-3), with Sharded DDP seeing gpus 0 and 2 as the entry points.
Even a toy example based on your existing pp tutorial would be helpful.
I did find your MPU https://github.com/facebookresearch/fairscale/blob/master/fairscale/nn/model_parallel/initialize.py#L41
What's the next step?
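To make the target topology concrete, the grouping I have in mind (a simplified illustration in the spirit of initialize.py, not fairscale's actual code; `parallel_groups` is a made-up name) would be:

```python
def parallel_groups(world_size, pipeline_len):
    """Split ranks into pipeline groups and data-parallel groups.

    With world_size=4 and pipeline_len=2:
      pipeline groups      -> [[0, 1], [2, 3]]  (each list is one pipe)
      data-parallel groups -> [[0, 2], [1, 3]]  (ZeRO/DDP peers per stage)
    """
    assert world_size % pipeline_len == 0
    pipes = [list(range(r, r + pipeline_len))
             for r in range(0, world_size, pipeline_len)]
    dp = [list(range(stage, world_size, pipeline_len))
          for stage in range(pipeline_len)]
    return pipes, dp

pipes, dp = parallel_groups(4, 2)
print(pipes)  # [[0, 1], [2, 3]]
print(dp)     # [[0, 2], [1, 3]]
```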
Thank you!
@msbaines