
How to integrate 2D Parallelism: PP + ZeRO-DP? #351

Closed
stas00 opened this issue Feb 1, 2021 · 4 comments
Labels: Pipeline (Pipeline parallelism), question (Further information is requested)

Comments

stas00 (Contributor) commented on Feb 1, 2021

❓ Questions and Help

I can't find any documents/examples that show how one can combine PP + ZeRO-DP (sharded) in fairscale.

Would you kindly share any working examples?

The very first task could be a 4-GPU setup that runs PP + ZeRO-DP, for example in this layout:

```
      pp
dp0 [0, 1]
dp1 [2, 3]
```

So here there are 2 pipelines, 0-1 and 2-3, and Sharded DDP sees GPUs 0 and 2 as the entry points.

Even a toy example based on your existing pp tutorial would be helpful.

I did find your MPU https://github.com/facebookresearch/fairscale/blob/master/fairscale/nn/model_parallel/initialize.py#L41
What's the next step?

Thank you!

@msbaines

msbaines (Contributor) commented on Feb 2, 2021

> I did find your MPU https://github.com/facebookresearch/fairscale/blob/master/fairscale/nn/model_parallel/initialize.py#L41
> What's the next step?

Ignore this code for now. It is experimental.

You should be able to use Pipe as described in Readme.md in addition to ZeRO. Our default Pipe is restricted to a single server with multiple GPUs. You can use it with ZeRO by running a single PyTorch process per multi-GPU server and then running ZeRO/DDP across servers.
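
Not from the thread or the fairscale docs, but a minimal sketch of what that composition could look like on a single 4-GPU node, assuming each process is launched with CUDA_VISIBLE_DEVICES restricted to its own GPU pair (rank 0 sees GPUs 0-1, rank 1 sees GPUs 2-3), so Pipe's partitions land on the process-local cuda:0 and cuda:1. Pipe, OSS, and ShardedDataParallel are real fairscale components; the toy model, the sizes, and the exact way they are stacked here are assumptions for illustration, not a verified recipe:

```python
# A minimal sketch (assumption, not fairscale's documented recipe):
# one process per pipeline, ZeRO/DDP across the processes.
# Assumes each process was launched with CUDA_VISIBLE_DEVICES set to its own GPU pair,
# so inside every process the pipeline uses the local cuda:0 and cuda:1.
import torch
import torch.distributed as dist
from fairscale.nn import Pipe
from fairscale.nn.data_parallel import ShardedDataParallel
from fairscale.optim.oss import OSS


def main():
    # One rank per pipeline: world_size == 2 for the 4-GPU layout above.
    dist.init_process_group(backend="nccl")

    # Toy model, just to have something to partition.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 1024),
    )

    # Pipeline parallelism inside this process: first two layers on cuda:0, last on cuda:1.
    model = Pipe(model, balance=[2, 1], chunks=4)

    # ZeRO-DP across the two processes: OSS shards the optimizer state,
    # ShardedDataParallel handles the gradient reduction between the pipelines.
    optimizer = OSS(model.parameters(), optim=torch.optim.SGD, lr=1e-3)
    model = ShardedDataParallel(model, optimizer)

    # Input goes to the first pipeline device; the output comes back on the last one.
    x = torch.randn(8, 1024, device="cuda:0")
    loss = model(x).sum()
    loss.backward()
    optimizer.step()


if __name__ == "__main__":
    main()
```

The key point of this arrangement is that ZeRO/DDP only ever sees two ranks (one per pipeline), while the GPUs inside each pipeline stay invisible to it.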

stas00 (Contributor, Author) commented on Feb 2, 2021

Thank you, @msbaines!

> Our default Pipe is restricted to a single server with multiple GPUs

You mean "single process with multiple GPUs", correct?

Otherwise I'm not sure what you mean by "multi-GPU server"; they are all multi-GPU servers to start with ;)

msbaines (Contributor) commented on Feb 2, 2021

> Our default Pipe is restricted to a single server with multiple GPUs
>
> You mean "single process with multiple GPUs", correct?

Yes. A single process handling the multiple GPUs of the pipeline. We use separate processes for each DDP/ZeRO instance.

So in your example:

```
      pp
dp0 [0, 1]
dp1 [2, 3]
```

dp0 is one process and dp1 is a second process.
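
To make the process-to-GPU mapping concrete, here is a hypothetical launcher (again an assumption, not fairscale code): it spawns one process per pipeline on a single 4-GPU node and pins each process to its GPU pair before CUDA is initialized. The module name train_pp_zero is made up and refers to the sketch in the earlier comment:

```python
# Hypothetical launcher: two processes on one 4-GPU node, one pipeline each.
import os
import torch.multiprocessing as mp


def worker(rank: int) -> None:
    # dp0 (rank 0) owns GPUs 0,1; dp1 (rank 1) owns GPUs 2,3.
    # Must be set before this process touches CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = {0: "0,1", 1: "2,3"}[rank]
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    os.environ["RANK"] = str(rank)
    os.environ["WORLD_SIZE"] = "2"

    # Hypothetical module holding the Pipe + OSS + ShardedDataParallel sketch above.
    from train_pp_zero import main
    main()


if __name__ == "__main__":
    # The "spawn" start method gives each worker a fresh process, so CUDA is not
    # yet initialized when CUDA_VISIBLE_DEVICES is set.
    mp.spawn(worker, nprocs=2)
```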


min-xu-ai removed their assignment on Apr 16, 2021
anj-s added the Pipeline parallelism label on Oct 18, 2021
anj-s (Contributor) commented on Oct 18, 2021

Closing this since the question has been answered.

anj-s closed this as completed on Oct 18, 2021