How to integrate 2D Parallelism: PP + ZeRO-DP? #351
Ignore this code for now; it is experimental. You should be able to use Pipe as described in Readme.md in addition to ZeRO. Our default Pipe is restricted to a single server with multiple GPUs. You can use it with ZeRO by running a single PyTorch process per multi-GPU server and then running ZeRO/DDP across servers.
Thank you, @msbaines!
You mean "single process with multiple GPUs", correct? Otherwise I'm not sure what you mean by "multi-GPU server"; they are all multi-GPU servers to start with ;)
Yes. A single process handling the multiple GPUs of the pipeline. We use separate processes for each DDP/ZeRO instance. So in your example, dp0 is one process and dp1 is a second process.
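Concretely, that ownership scheme can be sketched as follows. This is an illustrative helper only, not fairscale API; `pipeline_gpus` is a made-up name, and the numbers assume the 4-GPU formation discussed below (two 2-GPU pipelines):

```python
def pipeline_gpus(dp_rank, gpus_per_pipeline=2):
    """Return the GPU ids owned by one data-parallel process.

    Each process drives a whole pipeline, so on a 4-GPU server
    dp0 owns gpus [0, 1] and dp1 owns gpus [2, 3].
    """
    start = dp_rank * gpus_per_pipeline
    return list(range(start, start + gpus_per_pipeline))

print(pipeline_gpus(0))  # [0, 1]
print(pipeline_gpus(1))  # [2, 3]
```

Each process would then build its Pipe over its own GPU list and join the others through DDP/ZeRO.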
Closing this since the question has been answered.
❓ Questions and Help
I can't find any documents/examples that show how one can combine PP + ZeRO-DP (sharded) in fairscale.
Would you kindly share any working examples?
The very first task could be a 4-GPU setup that runs PP + ZeRO-DP, for example in this formation: two pipelines (gpus 0-1 and 2-3), with Sharded DDP seeing gpus 0 and 2 as the entry points.
Even a toy example based on your existing pp tutorial would be helpful.
I did find your MPU https://github.com/facebookresearch/fairscale/blob/master/fairscale/nn/model_parallel/initialize.py#L41
What's the next step?
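To make the target topology concrete, the grouping I have in mind (a simplified illustration in the spirit of initialize.py, not fairscale's actual code; `parallel_groups` is a made-up name) would be:

```python
def parallel_groups(world_size, pipeline_len):
    """Split ranks into pipeline groups and data-parallel groups.

    With world_size=4 and pipeline_len=2:
      pipeline groups      -> [[0, 1], [2, 3]]  (each list is one pipe)
      data-parallel groups -> [[0, 2], [1, 3]]  (ZeRO/DDP peers per stage)
    """
    assert world_size % pipeline_len == 0
    pipes = [list(range(r, r + pipeline_len))
             for r in range(0, world_size, pipeline_len)]
    dp = [list(range(stage, world_size, pipeline_len))
          for stage in range(pipeline_len)]
    return pipes, dp

pipes, dp = parallel_groups(4, 2)
print(pipes)  # [[0, 1], [2, 3]]
print(dp)     # [[0, 2], [1, 3]]
```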
Thank you!
@msbaines