
[3D-parallelism] Hybrid Model Parallelism #32074

Merged

Conversation

Contributor

@JZ-LIANG JZ-LIANG commented Apr 5, 2021

PR types

New features

PR changes

APIs

Describe

  • new features

    • Hybrid Model Parallelism:
      • Combine three individual model parallelism strategies (megatron, sharding, pipeline) into one hybrid parallelism strategy
      • A uniform switch to turn each individual parallelism strategy on/off (temporarily using the Sharding & Pipeline configs as the uniform API)
  • performance optimization:

    • speed
      • remove potentially unnecessary sync_calc & sync_comm ops in hybrid model parallelism
    • Memory usage
      • optimizer offload
      • in-place reuse of optimizer temporary vars [commit withdrawn, to be updated in a later PR]
  • performance-related

    • the order of parallelism from inner to outer is: mp --> sharding --> pp
    • mp (megatron) and sharding parallelism introduce heavy communication and are recommended to be used within a node (mp_degree * sharding_degree = number of GPUs per node)
    • pp parallelism has a lighter communication load than the above two, which makes it more suitable for use across nodes (pp_degree = number of nodes); a small sanity-check sketch of this layout follows below
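
A quick sanity check of the recommended 3D layout (a hypothetical helper for illustration, not part of this PR):

        # Hypothetical helper: verify that the chosen degrees match the recommended
        # topology (mp & sharding inside a node, pp across nodes).
        def check_hybrid_degrees(num_nodes, gpus_per_node, mp_degree, sharding_degree, pp_degree):
            total_gpus = num_nodes * gpus_per_node
            # the three degrees together must exactly cover all GPUs
            assert mp_degree * sharding_degree * pp_degree == total_gpus
            # mp and sharding are communication heavy, so keep them within one node
            assert mp_degree * sharding_degree == gpus_per_node
            # pp is lighter on communication, so spread it across nodes
            assert pp_degree == num_nodes

        # 4 nodes x 8 GPUs: mp=2 and sharding=4 inside each node, pp=4 across nodes
        check_hybrid_degrees(num_nodes=4, gpus_per_node=8,
                             mp_degree=2, sharding_degree=4, pp_degree=4)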

Examples

  • assume we have 4 nodes with 8 GPUs per node, i.e. 32 GPUs in total (a full fleet setup sketch follows after these example configs):

  • mp-sharding-pp 3D parallelism

        dist_strategy.sharding = True
        dist_strategy.pipeline = True
        dist_strategy.sharding_configs = {"segment_broadcast_MB": 32,
                                            "pp_degree": 4,
                                            "sharding_degree":4,
                                            "mp_degree": 2,
                                            "optimize_offload": True,
                                            }
        dist_strategy.pipeline_configs = {"schedule_mode": "1F1B",
                                            "micro_batch_size": 1,
                                            "accumulate_steps": 4,
                                            }
  • mp-pp 2D parallelism
        dist_strategy.sharding = True
        dist_strategy.pipeline = True
        dist_strategy.sharding_configs = {"segment_broadcast_MB": 32,
                                            "pp_degree": 4,
                                            "sharding_degree":1,
                                            "mp_degree": 8,
                                            "optimize_offload": True,
                                            }
        dist_strategy.pipeline_configs = {"schedule_mode": "1F1B",
                                            "micro_batch_size": 1,
                                            "accumulate_steps": 4,
                                            }
  • sharding-pp 2D parallelism
        dist_strategy.sharding = True
        dist_strategy.pipeline = True
        dist_strategy.sharding_configs = {"segment_broadcast_MB": 32,
                                            "pp_degree": 4,
                                            "sharding_degree":8,
                                            "mp_degree": 1,
                                            "optimize_offload": True,
                                            }
        dist_strategy.pipeline_configs = {"schedule_mode": "1F1B",
                                            "micro_batch_size": 1,
                                            "accumulate_steps": 4,
                                            }
  • mp-sharding 2D parallelism
        dist_strategy.sharding = True
        dist_strategy.sharding_configs = {"segment_broadcast_MB": 32,
                                            "pp_degree": 1,
                                            "sharding_degree":4,
                                            "mp_degree": 8,
                                            "optimize_offload": False,
                                            "gradient_merge_acc_step": 4,
                                            }
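
For context, a minimal end-to-end sketch of how one of these strategy configs is assumed to plug into a fleet training program (the model construction is a placeholder and not part of this PR; the script is meant to be run under a distributed launcher):

        # Minimal sketch (assumption): static-graph fleet training with the 3D strategy above.
        # Intended to be launched with `python -m paddle.distributed.launch`.
        import paddle
        import paddle.distributed.fleet as fleet

        paddle.enable_static()
        fleet.init(is_collective=True)

        dist_strategy = fleet.DistributedStrategy()
        dist_strategy.sharding = True
        dist_strategy.pipeline = True
        dist_strategy.sharding_configs = {"segment_broadcast_MB": 32,
                                          "pp_degree": 4,
                                          "sharding_degree": 4,
                                          "mp_degree": 2,
                                          "optimize_offload": True}
        dist_strategy.pipeline_configs = {"schedule_mode": "1F1B",
                                          "micro_batch_size": 1,
                                          "accumulate_steps": 4}

        # build_model() is a placeholder for the user's network returning a loss variable.
        # loss = build_model()
        optimizer = paddle.optimizer.AdamW(learning_rate=1e-4)
        optimizer = fleet.distributed_optimizer(optimizer, strategy=dist_strategy)
        # optimizer.minimize(loss)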


paddle-bot-old bot commented Apr 5, 2021

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


paddle-bot-old bot commented Apr 5, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@JZ-LIANG JZ-LIANG changed the title [3D-parallelism] Parallelism Switch [3D-parallelism] Hybrid Model Parallelism Apr 6, 2021
optional int32 sharding_degree = 3 [ default = 8 ];
optional int32 mp_degree = 4 [ default = 1 ];
optional string sharding_segment_strategy = 5
optional string sharding_segment_strategy = 1
Contributor

These enum options need comments.

Contributor Author

Recorded; documentation will be added in fluiddoc and fleetx.

Contributor

Comments also need to be added to this code.

optional bool hybrid_dp = 7 [ default = false ];
optional int32 gradient_merge_acc_step = 8 [ default = 1 ];
optional bool optimize_offload = 9 [ default = false ];
optional bool pp_allreduce_in_optimize = 10 [ default = false ];
Contributor

Add some comments: in 3D or 4D parallelism, allreduce_in_optimize=True can reduce communication, while allreduce_in_optimize=False can reduce memory.
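
For illustration, a hedged sketch (not from this PR) of how this flag would be toggled through sharding_configs, reusing the dist_strategy pattern from the examples in the PR description:

        # Illustrative sketch (assumption): pp_allreduce_in_optimize is set through
        # sharding_configs like the other proto fields above.
        dist_strategy.sharding_configs = {"segment_broadcast_MB": 32,
                                          "pp_degree": 4,
                                          "sharding_degree": 8,
                                          "mp_degree": 1,
                                          # True: fold the pp gradient allreduce into the optimize
                                          # stage to reduce communication; False: allreduce right
                                          # after backward to reduce peak memory (per the comment above)
                                          "pp_allreduce_in_optimize": True}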

Contributor Author

Recorded; documentation will be added in fluiddoc, fleetx, and the .py file where the feature is called.

But I think this should remain an internal-project feature for now, so we should not expose it to users?

Contributor

@zhiqiu zhiqiu left a comment

LGTM for backward.py

Contributor

@wangxicoding wangxicoding left a comment


LGTM

@wangxicoding wangxicoding merged commit 1e60a0c into PaddlePaddle:develop Apr 7, 2021