[Fluid] Migrate Fluid Distributed Kernels to PHI #55716

GhostScreaming · 2023-07-26T07:11:55Z

Issue description

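Hi everyone. The distributed operators under Fluid have not been fully migrated to the new PHI operator system. As a result, they lack a capability that PHI functional kernels get at registration time: "recording their own input/output attributes". In distributed scenarios this means they cannot use the framework's new communication module and scheduling system, which makes debugging and optimizing distributed training considerably harder. We have collected 17 operators that need to be migrated, and everyone is welcome to submit PRs to help migrate them.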

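For more details, see Call-for-Contributions: Fluid算子函数式迁移专项 (Fluid operator functionalization migration project). This issue tracks the migration progress of each operator in that project.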

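Notes:

- The CPU/GPU/XPU kernels and the InferShape of a given operator should preferably all be migrated by the same person. Claim operators in the comments below.
- For reference, see the previous two rounds of operator migration:
  - [PHI] Functionalization for Fluid kernel #52395
  - Good first issue for phi kernel output definition #51292
- Timeline: the PR submission deadline is September 15 and the merge deadline is September 25.
- Once your PR passes CI, request a review from @hitywt and @GhostScreaming.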

Operators to migrate (overall progress: 15/16)

Contributors, listed in order of merge time (order does not imply ranking): @AndSonder (4) @GreatV (2) @BeingGod (3) @huangjiyi (2) @gouzil (2) @yangguohao (1) @zeroRains (1)

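| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 1 | c_embedding ✅ (2023/8/16) | @BeingGod | #56129 | #56129 | Not needed |
| 2 | dgc ✅ (2023/8/10) | @huangjiyi | #56003 | Not needed | Not needed |
| 3 | dgc_momentum ✅ (2023/8/18) | @huangjiyi | #56158 | Not needed | #56358 |
| 4 | c_split ✅ (2023/8/22) | @BeingGod | #56327 | #56327 | Not needed |
| 5 | lars_momentum ✅ (2023/9/5) | @gouzil | #55798 | #56751 | #56749 |
| 6 | nop ✅ (2023/8/3) | @gouzil | #55816 | Not needed | Not needed |
| 7 | ftrl | @enkilee | #56270 | | |
| 8 | decayed_adagrad ✅ (2023/8/5) | @GreatV | #55995 | Not needed | #55995 |
| 9 | c_identity ✅ (2023/8/23) | @GreatV | #56215 | #56215 | #56215 |
| 10 | distributed_fused_lamb_init ✅ (2023/8/31) | @zeroRains | #55993 | Not needed | Not needed |
| 11 | limit_by_capacity ✅ (2023/8/18) | @yangguohao | #55948 | Not needed | Not needed |
| 12 | number_count ✅ (2023/8/15) | @BeingGod | #56128 | Not needed | Not needed |
| 13 | ~~distributed_fused_lamb~~ | Migrated in the previous round | | | |
| 14 | random_routing ✅ (2023/8/1) | @AndSonder | #55773 | Not needed | Not needed |
| 15 | prune_gate_by_capacity ✅ (2023/8/1) | @AndSonder | #55780 | Not needed | Not needed |
| 16 | assign_pos ✅ (2023/8/16) | @AndSonder | #55794 | Not needed | Not needed |
| 17 | fused_softmax_mask_upper_triangle ✅ (2023/8/1) | @AndSonder | #55769 | Not needed | Not needed |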

GreatV · 2023-07-26T08:08:39Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 8 | decayed_adagrad | @GreatV | | | |
| 9 | c_identity | @GreatV | | | |

BeingGod · 2023-07-26T08:32:37Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) |
|---|---|---|---|
| 1 | c_embedding | @BeingGod | #56129 |
| 4 | c_split | @BeingGod | #56327 |
| 12 | number_count | @BeingGod | #56128 |

AndSonder · 2023-07-26T08:58:25Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 14 | random_routing | @AndSonder | #55773 | - | - |
| 15 | prune_gate_by_capacity | @AndSonder | #55780 | - | - |
| 16 | assign_pos | @AndSonder | #55794 | - | - |
| 17 | fused_softmax_mask_upper_triangle | @AndSonder | #55769 | - | - |

gouzil · 2023-07-26T09:02:47Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 5 | lars_momentum | @gouzil | #55798 | | |
| 6 | nop | @gouzil | #55816 | - | - |

enkilee · 2023-07-26T09:04:08Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 7 | ftrl | @enkilee | | | |

huangjiyi · 2023-07-26T09:28:57Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 2 | dgc | @huangjiyi | | | |
| 3 | dgc_momentum | @huangjiyi | | | |

yangguohao · 2023-07-26T12:14:36Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 11 | limit_by_capacity | @yangguohao | | | |

AndSonder · 2023-07-28T03:35:08Z

@GhostScreaming For the fused_softmax_mask_upper_triangle task, does fused_softmax_mask_upper_triangle_grad need to be migrated as well?

GhostScreaming · 2023-07-31T02:00:59Z

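> @GhostScreaming For the fused_softmax_mask_upper_triangle task, does fused_softmax_mask_upper_triangle_grad need to be migrated as well?

Yes. By default, the corresponding backward (grad) operator needs to be migrated too.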

zeroRains · 2023-08-03T12:51:58Z

| No. | Operator | Claimed by | PR (CPU/GPU kernel) | PR (XPU kernel) | PR (InferShape) |
|---|---|---|---|---|---|
| 10 | distributed_fused_lamb_init | @zeroRains | - | - | - |

AndSonder · 2023-08-14T03:15:12Z

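random_routing, prune_gate_by_capacity, assign_pos, and fused_softmax_mask_upper_triangle do not need XPU migration, because these operators never had XPU kernels. Their InferShape does not need migrating either: they do not involve any registered data type that is left undefined, so there is no need for InferMeta to derive dtype information.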

GreatV · 2023-08-14T14:31:35Z

The c_identity GPU kernel, XPU kernel, and InferShape are all in PR move c_identity to phi #56215.
The decayed_adagrad GPU kernel and InferShape are both in PR move decayed_adagrad_op to phi #55995; its XPU kernel does not need migrating.

AndSonder · 2023-08-14T15:20:46Z

The distributed_fused_lamb operator was already migrated in the previous round, so it can be crossed off. @luotao1

BeingGod · 2023-08-15T01:57:53Z

c_split: InferShape does not need migrating; the XPU kernel is in [Fluid] NO.4 Migrate c_split to PHI #56327.
c_embedding: InferShape does not need migrating; the XPU kernel is in [Fluid] NO.1 Migrate c_embedding to PHI #56129.
number_count: neither the XPU kernel nor InferShape needs migrating.

huangjiyi · 2023-08-16T10:24:04Z

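Neither dgc nor dgc_momentum has an XPU kernel, so there is nothing to migrate there.
dgc's InferShape only checks that its inputs and outputs exist, so it does not need migrating.
The dgc_momentum InferShape migration PR: move dgc_momentum InferShape to phi #56358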

zeroRains · 2023-08-16T11:22:36Z

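The distributed_fused_lamb_init operator never had an XPU kernel, so no XPU migration is needed. It does not involve any registered data type that is left undefined, so there is no need for InferMeta to derive dtype information, and InferShape does not need migrating either.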

gouzil · 2023-08-16T14:02:50Z

nop needs neither XPU nor InferShape migration.
lars_momentum needs both XPU and InferShape migration. These have to wait until [Fluid] move lars_momentum to phi #55798 is merged.

yangguohao · 2023-08-19T04:29:43Z

limit_by_capacity needs neither XPU nor InferShape migration.

luotao1 · 2023-09-18T03:48:07Z

Migration of the distributed operators under Fluid is complete. Thanks to everyone who took part!

Contributors, listed in order of merge time (order does not imply ranking): @AndSonder (4) @GreatV (2) @BeingGod (3) @huangjiyi (2) @gouzil (2) @yangguohao (1) @zeroRains (1)

You are welcome to keep contributing to other Happy Open Source (快乐开源) tasks!

GhostScreaming added the PFCC label (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc) Jul 26, 2023

GhostScreaming self-assigned this Jul 26, 2023

paddle-bot bot assigned YanhuiDua Jul 26, 2023

luotao1 added this to Call for Contributions Jul 26, 2023

luotao1 moved this to In Progress in Call for Contributions Jul 26, 2023

luotao1 mentioned this issue Jul 26, 2023

China Software Open Source Innovation Competition: PaddlePaddle Framework Task Challenge (Part 2) #55663

Closed

luotao1 self-assigned this Jul 26, 2023

YanhuiDua removed their assignment Jul 26, 2023

AndSonder mentioned this issue Jul 26, 2023

Add some ops to static build list #55724

Closed

This was referenced Jul 26, 2023

Add some ops to static build list #55727

Merged

[Fluid] Move fused_softmax_mask_upper_triangle to phi #55769

Merged

This was referenced Jul 28, 2023

[Fluid] Move random routing to phi #55773

Merged

move prune_gate_by_capacity to phi #55780

Merged

[Fluid] move assign_pos to phi #55794

Merged

This was referenced Jul 29, 2023

[Fluid] move lars_momentum to phi #55798

Merged

[fluid] move NopKernel to phi #55816

Merged

GreatV mentioned this issue Aug 5, 2023

move decayed_adagrad_op to phi #55995

Merged

huangjiyi mentioned this issue Aug 6, 2023

move dgc kernel to phi #56003

Merged

This was referenced Aug 9, 2023

[Fluid] NO.12 Migrate number_count to PHI #56128

Merged

[Fluid] NO.1 Migrate c_embedding to PHI #56129

Merged

huangjiyi mentioned this issue Aug 10, 2023

move dgc_momentum kernel to phi #56158

Merged

AndSonder mentioned this issue Aug 12, 2023

Performance optimization of static-graph executor pre-analysis #55299

Closed

GreatV mentioned this issue Aug 12, 2023

move c_identity to phi #56215

Merged

enkilee mentioned this issue Aug 14, 2023

[Fluid] move ftrl to phi #56270

Closed

BeingGod mentioned this issue Aug 15, 2023

[Fluid] NO.4 Migrate c_split to PHI #56327

Merged

huangjiyi mentioned this issue Aug 16, 2023

move dgc_momentum InferShape to phi #56358

Merged

This was referenced Aug 29, 2023

[Fluid] move lars_momentum_op InferShape to phi #56749

Merged

[Fluid] move lars_momentum_xpu to phi #56751

Merged

AndSonder mentioned this issue Sep 6, 2023

Fix distributed_fused_lamb_init error when open static_build flag #57021

Closed

luotao1 closed this as completed Sep 18, 2023

github-project-automation bot moved this from In Progress to Done in Call for Contributions Sep 18, 2023

gouzil mentioned this issue Oct 17, 2023

[Community Governance] gouzil applies for Committer status #58151

Closed
