[AutoParallel] Fix pipeline parallel get none grad in non-computation rank. #60214

GhostScreaming · 2023-12-21T06:22:44Z

PR types

Bug fixes

PR changes

Others

Description

PCard-73145

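Fixes the issue where, under dynamic-graph semi-auto parallel, non-computation ranks in pipeline parallelism return Python None for an uninitialized Tensor. Also fixes the error raised when a hook prints an uninitialized Tensor.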
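The behavior described above can be sketched with a toy model. This is only an illustration under assumptions: `ToyTensor`, `grad_on_rank`, and `logging_hook` are hypothetical names, not Paddle APIs, and the real fix lives in Paddle's C++/Python binding layer. The point is that a rank holding no gradient still gets a tensor object (marked uninitialized) rather than `None`, and that printing it from a hook does not raise.

```python
from typing import List, Optional


class ToyTensor:
    """Toy stand-in for a framework tensor; hypothetical, not Paddle's Tensor."""

    def __init__(self, data: Optional[List[float]] = None):
        self.data = data  # None models an uninitialized tensor

    def is_initialized(self) -> bool:
        return self.data is not None

    def __repr__(self) -> str:
        # Printing must not fail for an uninitialized tensor.
        if not self.is_initialized():
            return "ToyTensor(uninitialized)"
        return f"ToyTensor({self.data})"


def grad_on_rank(local_grad: Optional[List[float]]) -> ToyTensor:
    """Always return a tensor object, even when this rank holds no gradient."""
    return ToyTensor(local_grad)


def logging_hook(grad: ToyTensor) -> str:
    # A hook that formats the gradient; it works on both kinds of rank.
    return f"grad = {grad!r}"


compute_rank_grad = grad_on_rank([0.5, -1.0])  # rank that computed the gradient
other_rank_grad = grad_on_rank(None)           # non-computation rank
```

Downstream code such as an optimizer step or gradient clipping can then check `is_initialized()` instead of special-casing `None`.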

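nn.Linear has a known issue: bias may be None, in which case the C++ bias Tensor passed to _C_ops.linear is uninitialized and the add-bias computation is skipped. This conflicts with the semantics of uninitialized Tensors in dynamic semi-auto pipeline parallelism. Consider this case: dynamic semi-auto parallel uses a Linear with bias, but on a non-computation rank Linear.bias is naturally uninitialized, so that rank skips the call to the PHI API elementwise_add, while the computation rank still performs elementwise_add. So far this issue has had no impact. For example, if save_load needs to store Linear.bias, it can still obtain the correct bias from the owning rank via paddle.distributed.reshard. Dynamic-to-static conversion also rewrites based on the Python-side nn.Linear, so skipping the PHI API add-bias computation has no effect.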
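A minimal pure-Python sketch of the ambiguity described above. It assumes a hypothetical `toy_linear` helper (not `_C_ops.linear` or any Paddle API) in which `None` stands for an uninitialized bias. The function cannot tell "the layer has no bias" apart from "the bias lives on another pipeline rank", so both cases skip the add step.

```python
from typing import List, Optional, Tuple


def toy_linear(
    x: List[float],
    weight: List[List[float]],
    bias: Optional[List[float]] = None,
) -> Tuple[List[float], bool]:
    """Toy linear layer; returns (output, whether the bias add ran).

    `weight` holds one column of coefficients per output feature.
    `bias=None` models an uninitialized bias tensor.
    """
    out = [sum(xi * w for xi, w in zip(x, col)) for col in weight]
    if bias is None:
        # Uninitialized bias: the add step is skipped entirely.
        return out, False
    return [o + b for o, b in zip(out, bias)], True


x = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Computation rank: the bias is present, so the add runs.
with_bias, added = toy_linear(x, weight, [10.0, 20.0, 30.0])

# Non-computation rank: the bias is uninitialized locally, so the add is skipped,
# exactly as it would be for a layer created without a bias.
without_bias, skipped_add = toy_linear(x, weight, None)
```

In this toy setting the skipped add is harmless on the non-computation rank because its output is never used, which mirrors the "no impact so far" observation in the description.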

paddle-bot · 2023-12-21T06:22:55Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

LiYuRio

LGTM

wanghuancoder

LGTM

…rank. (PaddlePaddle#60214)

* [AutoParallel] Fix pipeline parallel get none grad in non-computation rank.
* fix optimizer update parameter is uninitialized
* fix gradient clip

Co-authored-by: LiYuRio <liyuruijx@163.com>

GhostScreaming and others added 2 commits December 25, 2023 11:29

[AutoParallel] Fix pipeline parallel get none grad in non-computation…

0ba6b0f

… rank.

fix optimizer update parameter is uninitialized

d98f1c2

LiYuRio force-pushed the fix_pp_none_grad branch from 3a829f4 to d98f1c2 Compare December 25, 2023 07:53

fix gradient clip

00a2aa7

LiYuRio force-pushed the fix_pp_none_grad branch from 0c6c724 to 00a2aa7 Compare December 26, 2023 12:53

zyfncg approved these changes Dec 27, 2023

LiYuRio approved these changes Dec 27, 2023

wanghuancoder approved these changes Dec 27, 2023

GhostScreaming merged commit 1862518 into PaddlePaddle:develop Dec 27, 2023
29 checks passed

SigureMo mentioned this pull request Jan 9, 2024

[Dy2St] fix test_grad in PIR mode #60621

Merged

GhostScreaming mentioned this pull request Jan 12, 2024

[AutoParallel] Fix bug of return PyNone. #60773

Merged
