
[AutoParallel] Fix pipeline parallel get none grad in non-computation rank. #60214

Merged: 3 commits, merged on Dec 27, 2023


GhostScreaming (Contributor) commented on Dec 21, 2023

PR types

Bug fixes

PR changes

Others

Description

PCard-73145

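Fix an issue in dynamic semi-auto parallel where, under pipeline parallelism, non-computation ranks return Python None for uninitialized Tensors. Also fix the error raised when a hook prints an uninitialized Tensor.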
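The behavior change described above can be modeled with a small pure-Python sketch. `ToyDistTensor`, `grad_to_python_before` and `grad_to_python_after` are hypothetical names, not Paddle APIs. The sketch only illustrates the intended semantics (an uninitialized gradient should still reach Python as a tensor object, and printing it should not fail), not how the fix is implemented in Paddle's bindings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(repr=False)
class ToyDistTensor:
    """Hypothetical stand-in for a distributed tensor (not a Paddle class).

    `local_data` is None when this rank holds no local data, i.e. the
    "uninitialized" case on a non-computation pipeline rank.
    """
    shape: Tuple[int, ...]
    local_data: Optional[List[float]] = None

    def is_initialized(self) -> bool:
        return self.local_data is not None

    def __repr__(self) -> str:
        # Printing (e.g. from a gradient hook) must not touch missing data.
        if not self.is_initialized():
            return f"ToyDistTensor(shape={self.shape}, uninitialized)"
        return f"ToyDistTensor(shape={self.shape}, data={self.local_data})"


def grad_to_python_before(grad: ToyDistTensor) -> Optional[ToyDistTensor]:
    # Toy model of the reported bug: an uninitialized grad surfaces as None.
    return grad if grad.is_initialized() else None


def grad_to_python_after(grad: ToyDistTensor) -> ToyDistTensor:
    # Toy model of the fixed behavior: the tensor object is returned either way.
    return grad


# A weight gradient on a rank that does not run this pipeline stage.
grad = ToyDistTensor(shape=(4, 8))

print(grad_to_python_before(grad))  # None
after = grad_to_python_after(grad)
print(after)        # ToyDistTensor(shape=(4, 8), uninitialized)
print(after.shape)  # (4, 8)
```

Returning an object rather than None lets downstream code (for example, optimizer and gradient-clipping loops) check initialization explicitly instead of special-casing None.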

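nn.Linear has a known issue: its bias can be None, in which case the C++ bias Tensor passed to _C_ops.linear is uninitialized and the add-bias computation is skipped. This conflicts with the uninitialized-Tensor semantics of pipeline parallelism in dynamic semi-auto parallel. Consider a Linear that does have a bias: on non-computation ranks, Linear.bias is naturally uninitialized, so those ranks skip the call to the PHI API elementwise_add, while computation ranks still execute elementwise_add.

So far this has caused no problems. For example, if save/load needs to store Linear.bias, the correct bias can still be fetched from the owning rank via paddle.distributed.reshard. Dynamic-to-static conversion also rewrites the program based on the Python-side nn.Linear, so skipping the add-bias computation in the PHI API has no effect there either.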
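The overlap between "no bias" and "bias not held on this rank" can be illustrated with a similar toy model. `ToyTensor`, `to_cpp_bias` and `toy_linear_ops` are hypothetical, and the op names are labels only. The sketch models just the "uninitialized bias means no bias" rule described above, not Paddle's actual kernels or what a non-computation rank really executes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a C++ Tensor; `initialized` models whether it holds data."""
    initialized: bool = True


def to_cpp_bias(bias: Optional[ToyTensor]) -> ToyTensor:
    # A Python-side bias of None is passed down as an uninitialized tensor.
    return bias if bias is not None else ToyTensor(initialized=False)


def toy_linear_ops(bias: Optional[ToyTensor]) -> List[str]:
    """Return the toy list of ops a linear call would dispatch for this bias."""
    ops = ["matmul"]
    cpp_bias = to_cpp_bias(bias)
    # An uninitialized bias is read as "no bias", so the add is skipped.
    if cpp_bias.initialized:
        ops.append("elementwise_add")
    return ops


# Linear without a bias: the Python-side bias is None.
no_bias = toy_linear_ops(None)
# Linear with a bias, on the rank that computes this pipeline stage.
owner_rank = toy_linear_ops(ToyTensor(initialized=True))
# The same Linear on a non-computation rank: the bias exists but is uninitialized.
other_rank = toy_linear_ops(ToyTensor(initialized=False))

print(no_bias)     # ['matmul']
print(owner_rank)  # ['matmul', 'elementwise_add']
print(other_rank)  # ['matmul']
```

The same layer yields different op lists on the two ranks, and the non-computation rank is indistinguishable from a bias-free Linear at this level, which is the semantic conflict the description points out.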


paddle-bot bot commented Dec 21, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

LiYuRio (Contributor) left a comment


LGTM

wanghuancoder (Contributor) left a comment


LGTM

GhostScreaming merged commit 1862518 into PaddlePaddle:develop on Dec 27, 2023
29 checks passed
Wanglongzhi2001 pushed a commit to Wanglongzhi2001/Paddle that referenced this pull request Jan 7, 2024
…rank. (PaddlePaddle#60214)

* [AutoParallel] Fix pipeline parallel get none grad in non-computation rank.

* fix optimizer update parameter is uninitialized

* fix gradient clip

---------

Co-authored-by: LiYuRio <liyuruijx@163.com>

4 participants