[AutoParallel] Fix pipeline parallel get none grad in non-computation rank. #60214
PR types
Bug fixes
PR changes
Others
Description
PCard-73145
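Fixes an issue in dynamic semi-auto parallel mode where, under pipeline parallelism, non-computation ranks return Python None for an uninitialized Tensor. Also fixes an error raised when a hook prints an uninitialized Tensor.

nn.Linear has a known issue: bias may be None, in which case the C++ bias Tensor passed to _C_ops.linear is uninitialized and the add-bias computation is skipped. This conflicts with the semantics of uninitialized Tensors in dynamic semi-auto pipeline parallelism. Consider this case: a dynamic semi-auto parallel model uses a Linear with bias, but on non-computation ranks Linear.bias is inherently uninitialized, so those ranks skip the call to the PHI API elementwise_add, while computation ranks still run elementwise_add. At present this issue has no practical impact. For example, if save_load needs to store Linear.bias, it can still obtain the correct bias from the corresponding rank through paddle.distributed.reshard. Dynamic-to-static conversion is also rewritten from the Python-side nn.Linear, so skipping the add-bias computation in the PHI API has no effect there.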
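
The divergence described above can be illustrated with a small pure-Python toy model. This is only a sketch and not Paddle code: `ToyTensor`, `toy_linear`, and the `trace` list are hypothetical names introduced here, and the model covers only the "skip add-bias when the bias is uninitialized" branch, not the rest of pipeline-parallel execution.

```python
from typing import List, Optional


class ToyTensor:
    """Hypothetical stand-in for a tensor that may be uninitialized on a rank."""

    def __init__(self, data: Optional[List[float]] = None):
        self.data = data

    def initialized(self) -> bool:
        return self.data is not None


def toy_linear(x: List[float], weight: List[List[float]],
               bias: ToyTensor, trace: List[str]) -> List[float]:
    """Toy linear layer that records which ops it runs in `trace`."""
    out_features = len(weight[0])
    # out[j] = sum_i x[i] * weight[i][j]
    out = [sum(xi * row[j] for xi, row in zip(x, weight))
           for j in range(out_features)]
    trace.append("matmul")
    # Assumption mirrored from the description: an uninitialized bias
    # means the add-bias step is skipped entirely.
    if bias.initialized():
        out = [o + b for o, b in zip(out, bias.data)]
        trace.append("elementwise_add")
    return out


x = [1.0, 2.0]
w = [[1.0, 0.0], [0.0, 1.0]]

# Rank that owns the bias: both ops run.
trace_compute: List[str] = []
out_compute = toy_linear(x, w, ToyTensor([0.5, 0.5]), trace_compute)

# Rank where the bias is uninitialized: the add is skipped.
trace_other: List[str] = []
out_other = toy_linear(x, w, ToyTensor(), trace_other)

print(trace_compute, trace_other)
```

In this toy, the two ranks record different op sequences for the same layer, which is the mismatch the description refers to; whether that matters depends on what later consumes the bias.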