@xingmingyyj xingmingyyj commented Aug 13, 2025

PR Category

User Experience

PR Types

New features

Description

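Add the main FlexCheckpoint modules, including:

  • ShardedTensor, which attaches a description of the sharding information to a Tensor.
  • AOAEngine, which parses AOA notation and provides the mapping between target Tensor slices and source Tensor slices.
  • An upgrade of the existing dist checkpoint interfaces so that a ShardedTensor can be saved and loaded directly.
    pcard-73263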
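
To make the slice mapping mentioned above more concrete, here is a minimal sketch of how a saved shard and a requested shard of the same global tensor could be matched. It is not the code from this PR: `ShardInfo` and `overlap_slices` are hypothetical names, and the fields (`global_shape`, `local_shape`, `global_offset`) are assumptions about what a sharding description might contain.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ShardInfo:
    """Hypothetical sharding metadata; not the real ShardedTensor class."""
    key: str
    global_shape: Tuple[int, ...]   # shape of the full, unsharded tensor
    local_shape: Tuple[int, ...]    # shape of the piece held locally
    global_offset: Tuple[int, ...]  # where the piece starts in the full tensor


def overlap_slices(
    src: ShardInfo, dst: ShardInfo
) -> Optional[Tuple[Tuple[slice, ...], Tuple[slice, ...]]]:
    """Return (src_local_slices, dst_local_slices) for the region covered by
    both shards, or None when the two shards do not intersect."""
    src_slices, dst_slices = [], []
    for s_off, s_len, d_off, d_len in zip(
        src.global_offset, src.local_shape, dst.global_offset, dst.local_shape
    ):
        begin = max(s_off, d_off)                # intersection start (global)
        end = min(s_off + s_len, d_off + d_len)  # intersection end (global)
        if begin >= end:
            return None
        # Convert global coordinates back to each shard's local coordinates.
        src_slices.append(slice(begin - s_off, end - s_off))
        dst_slices.append(slice(begin - d_off, end - d_off))
    return tuple(src_slices), tuple(dst_slices)


# A global 8x4 weight: the checkpoint holds rows 0..3, the loader wants rows 2..5.
saved = ShardInfo("w", (8, 4), (4, 4), (0, 0))
wanted = ShardInfo("w", (8, 4), (4, 4), (2, 0))
result = overlap_slices(saved, wanted)
print(result)
```

In this example the overlap is global rows 2 and 3, so the sketch maps local rows 2..3 of the saved shard onto local rows 0..1 of the requested shard. A real resharding path would repeat this for every pair of source and target shards and then copy the data.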

paddle-bot bot commented Aug 13, 2025

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@xingmingyyj xingmingyyj force-pushed the add_sharded_state_dict branch from 7929113 to 1c91777 Compare August 13, 2025 08:38
@xingmingyyj xingmingyyj force-pushed the add_sharded_state_dict branch from 1c91777 to 391481c Compare August 13, 2025 08:39

codecov-commenter commented Aug 14, 2025

Codecov Report

❌ Patch coverage is 46.74503% with 589 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@00f6730). Learn more about missing BASE report.

Files with missing lines                               Patch %  Lines
...on/paddle/distributed/flex_checkpoint/aoa/lexer.py  44.20%   130 Missing ⚠️
.../paddle/distributed/flex_checkpoint/dcp/reshard.py  9.72%    130 Missing ⚠️
...rs/dygraph_optimizer/dygraph_sharding_optimizer.py  7.40%    125 Missing ⚠️
...ddle/distributed/flex_checkpoint/aoa/aoa_engine.py  80.08%   48 Missing ⚠️
...distributed/flex_checkpoint/dcp/save_state_dict.py  33.33%   42 Missing ⚠️
.../distributed/flex_checkpoint/dcp/sharded_weight.py  39.65%   35 Missing ⚠️
...on/paddle/distributed/flex_checkpoint/dcp/utils.py  36.36%   28 Missing ⚠️
...distributed/flex_checkpoint/dcp/load_state_dict.py  68.49%   23 Missing ⚠️
...n/paddle/distributed/flex_checkpoint/aoa/parser.py  89.28%   9 Missing ⚠️
python/paddle/nn/layer/layers.py                       18.18%   9 Missing ⚠️
... and 2 more

❌ Your patch status has failed because the patch coverage (46.74%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #74593   +/-   ##
==========================================
  Coverage           ?   46.74%           
==========================================
  Files              ?       13           
  Lines              ?     1106           
  Branches           ?        0           
==========================================
  Hits               ?      517           
  Misses             ?      589           
  Partials           ?        0           

@xingmingyyj

/re-run Distribute-stable

From00
From00 previously approved these changes Aug 17, 2025

@From00 From00 left a comment

LGTM. Some of the suggestions can be addressed in the next PR.

From00
From00 previously approved these changes Aug 18, 2025

@From00 From00 left a comment

LGTM

@risemeup1 risemeup1 left a comment

LGTM for setup.py

risemeup1
risemeup1 previously approved these changes Aug 18, 2025
sunzhongkai588
sunzhongkai588 previously approved these changes Aug 18, 2025

@sunzhongkai588 sunzhongkai588 left a comment

LGTM.
I see the ShardedTensor API has been exposed, so please add the Chinese documentation as well.

zyfncg
zyfncg previously approved these changes Aug 18, 2025

@zyfncg zyfncg left a comment

LGTM for setup.py.in

From00
From00 previously approved these changes Aug 19, 2025
zyfncg
zyfncg previously approved these changes Aug 19, 2025

@From00 From00 left a comment

LGTM

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

@sunzhongkai588 sunzhongkai588 left a comment

LGTM. The documentation issues will be addressed in a follow-up.

"PrepareContextParallel",
"create_nccl_config",
"ShardedWeight",
"ShardedStateDict",

ShardedStateDict does not seem to have documentation?

@From00 From00 merged commit ed4e69d into PaddlePaddle:develop Aug 19, 2025
143 of 156 checks passed
Luckycheng222 pushed a commit to Luckycheng222/Paddle that referenced this pull request Aug 25, 2025
…ddlePaddle#74593)

* add flex checkpoint

* add aoa_engine test

* replace left arrow with right arrow

* fix api type check

* fix __init__

* rename sharded_tensor to sharded_weight

* fix path
10 participants