@tianhaodongbd
Contributor

PR Category

Distributed Strategy

PR Types

Bug fixes

Description

Pcard-90602
cherry-pick pr: #73625
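With pipeline parallelism (pp) enabled, recreating the NCCL comm can hang. The cause is that, within a single communication group, the unique_key values are fetched in no fixed order during TCP communication, so the ranks end up waiting on each other. This PR fixes the problem by replacing the unordered map with an ordered map.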
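
The ordering issue described above can be illustrated with a minimal sketch. It is not the code from comm_context_manager.cc: the registry types, the helper names (`BuildRegistry`, `VisitOrder`), and the key strings are all hypothetical, and the assumption that each rank walks its container and performs one blocking exchange per key is an illustrative reading of the description. The point is only that `std::map` iterates in sorted key order regardless of insertion order, while `std::unordered_map` gives no such guarantee across processes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for a per-process registry of communicator
// contexts keyed by unique_key. These are NOT Paddle's actual types.
using OrderedRegistry = std::map<std::string, int>;
using UnorderedRegistry = std::unordered_map<std::string, int>;

// Builds a registry by inserting keys in the given order, mimicking ranks
// that registered the same groups in different sequences.
template <typename Registry>
Registry BuildRegistry(const std::vector<std::string>& insertion_order) {
  Registry registry;
  int ring_id = 0;
  for (const auto& key : insertion_order) {
    registry.emplace(key, ring_id++);
  }
  return registry;
}

// Returns the keys in container iteration order, i.e. the order in which a
// rank would perform its (assumed) blocking per-key exchange.
template <typename Registry>
std::vector<std::string> VisitOrder(const Registry& registry) {
  std::vector<std::string> order;
  order.reserve(registry.size());
  for (const auto& entry : registry) {
    order.push_back(entry.first);
  }
  return order;
}
```

With `OrderedRegistry`, two ranks that inserted the same keys in different sequences still visit them in the same sorted order, so their blocking exchanges line up. With `UnorderedRegistry`, the visit order is unspecified by the standard and may differ between ranks, which is the kind of mismatch that can leave each side waiting for a key the other has not reached yet.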

@paddle-bot

paddle-bot bot commented Jul 22, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Please upload report for BASE (develop@d5b8b45). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...addle/phi/core/distributed/comm_context_manager.cc | 0.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #74168   +/-   ##
==========================================
  Coverage           ?    0.00%           
==========================================
  Files              ?        1           
  Lines              ?        2           
  Branches           ?        0           
==========================================
  Hits               ?        0           
  Misses             ?        2           
  Partials           ?        0           

@swgu98
Contributor

swgu98 commented Jul 22, 2025

/re-run cpu

@tianhaodongbd
Contributor Author

/re-run all-failed

@paddle-ci-bot

paddle-ci-bot bot commented Aug 2, 2025

Sorry to inform you that the CIs for 727bd39 passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@tianhaodongbd
Contributor Author

/re-run all-failed

Contributor

@gongweibao gongweibao left a comment

LGTM

@tianhaodongbd
Contributor Author

/re-run Approval

@tianhaodongbd
Contributor Author

/re-run all-failed

Contributor

@SylarTiaNII SylarTiaNII left a comment

LGTM

@From00 From00 merged commit cc405f6 into PaddlePaddle:develop Aug 30, 2025
79 of 82 checks passed
7 participants