Fix Device Event Creation #57574
Your PR was submitted successfully. Thank you for contributing to this open-source project!
This PR should resolve the problem of NCCL being unable to overlap with compute kernels when tensor parallelism is enabled.
It was mentioned that once the CUDA_DEVICE_MAX_CONNECTIONS flag is set, overlapping operations become entirely unfeasible. This limitation arises from the blocking introduced by the use of |
gongweibao seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Branch updated: 9758c33 to a397713
@eee4017 , PR-CI-ROCM-Compile:
|
This branch was cherry-picked by #57656. Close this PR.
Hi @eee4017, #57656 cherry-picks this PR's changes onto the incubate/new_frl branch; this PR still needs to be merged into the develop branch.
Line 89: DeviceEvent event(place)
-> DeviceEvent event(place, paddle::platform::GenerateDeviceEventFlag())
Branch updated: a397713 to 2c28c92
LGTM. Great work!
* Fix Device Event Creation
* Fix Device Event Test
* fix test
PR types
Bug fixes
PR changes
Others
Description
In the Paddle framework, DeviceEvent objects are currently created with the cudaEventDefault flag. However, these events are used exclusively for synchronization, not for timing measurements, so they should be created with the cudaEventDisableTiming flag instead. This PR corrects the flag used during CUDA event creation. Additionally, the existing default flag for DeviceEvent is ambiguously set to 0. To improve clarity and prevent misunderstandings, this default has been removed; the flag is now generated explicitly by the GenerateDeviceEventFlag() function.
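
The idea described above can be sketched in plain C++ without a CUDA toolkit. This is only an illustration under stated assumptions, not Paddle's actual code: `GenerateFlagSketch` and `EventSketch` are hypothetical stand-ins for `GenerateDeviceEventFlag()` and `DeviceEvent`, the parameter names are invented, and the `kEvent*` constants mirror the numeric values of the CUDA runtime flags of the same names. In real CUDA code the resulting flag would be passed to `cudaEventCreateWithFlags`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-ins for the CUDA runtime flags of the same names, so the sketch
// compiles without a CUDA toolkit.
constexpr unsigned int kEventDefault = 0x00;        // cudaEventDefault
constexpr unsigned int kEventBlockingSync = 0x01;   // cudaEventBlockingSync
constexpr unsigned int kEventDisableTiming = 0x02;  // cudaEventDisableTiming
constexpr unsigned int kEventInterprocess = 0x04;   // cudaEventInterprocess

// Hypothetical flag generator (parameter names are assumptions).
// Timing is disabled unless the caller explicitly asks for it.
constexpr unsigned int GenerateFlagSketch(bool enable_timing = false,
                                          bool blocking = false,
                                          bool interprocess = false) {
  return (enable_timing ? kEventDefault : kEventDisableTiming) |
         (blocking ? kEventBlockingSync : kEventDefault) |
         (interprocess ? kEventInterprocess : kEventDefault);
}

// Hypothetical event wrapper. The flag parameter has no default value, so
// every call site must state which kind of event it wants.
class EventSketch {
 public:
  EventSketch(std::string place, unsigned int flag)
      : place_(std::move(place)), flag_(flag) {}

  // True when the disable-timing bit is not set.
  bool TimingEnabled() const { return (flag_ & kEventDisableTiming) == 0; }
  unsigned int flag() const { return flag_; }
  const std::string& place() const { return place_; }

 private:
  std::string place_;
  unsigned int flag_;
};

// Helper showing the intended call pattern: a synchronization-only event.
inline EventSketch MakeSyncOnlyEvent(const std::string& place) {
  EventSketch event(place, GenerateFlagSketch());
  assert(!event.TimingEnabled());  // sync-only events carry no timing state
  return event;
}
```

With this shape, a bare `EventSketch event("gpu:0")` no longer compiles, which is the point of dropping the ambiguous default of 0: a call site has to write `EventSketch event("gpu:0", GenerateFlagSketch())` and thereby opts into a timing-free event unless it requests otherwise.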