reduce cpu host overhead when using moe #5578

Merged
tohtana merged 3 commits into deepspeedai:master from ranzhejiang:zhejiang/reduce_host_overhead_moe
Aug 21, 2024
Conversation

@ranzhejiang
Contributor

@ranzhejiang ranzhejiang commented May 29, 2024

The `.to('cpu')` call on `exp_counts` is unnecessary, and it forces a device-to-host synchronization that hurts performance.
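To illustrate the point, here is a minimal sketch, assuming PyTorch. The function name `per_expert_counts` and the toy `mask` are hypothetical and are not taken from the DeepSpeed source. Copying a CUDA tensor to the host with `.to('cpu')` makes the CPU wait until the queued GPU work that produces the tensor has finished, whereas leaving the result on the device lets the host keep launching kernels.

```python
import torch


def per_expert_counts(mask: torch.Tensor) -> torch.Tensor:
    """Sum a (tokens x experts) one-hot routing mask into per-expert counts.

    The result stays on the same device as `mask`, so no host copy is made.
    """
    return mask.sum(dim=0).detach()


# Fall back to CPU so the sketch also runs on machines without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical routing of 6 tokens to 4 experts (one 1 per row).
mask = torch.tensor(
    [
        [1, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0],
    ],
    device=device,
)

# Device-resident counts: on CUDA this does not block the host.
counts = per_expert_counts(mask)

# Host copy: on CUDA this blocks the host until `counts` is ready.
counts_host = per_expert_counts(mask).to("cpu")
```

Both variants hold the same values; only the device, and therefore the synchronization cost, differs. If the counts are needed on the host later (for logging, say), the copy can be deferred to a point where a synchronization happens anyway.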

@ranzhejiang ranzhejiang requested a review from awan-10 as a code owner May 29, 2024 04:01
@loadams loadams requested a review from tohtana May 31, 2024 22:15
Collaborator

@tohtana tohtana left a comment

@ranzhejiang Thank you for your contribution! I have a few questions about your changes. Can you clarify them?

@ranzhejiang
Contributor Author

Hi @tohtana, I have clarified the modifications you mentioned and retested this PR with Megatron-DeepSpeed on a GPU platform (8xA800). It runs well, and the loss remains consistent with the original method. Could you please review it again? Thanks!

@ranzhejiang
Contributor Author

#5881 also adopts this approach to reduce CPU time.

@tohtana tohtana added this pull request to the merge queue Aug 21, 2024
Merged via the queue into deepspeedai:master with commit 7260890 Aug 21, 2024
mauryaavinash95 pushed a commit to DataStates/DeepSpeed that referenced this pull request Mar 20, 2025
The `.to('cpu')` call on `exp_counts` is unnecessary, and it forces a
device-to-host synchronization that hurts performance.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>