Contributor

@zejunchen-zejun zejunchen-zejun commented Sep 25, 2025

ROCm devices support several allreduce backends. This PR adds a dispatcher that selects the most performant allreduce implementation based on payload size, TP size, and platform. It has three main advantages:

  • improves allreduce performance for workloads
  • makes it easier to maintain the different allreduce implementations and dispatch the most performant one
  • adapts easily to new hardware (MI400 in the future)
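To make the idea concrete, here is a minimal sketch of a table-driven dispatcher keyed on platform, TP size, and payload size. It is not the code in this PR: the backend names, platform strings, and byte thresholds below are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class DispatchRule:
    """One row of the dispatch table (all values are illustrative)."""
    backend: str               # hypothetical backend name
    platforms: FrozenSet[str]  # platforms the backend supports
    tp_sizes: FrozenSet[int]   # tensor-parallel sizes the backend supports
    max_bytes: int             # largest payload the backend is preferred for


# Hypothetical table, ordered from most to least specialized.
RULES: Sequence[DispatchRule] = (
    DispatchRule("small_payload_kernel", frozenset({"MI300"}),
                 frozenset({2, 4, 8}), 512 * 1024),
    DispatchRule("medium_payload_kernel", frozenset({"MI300"}),
                 frozenset({2, 4, 8}), 16 * 1024 * 1024),
)
FALLBACK = "fallback"  # placeholder for a general-purpose backend


def select_allreduce(platform: str, tp_size: int, payload_bytes: int,
                     rules: Sequence[DispatchRule] = RULES) -> str:
    """Return the first backend whose rule matches, else the fallback."""
    for rule in rules:
        if (platform in rule.platforms
                and tp_size in rule.tp_sizes
                and payload_bytes <= rule.max_bytes):
            return rule.backend
    return FALLBACK
```

With a layout like this, supporting a new platform would mean adding rows to the table rather than changing the selection logic.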

TODO:

  • provide performance and accuracy data for this PR on popular workloads

@zejunchen-zejun zejunchen-zejun marked this pull request as draft September 25, 2025 01:17

mergify bot commented Sep 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zejunchen-zejun.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the rocm (Related to AMD ROCm) and needs-rebase labels Sep 25, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a dispatch mechanism for selecting performant all-reduce implementations on ROCm platforms, which is a valuable performance enhancement. The overall structure is sound, but I've identified two critical issues that could lead to runtime errors due to improper handling of None values. Additionally, I've suggested adding missing type hints to improve code clarity and maintainability. Addressing these points will make the implementation more robust.
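As a generic illustration of the kind of None handling and type hints mentioned above (not the actual code under review), a dispatcher can declare optional backends explicitly and fall back when one is missing. The function and parameter names here are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A backend takes a list of values and returns the reduced list.
Backend = Callable[[List[float]], List[float]]


def run_allreduce(values: List[float],
                  backends: Dict[str, Optional[Backend]],
                  preferred: str,
                  fallback: Backend) -> List[float]:
    """Run the preferred backend, or the fallback if it is unavailable."""
    impl = backends.get(preferred)  # None if unregistered or disabled
    if impl is None:
        # Guard explicitly instead of calling a possibly-None object.
        return fallback(values)
    return impl(values)
```

Annotating the registry values as `Optional[Backend]` lets a type checker flag any call site that skips the None check.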

@zejunchen-zejun zejunchen-zejun force-pushed the zejun/add_dispatch_for_allreduce branch from 47d75a9 to 1bbb58c Compare September 25, 2025 01:30
@mergify mergify bot removed the needs-rebase label Sep 25, 2025
@zejunchen-zejun zejunchen-zejun force-pushed the zejun/add_dispatch_for_allreduce branch 3 times, most recently from 71469a3 to 145aebe Compare September 25, 2025 03:28
performant allreduce implementations for AMD platforms

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>