Optimize the performance of quick_allreduce by yanboshao · Pull Request #1816 · ROCm/aiter

yanboshao · 2026-01-12T07:50:00Z

Motivation

In the kernel, computing offsets required repeated, high-latency reads from local HBM.

Technical Details

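The address of the peer GPU's temporary communication buffer is stored in HBM. Because this address stays constant for the whole kernel and is read frequently by every thread, it is now loaded into VGPRs once at the start of the kernel instead of being fetched from HBM each time an offset is computed.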
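To make the change concrete, here is a host-side C++ sketch of the pattern. It assumes a hypothetical `PeerTable` holding one temporary-buffer base address per rank and a simple per-peer copy loop; these names and the loop structure are illustrative only and are not taken from the aiter source. `scatter_before` reads the base address out of the in-memory table on every store, while `scatter_after` copies the addresses into locals once and reuses them, which is the role the VGPRs play in the optimized kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical world size; the PR's test ran on 8 GPUs.
constexpr std::size_t kWorldSize = 8;

// Hypothetical stand-in for the table of peer temporary-buffer base
// addresses that the real kernel keeps in HBM.
struct PeerTable {
    std::array<float*, kWorldSize> buffers;
};

// Before: as written, every store reads the base address from the table in
// memory. Whether the compiler can keep it in a register on its own depends
// on proving that the stores never modify the table.
void scatter_before(const PeerTable* table, const std::vector<float>& src,
                    std::size_t chunk) {
    assert(src.size() == chunk * kWorldSize);
    for (std::size_t p = 0; p < kWorldSize; ++p) {
        for (std::size_t j = 0; j < chunk; ++j) {
            table->buffers[p][j] = src[p * chunk + j];
        }
    }
}

// After: load each constant base address once up front (on the GPU this
// value would stay in a register) and reuse it for every offset.
void scatter_after(const PeerTable* table, const std::vector<float>& src,
                   std::size_t chunk) {
    assert(src.size() == chunk * kWorldSize);
    std::array<float*, kWorldSize> peer_bufs = table->buffers;  // one read per peer
    for (std::size_t p = 0; p < kWorldSize; ++p) {
        float* base = peer_bufs[p];
        for (std::size_t j = 0; j < chunk; ++j) {
            base[j] = src[p * chunk + j];
        }
    }
}
```

Both functions write the same data; the only difference is how often the base address is fetched from memory. Loading it explicitly at kernel start removes the dependence on the compiler's alias analysis.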

Test Plan

Test Result

Dtype: bfloat16
CUDA graph: on
Device: MI325 × 8

shape	before optimization (µs)	after optimization (µs)	latency reduction
(632,5120)	39.15	32.82	16.16%
(680,5120)	40.55	34.32	15.36%
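The last column appears to be the latency saved relative to the baseline (this reading is inferred from the numbers, not stated in the PR). For the (680,5120) row:

$$
\text{reduction} = \frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}}
= \frac{40.55 - 34.32}{40.55} = \frac{6.23}{40.55} \approx 0.1536 = 15.36\%
$$

The same formula gives 6.33 / 39.15 ≈ 0.16169 for the (632,5120) row, which matches the reported 16.16% if the value is truncated rather than rounded to two decimals.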

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

TennyWang1223

LGTM

Optimize the performance of quick_allreduce

39520cc

yanboshao requested a review from a team January 12, 2026 07:50

valarLip requested a review from TennyWang1223 January 12, 2026 08:52

TennyWang1223 reviewed Jan 12, 2026


valarLip approved these changes Jan 12, 2026


valarLip merged commit ae774a3 into main Jan 12, 2026
17 checks passed

valarLip deleted the yanbo/quick_allreduce branch January 12, 2026 12:37

zhuyuhua-v pushed a commit that referenced this pull request Jan 14, 2026

Optimize the performance of quick_allreduce (#1816)

4698bbe

valarLip merged 1 commit into main from yanbo/quick_allreduce