[perf] Improve Prefill Performance by Optimizing Alltoall Communication #978

SlightwindSec · 2025-05-27T15:31:09Z

What this PR does / why we need it?

This PR improves Prefill performance by making two key optimizations:

Optimizing alltoall communication: The previous implementation issued one all_to_all_single call followed by three all_to_all calls. It now issues three all_to_all_single calls and uses a fixed communication buffer, which eliminates an extra communication step. Besides simplifying the communication pattern, this change takes advantage of the better performance of all_to_all_single.

There may be minor precision trade-offs, but the coefficient of 2 was chosen empirically and maintains accuracy even when the expert ID distribution is imbalanced.
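
To make the fixed-buffer idea concrete, here is a minimal pure-Python sketch. It is not the code from this PR. It assumes one possible reading of the description: each rank reserves coefficient × (even-split share) slots per destination, so every chunk has the same size and no separate exchange of per-rank token counts is needed. The names fixed_capacity, pack_fixed, simulated_all_to_all and PAD are hypothetical, and dropping overflow tokens is an assumption made only for illustration.

```python
from typing import List

PAD = -1  # hypothetical padding marker for unused buffer slots


def fixed_capacity(tokens_per_rank: int, world_size: int, coefficient: int = 2) -> int:
    """Slots reserved per destination rank: coefficient x the even-split share."""
    even_share = -(-tokens_per_rank // world_size)  # ceiling division
    return coefficient * even_share


def pack_fixed(tokens_by_dest: List[List[int]], capacity: int) -> List[List[int]]:
    """Pad (or truncate) each destination's tokens to exactly `capacity` slots."""
    packed = []
    for tokens in tokens_by_dest:
        kept = tokens[:capacity]  # assumption: overflow is dropped in this sketch
        packed.append(kept + [PAD] * (capacity - len(kept)))
    return packed


def simulated_all_to_all(send: List[List[List[int]]]) -> List[List[List[int]]]:
    """Equal-split exchange: recv[dst][src] is the chunk send[src][dst]."""
    world_size = len(send)
    return [[send[src][dst] for src in range(world_size)] for dst in range(world_size)]


# Two ranks, four tokens each, routed unevenly between the ranks.
routing = [
    [[10, 11, 12], [13]],  # rank 0: three tokens for rank 0, one for rank 1
    [[20], [21, 22, 23]],  # rank 1: one token for rank 0, three for rank 1
]
capacity = fixed_capacity(tokens_per_rank=4, world_size=2)
send = [pack_fixed(by_dest, capacity) for by_dest in routing]
recv = simulated_all_to_all(send)

# Each receiver strips the padding to recover the real tokens.
received_tokens = [[t for chunk in chunks for t in chunk if t != PAD] for chunks in recv]
```

Under this reading, the cost of skipping the size exchange is extra padding on the wire, plus possible token loss if one destination receives more than the reserved capacity, which would be one plausible source of the precision trade-off mentioned above.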

In testing with DeepSeek-V3, the model was able to handle 3584-token inputs with significantly improved Prefill throughput and no regression in dialog quality.

Does this PR introduce any user-facing change?

No, this PR does not introduce any user-facing changes.

How was this patch tested?

Verified correct generation behavior with the DeepSeek-V3 model.
Prefill performance was benchmarked with 3584-token inputs, showing noticeable speed improvements.
Ensured that output quality remains consistent under typical workloads.

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>

github-actions · 2025-06-05T08:49:31Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

[perf] Improve Prefill Performance by Optimizing Alltoall Communication

ba2e08f

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>

github-actions bot added module:ops module:core module:quantization labels May 27, 2025

fix

7998d8f

Signed-off-by: SlightwindSec <slightwindsec@gmail.com>

wangxiyuan mentioned this pull request Jun 4, 2025

[release] 0.9.0rc1 release checklist #904

Closed

76 tasks

github-actions bot added the merge-conflicts label Jun 5, 2025

SlightwindSec closed this Jul 3, 2025

SlightwindSec deleted the all2all branch October 13, 2025 01:42
