[Performance] [flash_communication_v1] DeepSeek communication optimization on A2 (reduce_scatter + all_gather) #1034
What this PR does / why we need it?
This PR introduces a communication optimization for Ascend A2 clusters in DeepSeek training. The key improvements are:
- Replaces all-reduce and broadcast operations with reduce-scatter and all-gather when dp_size > 1, which reduces communication volume and improves performance.
- Performs the subsequent additions on the scattered (per-rank) shards instead of the full tensors (a sketch follows this list):
  - shared expert hidden_states + routed expert hidden_states
  - residual + hidden_states
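To illustrate the pattern, here is a minimal sketch (not the actual PR code) of how a reduce-scatter/all-gather pair can replace an all-reduce when element-wise additions follow, using standard torch.distributed collectives. Names such as `dp_group`, `routed_out`, and `shared_out` are illustrative assumptions, and the sketch assumes the first tensor dimension is divisible by the group size.

```python
import torch
import torch.distributed as dist


def add_after_all_reduce(routed_out, shared_out, residual, dp_group):
    # Baseline: all-reduce sums routed-expert outputs across DP ranks,
    # then every rank performs the additions on the full tensor.
    dist.all_reduce(routed_out, group=dp_group)
    return routed_out + shared_out + residual


def add_after_reduce_scatter(routed_out, shared_out, residual, dp_group):
    # Optimized: reduce-scatter leaves each rank with a 1/world_size shard,
    # the additions run on that shard only, and all-gather reassembles the
    # full tensor afterwards. The result matches the baseline because adding
    # the full tensors is equivalent to adding their per-rank shards.
    world_size = dist.get_world_size(dp_group)
    rank = dist.get_rank(dp_group)
    shard_len = routed_out.shape[0] // world_size

    shard = torch.empty(
        shard_len, *routed_out.shape[1:],
        dtype=routed_out.dtype, device=routed_out.device,
    )
    dist.reduce_scatter_tensor(shard, routed_out, group=dp_group)

    # Add only the local shards of the shared-expert output and the residual.
    lo, hi = rank * shard_len, (rank + 1) * shard_len
    shard = shard + shared_out[lo:hi] + residual[lo:hi]

    out = torch.empty_like(routed_out)
    dist.all_gather_into_tensor(out, shard, group=dp_group)
    return out
```

The communication volume of the reduce-scatter/all-gather pair matches a single all-reduce, but the intermediate additions touch only 1/world_size of the data per rank, which is where the savings come from in this pattern.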
Does this PR introduce any user-facing change?
None.
How was this patch tested?
We have tested it on our benchmarks, and the results meet our expectations.