Conversation

@underfituu (Contributor)

What this PR does / why we need it?

This PR introduces communication optimization for Ascend A2 clusters in DeepSeek training. The key improvements include:

  1. Communication Optimization:
  • Replaces all-reduce and broadcast operations with reduce-scatter and all-gather when dp_size > 1 (see the sketch after this list)

  • Reduces communication volume and improves performance

  2. Computation Optimization:
  • Performs the addition operations on the scattered (per-rank) data instead of the full gathered tensors:

    • Shared expert hidden_states + routed expert hidden_states

    • Residual + hidden_states

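To make the pattern concrete, here is a minimal, illustrative PyTorch sketch of the idea described above. The function name, tensor shapes, and the `dp_group` handle are assumptions for illustration only and do not reflect the actual patch:

```python
import torch
import torch.distributed as dist


def combine_expert_outputs(shared_out: torch.Tensor,
                           routed_out: torch.Tensor,
                           residual: torch.Tensor,
                           dp_group) -> torch.Tensor:
    """Illustrative sketch: replace an all-reduce with reduce-scatter +
    all-gather and perform the additions on the scattered shard."""
    dp_size = dist.get_world_size(group=dp_group)
    if dp_size == 1:
        # No data parallelism: just add everything locally.
        return shared_out + routed_out + residual

    # 1. Reduce-scatter: each rank receives only its 1/dp_size shard of the
    #    summed routed-expert output (less traffic than a full all-reduce).
    chunk = routed_out.shape[0] // dp_size
    shard = torch.empty(chunk, *routed_out.shape[1:],
                        dtype=routed_out.dtype, device=routed_out.device)
    dist.reduce_scatter_tensor(shard, routed_out, group=dp_group)

    # 2. Additions on the scattered data only:
    #    shared-expert hidden_states + routed-expert hidden_states + residual.
    rank = dist.get_rank(group=dp_group)
    shard = shard + shared_out[rank * chunk:(rank + 1) * chunk] \
                  + residual[rank * chunk:(rank + 1) * chunk]

    # 3. All-gather the shards back into the full hidden_states tensor.
    out = torch.empty_like(routed_out)
    dist.all_gather_into_tensor(out, shard, group=dp_group)
    return out
```

Compared with an all-reduce of the full routed-expert output, each rank exchanges only its 1/dp_size shard during the reduce-scatter and does the additions on that shard before the final all-gather.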
Does this PR introduce any user-facing change?

None.

How was this patch tested?

We tested it on our benchmark and the results meet our expectations.

"DeepseekV3ForCausalLM",
"vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM")

if soc_version == 223 or soc_version == 224:
Collaborator

What does soc v223 or soc v224 mean?

return _ETP

def get_wp_group() -> GroupCoordinator:
assert _WP is not None, (
Collaborator

Move the declaration of _WP to global scope?
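For reference, a minimal sketch of what the suggested change could look like, following the module-level pattern used for the other group handles; the import path and the assertion message below are assumptions, not the actual patch:

```python
from typing import Optional

from vllm.distributed.parallel_state import GroupCoordinator

# _WP declared once at module (global) scope, alongside the other group
# handles, and populated by the group-initialization code.
_WP: Optional[GroupCoordinator] = None


def get_wp_group() -> GroupCoordinator:
    # Placeholder assertion message; the real text lives in the patch.
    assert _WP is not None, "WP group is not initialized"
    return _WP
```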

github-actions bot commented Jun 7, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@underfituu closed this on Jul 7, 2025.