
Conversation

@zzzzwwjj
Collaborator

@zzzzwwjj zzzzwwjj commented Jun 25, 2025

What this PR does / why we need it?

This PR refactors forward_context and model_runner_v1: it adds context needed during model inference into forward_context and reworks the dummy_run logic to make it more reasonable.
Some details for this PR:

  1. Fix an accuracy bug with online serving + multi-DP + eager mode + all_gather mode;
  2. Fix a bug with online serving + multi-DP + eager mode + mc2 mode;
  3. Fix a bug with A2 + eager mode + mc2 mode;
  4. Allow a different token_num on each chip in mc2 mode (see the sketch below);
  5. Update the scripts in the examples dir.
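
For item 4, the sketch below is an illustration only and not this PR's actual code: it shows one way a per-step forward context can record both the local token count and the maximum across data-parallel ranks, so that each chip can run a different token_num while mc2-style communication pads to a common size. All names here (`ForwardContext`, `set_forward_context`, `max_tokens_across_dp`) are assumptions for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional

import torch
import torch.distributed as dist


@dataclass
class ForwardContext:
    num_tokens: int                # tokens scheduled on this DP rank
    max_tokens_across_dp: int      # max over all DP ranks, used for padding
    with_prefill: bool = False


_current_ctx: Optional[ForwardContext] = None


@contextmanager
def set_forward_context(num_tokens: int, dp_group=None, with_prefill: bool = False):
    """Expose the per-rank and cross-DP token counts for the duration of one step."""
    global _current_ctx
    max_tokens = num_tokens
    if dp_group is not None and dist.is_initialized():
        # Agree on the maximum token count so every rank pads to the same size.
        t = torch.tensor([num_tokens], dtype=torch.int64)
        dist.all_reduce(t, op=dist.ReduceOp.MAX, group=dp_group)
        max_tokens = int(t.item())
    _current_ctx = ForwardContext(num_tokens, max_tokens, with_prefill)
    try:
        yield _current_ctx
    finally:
        _current_ctx = None


def get_forward_context() -> Optional[ForwardContext]:
    return _current_ctx
```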

Does this PR introduce any user-facing change?

This PR removes expert_tensor_parallel_size from additional_config; enable_expert_parallel now controls whether expert parallelism is enabled, which is consistent with vLLM.
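
As a minimal launch sketch of the new behavior (placeholder model name and parallel sizes, not taken from this PR), expert parallelism is now toggled through vLLM's standard option:

```python
from vllm import LLM

# Sketch only: enable_expert_parallel replaces the removed
# additional_config["expert_tensor_parallel_size"] option.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # placeholder model
    tensor_parallel_size=4,                # placeholder parallel size
    enable_expert_parallel=True,
)
```

For online serving, the equivalent CLI flag is `--enable-expert-parallel`.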

How was this patch tested?

Signed-off-by: zzzzwwjj <1183291235@qq.com>
@zzzzwwjj zzzzwwjj force-pushed the v0.9.1-dev branch 2 times, most recently from cfd63c5 to 48fd2a1 on June 25, 2025 at 10:54
…raph_mode

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Collaborator

Do we need to add another version here corresponding to 310P?

Collaborator Author

We can do it in the future.

Collaborator

Don't we maintain ETP anymore?

Collaborator

Given the absence of relevant scenarios, employing EP or full TP is sufficient for now. We may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the number of nodes exceeds the number of experts.


However, we do have customer scenarios that require such configurations. While DeepSeek models might not need this, there are use cases involving large-scale MoE (Mixture of Experts) models that require splitting across both Tensor Parallelism (TP) and Expert Parallelism (EP), or sometimes just TP alone. This is exactly the case with the current Jieyue Xingchen models.
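
As a rough back-of-the-envelope illustration of the trade-off discussed above (all numbers are hypothetical):

```python
# With pure expert parallelism, every rank must own at least one whole expert.
# Once the EP world size exceeds the expert count, the expert weights themselves
# would have to be sliced, i.e. expert tensor parallelism (or TP) is required.
num_experts = 64       # hypothetical MoE layer with 64 routed experts
ep_world_size = 128    # hypothetical number of ranks in the EP group

if ep_world_size > num_experts:
    print("pure EP cannot place a whole expert per rank; ETP or TP is needed")
else:
    print(f"each rank holds {num_experts // ep_world_size} expert(s)")
```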

@yiz-liu
Collaborator

yiz-liu commented Jun 25, 2025

LGTM

@whx-sjtu
Collaborator

LGTM

@ganyi1996ppo ganyi1996ppo merged commit cd65d15 into vllm-project:v0.9.1-dev Jun 25, 2025
17 checks passed
@weijinqian0
Collaborator

This solution is not fully aligned with the current ETP solution. For example, EP and ETP cannot be supported at the same time.

@Yikun Yikun added the no-main label Jul 7, 2025
wangxiyuan pushed a commit that referenced this pull request Jul 21, 2025
### What this PR does / why we need it?
Remove ETP/EP maintained in the main branch. We drop this as there are no
relevant scenarios that use ETP now; we may subsequently advocate
implementing expert tensor parallelism in vLLM to support scenarios
where experts need to be sliced.

This is a part of #1422 backport.

Fixes #1396
#1154

### Does this PR introduce _any_ user-facing change?
We will no longer maintain ETP/EP in vllm-ascend; the TP/EP from vLLM is
used instead.

### How was this patch tested?
CI passed with newly added and existing tests.


- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@fe8a2c5

Signed-off-by: MengqingCao <cmq0113@163.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025