[refactor] Refactoring forward_context and model_runner_v1 #1422
Conversation
Force-pushed cfd63c5 to 48fd2a1: …raph_mode (Signed-off-by: zzzzwwjj <1183291235@qq.com>)
vllm_ascend/utils.py (Outdated)
Do we need to add another version here corresponding to 310P?
We can do it in the future.
Don't we maintain etp any more?
Given the absence of relevant scenarios, employing EP or full TP is sufficient for now. We may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where the number of nodes exceeds the number of experts.
However, we do have customer scenarios that require such configurations. While DeepSeek models might not need this, there are use cases involving large-scale MoE (Mixture of Experts) models that require splitting across both Tensor Parallelism (TP) and Expert Parallelism (EP), or sometimes just TP alone. This is exactly the case with the current Jieyue Xingchen models.
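For readers less familiar with the distinction being debated here, the following is a purely illustrative sketch of how a stacked MoE weight would be partitioned under expert parallelism versus expert tensor parallelism; the shapes and variable names are made up for illustration and are not taken from vllm-ascend.

```python
# Illustrative only: partitioning of a stacked MoE weight
# of shape [num_experts, hidden_size, intermediate_size].
num_experts, hidden_size, intermediate_size = 8, 4096, 14336
world_size = 4

# Expert parallelism (EP): each rank keeps num_experts // world_size whole
# experts; an individual expert's weights are never split.
ep_shard = (num_experts // world_size, hidden_size, intermediate_size)   # (2, 4096, 14336)

# Expert tensor parallelism (ETP, the scheme being dropped here): every rank
# keeps all experts, but each expert's intermediate dimension is sliced.
etp_shard = (num_experts, hidden_size, intermediate_size // world_size)  # (8, 4096, 3584)

print("EP shard:", ep_shard)
print("ETP shard:", etp_shard)
```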
LGTM

1 similar comment

LGTM
This solution is not fully aligned with the current ETP solution. For example, EP and ETP cannot be supported at the same time.
### What this PR does / why we need it?
Remove ETP/EP maintained in branch main. We drop this as there are no relevant scenarios for ETP now, and we may subsequently advocate implementing expert tensor parallelism in vLLM to support scenarios where experts need to be sliced. This is a part of the #1422 backport. Fixes #1396 #1154
### Does this PR introduce _any_ user-facing change?
We'll not maintain etp/ep in vllm-ascend anymore, and use the tp/ep in vllm instead.
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.9.2
- vLLM main: vllm-project/vllm@fe8a2c5
Signed-off-by: MengqingCao <cmq0113@163.com>
### What this PR does / why we need it?
A refactoring of `forward_context` and `model_runner_v1`: add some context that is necessary for model inference into `forward_context`, and refactor the `dummy_run` logic to make it more reasonable (a sketch of the pattern is given after the list below). Some details for this PR:
- `examples` dir;
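The core of the refactor described above is carrying per-step inference state through a shared forward context rather than threading it through call arguments. Below is a minimal, hypothetical sketch of that pattern; the `ForwardContext` fields and helper names are illustrative stand-ins, not the actual vllm-ascend definitions.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ForwardContext:
    """Per-step state a model layer may need during a forward pass."""
    attn_metadata: Any          # attention metadata for the current batch
    with_prefill: bool = False  # whether this step contains prefill requests
    num_tokens: int = 0         # (padded) token count, e.g. for graph mode


_forward_context: ContextVar[Optional[ForwardContext]] = ContextVar(
    "forward_context", default=None)


@contextmanager
def set_forward_context(ctx: ForwardContext):
    """Expose `ctx` to every layer executed inside this block."""
    token = _forward_context.set(ctx)
    try:
        yield
    finally:
        _forward_context.reset(token)


def get_forward_context() -> ForwardContext:
    ctx = _forward_context.get()
    assert ctx is not None, "forward context is only available during execution"
    return ctx


# A dummy_run can enter the same context with placeholder inputs, so layers
# behave identically whether the step is a real batch or a warm-up run.
with set_forward_context(ForwardContext(attn_metadata=None, num_tokens=128)):
    _ = get_forward_context().num_tokens  # a layer would read state like this
```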
### Does this PR introduce any user-facing change?
This PR removes `expert_tensor_parallel_size` from `additional_config`; we will use `enable_expert_parallel` to control whether expert parallelism is enabled, which is consistent with vLLM.
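As a rough illustration of this user-facing change, configuration would move from the vllm-ascend `additional_config` entry to vLLM's own switch, roughly as sketched below. The model name is only an example, and the exact keyword plumbing (`LLM(...)` forwarding `enable_expert_parallel` to the engine arguments) is an assumption based on current vLLM conventions rather than a documented guarantee.

```python
from vllm import LLM

# Before (removed by this PR): expert tensor parallelism was configured through
# vllm-ascend's additional_config.
# llm = LLM(model="deepseek-ai/DeepSeek-V2-Lite",       # illustrative model
#           tensor_parallel_size=8,
#           additional_config={"expert_tensor_parallel_size": 8})

# After: rely on vLLM's own flag; whole experts are sharded via expert
# parallelism while the rest of the model follows tensor_parallel_size.
llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # illustrative model, assumption
    tensor_parallel_size=8,
    enable_expert_parallel=True,
)
```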
### How was this patch tested?