[main][refactor] Refactoring forward_context and model_runner_v1 #1979
Conversation
PR too large, split into several small PRs?
A little difficult 😂
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Codecov Report ❌

```
@@ Coverage Diff @@
##             main    #1979      +/-   ##
==========================================
- Coverage   71.73%   71.11%   -0.62%
==========================================
  Files          96       98       +2
  Lines       10719    10857     +138
==========================================
+ Hits         7689     7721      +32
- Misses       3030     3136     +106
==========================================
```
examples/data_parallel.py
We already have a DP example now: https://github.com/vllm-project/vllm-ascend/blob/main/examples/offline_data_parallel.py
The unit tests for `ascend_forward_context` and `parallel_state` are missing.
`num_input_tokens` is useless.
vllm_ascend/utils.py
What is `AscendSocVersion.MAX`? How will it be used?
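One common reason for a `MAX` member in a SoC-version enum is to serve as an upper-bound sentinel for range checks rather than a real chip. The sketch below is purely illustrative; the actual members and values of `AscendSocVersion` in this PR may differ:

```python
from enum import IntEnum

class AscendSocVersion(IntEnum):
    # Hypothetical values; the PR defines the real mapping.
    UNKNOWN = 0
    A2 = 1
    A3 = 2
    MAX = 3  # upper-bound sentinel, not an actual SoC

def is_supported(version: AscendSocVersion) -> bool:
    # Range check: any concrete version strictly between the
    # sentinels is considered valid.
    return AscendSocVersion.UNKNOWN < version < AscendSocVersion.MAX

print(is_supported(AscendSocVersion.A2))   # True
print(is_supported(AscendSocVersion.MAX))  # False
```

If `MAX` is only such a sentinel, a docstring or comment on the enum would answer this review question directly.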
```python
from vllm.config import VllmConfig
from vllm.distributed import get_dp_group, get_ep_group, get_tp_group
from vllm.forward_context import get_forward_context, set_forward_context
from vllm.platforms import current_platform
```
Avoid using `current_platform` in vllm-ascend; it can lead to circular imports in some cases. Use `NPUPlatform` directly.
```python
from vllm.logger import logger
```
```python
from .func_wrapper import (wrapper_load_model, wrapper_rmsnorm_forward_oot,
```
The `wrapper_load_model` function can be removed as well.
```python
self.model_runner._dummy_run(max_num_tokens,
                             is_compile=False,
                             with_prefill=with_prefill)
self.model_runner._dummy_run(1)
```
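The diff above shows `_dummy_run` being called once with explicit `is_compile`/`with_prefill` flags and once with only a token count, relying on defaults. A minimal, self-contained sketch of such a signature (the class, field names, and clamping behavior here are illustrative, not the real model runner):

```python
class ModelRunner:
    """Toy stand-in for a model runner's dummy-run warm-up path."""

    def __init__(self, max_num_tokens: int):
        self.max_num_tokens = max_num_tokens
        self.calls = []  # record (num_tokens, is_compile, with_prefill)

    def _dummy_run(self, num_tokens: int, is_compile: bool = False,
                   with_prefill: bool = False) -> int:
        # Clamp the request to the configured maximum, as a warm-up
        # run over the largest supported batch would.
        num_tokens = min(num_tokens, self.max_num_tokens)
        self.calls.append((num_tokens, is_compile, with_prefill))
        return num_tokens

runner = ModelRunner(max_num_tokens=8192)
runner._dummy_run(8192, is_compile=False, with_prefill=True)
runner._dummy_run(1)  # minimal warm-up; keyword defaults apply
```

Keeping the flags keyword-only-in-practice (always passed by name, with safe defaults) is what lets the second call stay as short as `_dummy_run(1)`.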
Please fill in the commit message and add how to test.
Add UT test issue: #2056
Done.
What this PR does / why we need it?
This PR refactors forward_context and model_runner_v1: it moves context that is necessary during model inference into forward_context, and refactors the dummy_run logic to make it more reasonable.
Some details for this PR:
- Add `ascend_forward_context`;
- Update the mc2_v2 op and support the `active_mask` param;
- Update scripts in the examples dir;
- Refactor the `dummy_run` logic;
- Add `soc_version` for A2 and A3.
Does this PR introduce any user-facing change?
No user-facing change.
How was this patch tested?
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@57c22e5