Conversation

@zzzzwwjj zzzzwwjj commented Jul 24, 2025

What this PR does / why we need it?

This PR refactors forward_context and model_runner_v1: it adds context that is needed during model inference into forward_context, and reworks the dummy_run logic to make it more coherent.
Some details for this PR:

- Add `ascend_forward_context`;
- Update the mc2_v2 op and support the `active_mask` param;
- Update the scripts in the examples dir;
- Refactor the `dummy_run` logic;
- Add `soc_version` for A2 and A3.
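The `ascend_forward_context` idea above can be pictured as a small context manager that installs per-step state for the duration of one forward pass. The sketch below is illustrative only: the field names, function names, and storage mechanism are invented for this example and are not the actual vllm-ascend API.

```python
# Hypothetical sketch of an "ascend forward context": per-step state a
# model forward pass may need, installed for one pass and restored after.
# All names here are invented for illustration.
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional

_current_context: Optional["AscendForwardContext"] = None


@dataclass
class AscendForwardContext:
    # Example fields a forward pass might consult on NPU (invented).
    num_tokens: int
    with_prefill: bool
    in_profile_run: bool = False


@contextmanager
def set_ascend_forward_context(num_tokens, with_prefill, in_profile_run=False):
    # Install the context for the duration of one forward pass, then
    # restore whatever was active before (supports nesting).
    global _current_context
    prev = _current_context
    _current_context = AscendForwardContext(num_tokens, with_prefill,
                                            in_profile_run)
    try:
        yield _current_context
    finally:
        _current_context = prev


def get_ascend_forward_context():
    return _current_context


with set_ascend_forward_context(256, with_prefill=True) as ctx:
    print(get_ascend_forward_context().num_tokens)  # 256
print(get_ascend_forward_context())  # None
```

The real implementation wraps vLLM's `set_forward_context` rather than rolling its own storage; this sketch only shows the install/restore shape of the idea.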

Does this PR introduce any user-facing change?

No user-facing change.

How was this patch tested?

@ApsarasX
Collaborator

PR too large, split into several small PRs?

@zzzzwwjj zzzzwwjj force-pushed the main branch 4 times, most recently from fdc2db9 to 8b4c2a2 Compare July 24, 2025 09:04
@zzzzwwjj
Collaborator Author

PR too large, split into several small PRs?

A little difficult😂

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: zzzzwwjj <1183291235@qq.com>
@codecov

codecov bot commented Jul 25, 2025

Codecov Report

❌ Patch coverage is 55.34884% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.11%. Comparing base (df0ec55) to head (fb450e2).
⚠️ Report is 620 commits behind head on main.

Files with missing lines Patch % Lines
vllm_ascend/ascend_forward_context.py 45.61% 31 Missing ⚠️
vllm_ascend/quantization/w8a8_dynamic.py 14.70% 29 Missing ⚠️
vllm_ascend/distributed/parallel_state.py 45.83% 13 Missing ⚠️
vllm_ascend/ops/fused_moe.py 80.76% 10 Missing ⚠️
vllm_ascend/utils.py 46.66% 8 Missing ⚠️
vllm_ascend/models/deepseek_dbo.py 0.00% 3 Missing ⚠️
tests/ut/ops/test_fused_ops.py 87.50% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1979      +/-   ##
==========================================
- Coverage   71.73%   71.11%   -0.62%     
==========================================
  Files          96       98       +2     
  Lines       10719    10857     +138     
==========================================
+ Hits         7689     7721      +32     
- Misses       3030     3136     +106     
Flag Coverage Δ
unittests 71.11% <55.34%> (-0.62%) ⬇️




The UTs for ascend_forward_context and parallel_state are missing.


num_input_tokens is useless.


What is AscendSocVersion.MAX? How will it be used?
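A common answer to a question like this is that `MAX` is an upper-bound sentinel rather than a real SoC version. The sketch below shows that pattern; the member set beyond A2/A3 and the helper function are invented for illustration, not taken from vllm-ascend.

```python
# Hypothetical sketch: an enum of Ascend SoC generations where MAX is a
# sentinel marking "one past the last known version", usable for range
# checks. Members other than A2/A3 are invented for this example.
from enum import IntEnum, auto


class AscendSocVersion(IntEnum):
    UNSUPPORTED = 0
    A2 = auto()
    A3 = auto()
    MAX = auto()  # sentinel: not a real SoC, only an upper bound


def is_known_soc(version: AscendSocVersion) -> bool:
    # A real check might gate version-specific ops (e.g. MC2 variants)
    # on the detected SoC generation.
    return AscendSocVersion.UNSUPPORTED < version < AscendSocVersion.MAX


print(is_known_soc(AscendSocVersion.A2), is_known_soc(AscendSocVersion.MAX))
# True False
```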

@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

from vllm.config import VllmConfig
from vllm.distributed import get_dp_group, get_ep_group, get_tp_group
from vllm.forward_context import get_forward_context, set_forward_context
from vllm.platforms import current_platform

Avoid using current_platform in vllm-ascend; it can lead to circular imports in some cases. Use NPUPlatform directly.
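The failure mode behind this advice can be reproduced in miniature: if a backend module imports back through an aggregator module that is still initializing, the `from … import …` hits a partially initialized module, while importing the concrete class from its defining module breaks the cycle. The module names and sources below are invented purely for this demo (the real pair is `vllm.platforms`/`current_platform` versus vllm-ascend's `NPUPlatform`).

```python
# Demo of the circular-import hazard, using in-memory modules so the
# example is self-contained. All demo_* names are invented.
import importlib
import importlib.abc
import importlib.machinery
import sys

SOURCES = {
    # Aggregator imports the backend; the backend imports the aggregator
    # back at top level -> circular import, fails.
    "demo_platforms": "import demo_backend\ncurrent_platform = 'npu'\n",
    "demo_backend": "from demo_platforms import current_platform\n",
    # The fix: import the concrete class directly from the module that
    # defines it, so no cycle exists.
    "demo_npu_platform": "class NPUPlatform:\n    device_type = 'npu'\n",
    "demo_backend_fixed": ("from demo_npu_platform import NPUPlatform\n"
                           "PLATFORM = NPUPlatform\n"),
}


class _Loader(importlib.abc.Loader):
    def __init__(self, src):
        self.src = src

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        exec(self.src, module.__dict__)


class _Finder(importlib.abc.MetaPathFinder):
    def find_spec(self, name, path=None, target=None):
        if name in SOURCES:
            return importlib.machinery.ModuleSpec(name, _Loader(SOURCES[name]))
        return None


sys.meta_path.insert(0, _Finder())

try:
    importlib.import_module("demo_platforms")
    cycle_failed = False
except ImportError:
    # "cannot import name ... from partially initialized module"
    cycle_failed = True

fixed = importlib.import_module("demo_backend_fixed")
print(cycle_failed, fixed.PLATFORM.device_type)  # True npu
```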


from vllm.logger import logger

from .func_wrapper import (wrapper_load_model, wrapper_rmsnorm_forward_oot,

The wrapper_load_model function can be removed as well.

self.model_runner._dummy_run(max_num_tokens,
                             is_compile=False,
                             with_prefill=with_prefill)
self.model_runner._dummy_run(1)

@ApsarasX @jianzs please double check this change. Thanks
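The two-step warm-up in the snippet above can be sketched with a stub runner to make the call sequence explicit: one dummy run at the maximum batched token count, then a single-token dummy run. The `ModelRunnerStub` and `warm_up` names are invented for this sketch; the real `_dummy_run` executes actual NPU work.

```python
# Sketch of the warm-up call sequence from the diff, with a stub that
# only records calls. Names other than _dummy_run's arguments are invented.
class ModelRunnerStub:
    def __init__(self):
        self.calls = []

    def _dummy_run(self, num_tokens, is_compile=False, with_prefill=False):
        # A real runner would build dummy inputs and run the model here.
        self.calls.append((num_tokens, is_compile, with_prefill))


def warm_up(model_runner, max_num_tokens, with_prefill):
    # Mirrors the two-step warm-up shown in the review snippet.
    model_runner._dummy_run(max_num_tokens,
                            is_compile=False,
                            with_prefill=with_prefill)
    model_runner._dummy_run(1)


runner = ModelRunnerStub()
warm_up(runner, 1024, with_prefill=True)
print(runner.calls)  # [(1024, False, True), (1, False, False)]
```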

@zzzzwwjj zzzzwwjj force-pushed the ascend_forward_context_refactor branch from 18c31f8 to fb450e2 Compare July 27, 2025 14:18
@Yikun
Collaborator

Yikun commented Jul 28, 2025

Please fill in the commit message and add how this was tested.

@zzzzwwjj
Collaborator Author

zzzzwwjj commented Jul 28, 2025

Added a UT tracking issue: #2056

@zzzzwwjj
Collaborator Author

Please fill in the commit message and add how this was tested.

done.

@wangxiyuan wangxiyuan merged commit ba3dfbd into vllm-project:main Jul 28, 2025
25 checks passed
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
…m-project#1979)

A refactoring of forward_context and model_runner_v1, add some context
which is necessary in model inference into forward_context, and refactor
dummy_run logic, make it more reasonable.
Some details for this PR:

Add `ascend_forward_context`;
Update mc2_v2 op, and support `active_mask` param;
Update scripts in examples dir;
refactor `dummy_run` logic;
Add soc_version for A2 and A3;

No change at user-facing.

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@57c22e5

Signed-off-by: zzzzwwjj <1183291235@qq.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025