[main][refactor] Refactoring forward_context and model_runner_v1 #1979
Conversation
PR too large, split into several small PRs?
A little difficult 😂
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: zzzzwwjj <1183291235@qq.com>
Codecov Report ❌

```
@@ Coverage Diff @@
##             main    #1979      +/-   ##
==========================================
- Coverage   71.73%   71.11%   -0.62%
==========================================
  Files          96       98       +2
  Lines       10719    10857     +138
==========================================
+ Hits         7689     7721      +32
- Misses       3030     3136     +106
==========================================
```
examples/data_parallel.py
We already have a DP example now: https://github.com/vllm-project/vllm-ascend/blob/main/examples/offline_data_parallel.py
The unit tests for `ascend_forward_context` and `parallel_state` are missing.
`num_input_tokens` is useless.
vllm_ascend/utils.py
What is `AscendSocVersion.MAX`? How will it be used?
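One common reason for a `MAX` member in a SoC-version enum is to serve as an upper-bound sentinel for range checks rather than a real chip. The sketch below is purely illustrative; the actual members and values of `AscendSocVersion` in this PR may differ:

```python
from enum import IntEnum

class AscendSocVersion(IntEnum):
    # Hypothetical values; the PR defines the real mapping.
    UNKNOWN = 0
    A2 = 1
    A3 = 2
    MAX = 3  # upper-bound sentinel, not an actual SoC

def is_supported(version: AscendSocVersion) -> bool:
    # Range check: any concrete version strictly between the
    # sentinels is considered valid.
    return AscendSocVersion.UNKNOWN < version < AscendSocVersion.MAX

print(is_supported(AscendSocVersion.A2))   # True
print(is_supported(AscendSocVersion.MAX))  # False
```

If `MAX` is only such a sentinel, a docstring or comment on the enum would answer this review question directly.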
```python
from vllm.config import VllmConfig
from vllm.distributed import get_dp_group, get_ep_group, get_tp_group
from vllm.forward_context import get_forward_context, set_forward_context
from vllm.platforms import current_platform
```
Avoid using `current_platform` in vllm-ascend; it can lead to circular imports in some cases. Use `NPUPlatform` directly.
```python
from vllm.logger import logger
```
```python
from .func_wrapper import (wrapper_load_model, wrapper_rmsnorm_forward_oot,
```
The `wrapper_load_model` function can be removed as well.
```python
self.model_runner._dummy_run(max_num_tokens,
                             is_compile=False,
                             with_prefill=with_prefill)
self.model_runner._dummy_run(1)
```
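The diff above shows `_dummy_run` being called once with explicit `is_compile`/`with_prefill` flags and once with only a token count, relying on defaults. A minimal, self-contained sketch of such a signature (the class, field names, and clamping behavior here are illustrative, not the real model runner):

```python
class ModelRunner:
    """Toy stand-in for a model runner's dummy-run warm-up path."""

    def __init__(self, max_num_tokens: int):
        self.max_num_tokens = max_num_tokens
        self.calls = []  # record (num_tokens, is_compile, with_prefill)

    def _dummy_run(self, num_tokens: int, is_compile: bool = False,
                   with_prefill: bool = False) -> int:
        # Clamp the request to the configured maximum, as a warm-up
        # run over the largest supported batch would.
        num_tokens = min(num_tokens, self.max_num_tokens)
        self.calls.append((num_tokens, is_compile, with_prefill))
        return num_tokens

runner = ModelRunner(max_num_tokens=8192)
runner._dummy_run(8192, is_compile=False, with_prefill=True)
runner._dummy_run(1)  # minimal warm-up; keyword defaults apply
```

Keeping the flags keyword-only-in-practice (always passed by name, with safe defaults) is what lets the second call stay as short as `_dummy_run(1)`.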
Please fill in the commit message and add how to test.
Add UT test issue: #2056
Done.
What this PR does / why we need it?
This PR refactors forward_context and model_runner_v1: it moves context that is necessary during model inference into forward_context, and refactors the dummy_run logic to make it more reasonable.
Some details for this PR:
- Add `ascend_forward_context`;
- Update the mc2_v2 op and support the `active_mask` param;
- Update scripts in the examples dir;
- Refactor the `dummy_run` logic;
- Add `soc_version` for A2 and A3.
Does this PR introduce any user-facing change?
No user-facing change.
How was this patch tested?
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@57c22e5