[V1][BUGFIX][0.10.1] FIX mtp on main branch #2632
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request appears to be a bugfix for Multi-Token Prediction (MTP) on the main branch. The changes involve selecting the correct MTP model implementation (TorchairDeepSeekMTP) when torchair graph is enabled, and updating the quantization logic for FusedMoE. I've found a critical issue in the FusedMoE quantization logic where a layer that should be skipped would be incorrectly quantized due to an unconditional assignment. I've provided a suggestion to fix this logical error.
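The kind of logical error described in this review can be illustrated with a minimal sketch. It is not the code from this PR: the function names, the string return values, and the prefix-based skip check are all hypothetical, chosen only to show how an unconditional assignment can override a skip decision.

```python
from typing import List


def is_skipped(prefix: str, skipped_prefixes: List[str]) -> bool:
    """Hypothetical check: a layer is skipped if its name starts with a listed prefix."""
    return any(prefix.startswith(p) for p in skipped_prefixes)


def pick_quant_method_buggy(prefix: str, skipped_prefixes: List[str]) -> str:
    """Illustrates the bug pattern: the skip decision is made, then overwritten."""
    if is_skipped(prefix, skipped_prefixes):
        method = "unquantized"
    # Unconditional assignment: runs for every layer, including skipped ones.
    method = "quantized"
    return method


def pick_quant_method_fixed(prefix: str, skipped_prefixes: List[str]) -> str:
    """Illustrates one possible fix: return early so the skip decision is kept."""
    if is_skipped(prefix, skipped_prefixes):
        return "unquantized"
    return "quantized"


if __name__ == "__main__":
    skipped = ["model.layers.61."]  # hypothetical skip list
    layer = "model.layers.61.mlp.experts"
    print("buggy:", pick_quant_method_buggy(layer, skipped))
    print("fixed:", pick_quant_method_fixed(layer, skipped))
```

In the buggy variant the skipped layer still ends up with the quantized method, which matches the symptom the review describes; the actual fix in the PR may be structured differently.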
Please add more details to the PR message describing the specific issue this PR fixes.
This pull request has conflicts. Please resolve them before we can evaluate the pull request.
MengqingCao left a comment:
lgtm
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2632      +/-   ##
==========================================
+ Coverage   72.61%   73.57%   +0.96%
==========================================
  Files         147      151       +4
  Lines       21805    21945     +140
==========================================
+ Hits        15833    16147     +314
+ Misses       5972     5798     -174
Signed-off-by: xuyexiong <xuyexiong@huawei.com>
MengqingCao left a comment:
Let's merge this first, as the failing CI cases were not introduced by this PR; they will be fixed in #2687.
### What this PR does / why we need it? Fix MTP torchair bug caused by torchair refactor and moe refactor Depends on PRs: fused moe fix: vllm-project#2627 torchair multi DP fix: vllm-project#2626 ### Does this PR introduce _any_ user-facing change? when dp is enabled, to run mtp online server, need to disable server log due to the current metrics does not support multi dp `--disable-log-stats` ### How was this patch tested? - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@7c8271c Signed-off-by: xuyexiong <xuyexiong@huawei.com> Signed-off-by: offline0806 <z00858301@china.huawei.com>
What this PR does / why we need it?
Fix the MTP torchair bug caused by the torchair refactor and the MoE refactor.
Depends on PRs:
fused moe fix: #2627
torchair multi DP fix: #2626
Does this PR introduce any user-facing change?
When DP is enabled, the server log stats must be disabled to run the MTP online server, because the current metrics do not support multi-DP:
--disable-log-stats
How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@7c8271c
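As a rough illustration of the user-facing note about `--disable-log-stats`, the sketch below assembles a server command line that adds the flag only when data parallelism is in use. The helper `build_serve_command` is hypothetical, and the use of `vllm serve` with `--data-parallel-size` is an assumption about how the server is launched; the PR itself only names `--disable-log-stats`. Any MTP-specific options are deliberately left out.

```python
from typing import List


def build_serve_command(model: str, data_parallel_size: int = 1) -> List[str]:
    """Hypothetical helper: build an argv list for launching the online server."""
    cmd = ["vllm", "serve", model]
    if data_parallel_size > 1:
        # Assumed flag for enabling data parallelism.
        cmd += ["--data-parallel-size", str(data_parallel_size)]
        # Per the PR note: metrics do not support multi-DP, so disable log stats.
        cmd.append("--disable-log-stats")
    return cmd


if __name__ == "__main__":
    # "my-model" is a placeholder, not a model named in the PR.
    print(" ".join(build_serve_command("my-model", data_parallel_size=2)))
```

Returning an argv list rather than a single string keeps the result directly usable with `subprocess.run` without shell quoting concerns.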