[bugfixed] fix the bug when run the inference of quantized ds-w8a8-mtp #2134

Irving11-BKN · 2025-07-31T07:55:15Z

When running inference on ds-w8a8-mtp, it reported that 'ParallelLMHead' has no attribute 'params_dtype'.

This PR adds a wrapper for vocab_parallel_embedding, which fixes the bug when running deepseek-w8a8-mtp.

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

vLLM version: v0.10.0
vLLM main: vllm-project/vllm@ad57f23

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

github-actions · 2025-07-31T08:49:13Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

MengqingCao · 2025-07-31T09:11:56Z

vllm_ascend/quantization/func_wrapper.py

+        )
+        if params_dtype is None:
+            params_dtype = torch.get_default_dtype()
+        self.params_dtype = params_dtype


plz add a TODO here; we need to make a PR in vLLM to fix this properly.
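A minimal sketch of how a wrapper like the one in the reviewed snippet can close the gap, assuming it patches the __init__ of VocabParallelEmbedding (which ParallelLMHead subclasses) so that every instance ends up with a params_dtype attribute. The stub classes, default_dtype() (standing in for torch.get_default_dtype()), wrapper_vocab_parallel_embedding_init, and the TODO wording are all hypothetical; the real code in vllm_ascend/quantization/func_wrapper.py may be structured differently.

```python
import functools
from typing import Callable, Optional


def default_dtype() -> str:
    # Hypothetical stand-in for torch.get_default_dtype(), so the sketch runs without torch.
    return "float32"


class VocabParallelEmbeddingStub:
    """Hypothetical stand-in for vLLM's VocabParallelEmbedding.

    It accepts params_dtype but never stores it, reproducing the missing attribute.
    """

    def __init__(self, num_embeddings: int, embedding_dim: int,
                 params_dtype: Optional[str] = None) -> None:
        self.num_embeddings = num_embeddings
        self.embedding_dim = embedding_dim


class ParallelLMHeadStub(VocabParallelEmbeddingStub):
    """Hypothetical stand-in for ParallelLMHead, which subclasses the embedding."""

    def __init__(self, num_embeddings: int, embedding_dim: int,
                 params_dtype: Optional[str] = None, bias: bool = False) -> None:
        super().__init__(num_embeddings, embedding_dim, params_dtype=params_dtype)
        self.bias = bias


def wrapper_vocab_parallel_embedding_init(init: Callable[..., None]) -> Callable[..., None]:
    """Wrap an __init__ so the instance always ends up with a params_dtype attribute."""

    @functools.wraps(init)
    def init_with_dtype(self, num_embeddings: int, embedding_dim: int,
                        params_dtype: Optional[str] = None, **kwargs) -> None:
        init(self, num_embeddings, embedding_dim, params_dtype=params_dtype, **kwargs)
        # TODO (hypothetical wording, per the review): drop this patch once vLLM stores params_dtype.
        if params_dtype is None:
            params_dtype = default_dtype()
        self.params_dtype = params_dtype

    return init_with_dtype


# Before patching, the attribute is missing, as in the reported error.
try:
    ParallelLMHeadStub(32000, 4096).params_dtype
    bug_reproduced = False
except AttributeError:
    bug_reproduced = True

# Patch the base class; the subclass reaches the wrapper through super().__init__.
VocabParallelEmbeddingStub.__init__ = wrapper_vocab_parallel_embedding_init(
    VocabParallelEmbeddingStub.__init__)

head = ParallelLMHeadStub(32000, 4096)
print(head.params_dtype)  # float32
```

Patching the base class rather than the LM head alone means any subclass that calls super().__init__ picks up the attribute, and the fallback mirrors the torch.get_default_dtype() default used in the reviewed snippet.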

codecov · 2025-07-31T09:39:16Z

Codecov Report

❌ Patch coverage is 45.45455% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.63%. Comparing base (2008152) to head (9d725d5).
⚠️ Report is 640 commits behind head on main.

Files with missing lines	Patch %	Lines
vllm_ascend/quantization/func_wrapper.py	25.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2134      +/-   ##
==========================================
+ Coverage   75.05%   76.63%   +1.58%     
==========================================
  Files         103      107       +4     
  Lines       11355    11977     +622     
==========================================
+ Hits         8522     9179     +657     
+ Misses       2833     2798      -35

Flag	Coverage Δ
unittests	`76.63% <45.45%> (+1.58%)`	⬆️

Flags with carried forward coverage won't be shown.

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

MengqingCao · 2025-08-04T02:21:24Z

LGTM, plz update the PR message to clearly describe the bug it fixes.

vllm-project#2134)

When run the inference of ds-w8a8-mtp, it reported 'ParamllelLMhead has no attribute 'params_dtype''.

1. add wrapper of vocab_parallel_embedding, fixed the bugs when running deepseek-w8a8-mtp

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@ad57f23

---------

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

[bugfixed] fix the bug when run the inference of quantized ds-w8a8-mtp

9e05cf3

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

Irving11-BKN mentioned this pull request Jul 31, 2025

[0.9.1][bugfixed] fix the bug when run the inference of quantized ds-w8a8-mtp #1990

Merged

github-actions bot added the module:quantization label Jul 31, 2025

MengqingCao reviewed Jul 31, 2025

add todo in func_wrapper

9d725d5

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>

Irving11-BKN requested a review from MengqingCao August 4, 2025 01:51

zzzzwwjj approved these changes Aug 4, 2025

wangxiyuan approved these changes Aug 4, 2025

wangxiyuan merged commit 688350a into vllm-project:main Aug 4, 2025
25 checks passed
