
Conversation

@Irving11-BKN (Contributor) commented Jul 24, 2025

Support inference of the DeepSeek-R1 W8A8 MTP model with a statically quantized shared_head in the MTP layers.

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>
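The diff itself is not shown in this thread, but the touched files suggest the quantization config needed to start matching the MTP block's `shared_head` prefix so its output head takes the quantized-linear path instead of falling back to float. The sketch below is a rough illustration of that idea only; the class name, key format, and method strings are assumptions, not the actual `vllm_ascend/quantization/quant_config.py` code.

```python
# Hypothetical sketch (not the actual vllm-ascend code): how a static W8A8
# quant config might decide that the MTP layer's shared_head is quantized.
from typing import Optional


class AscendW8A8Config:
    """Toy stand-in for a static W8A8 quantization config."""

    def __init__(self, quant_description: dict[str, str]):
        # e.g. {"model.layers.61.shared_head.head.weight": "W8A8", ...}
        self.quant_description = quant_description

    def is_layer_quantized(self, prefix: str) -> bool:
        # Before a change like this PR's, prefixes inside the MTP block
        # (e.g. "...shared_head.head") were not matched, so the shared head
        # fell back to the unquantized path and static W8A8 checkpoints
        # failed to load.
        key = f"{prefix}.weight"
        return self.quant_description.get(key, "FLOAT") != "FLOAT"

    def get_quant_method(self, prefix: str) -> Optional[str]:
        # Return a (toy) quantized linear method only for quantized layers.
        return "W8A8Linear" if self.is_layer_quantized(prefix) else None


desc = {"model.layers.61.shared_head.head.weight": "W8A8"}
cfg = AscendW8A8Config(desc)
print(cfg.get_quant_method("model.layers.61.shared_head.head"))  # W8A8Linear
```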
codecov bot commented Jul 24, 2025

Codecov Report

❌ Patch coverage is 37.50000% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.43%. Comparing base (ff97740) to head (ad4fd7c).
⚠️ Report is 638 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vllm_ascend/quantization/quant_config.py | 33.33% | 6 Missing ⚠️ |
| vllm_ascend/models/deepseek_mtp.py | 42.85% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1994      +/-   ##
==========================================
- Coverage   71.49%   71.43%   -0.06%     
==========================================
  Files          86       86              
  Lines        9131     9145      +14     
==========================================
+ Hits         6528     6533       +5     
- Misses       2603     2612       +9     
| Flag | Coverage Δ |
|---|---|
| unittests | 71.43% <37.50%> (-0.06%) ⬇️ |

Flags with carried forward coverage won't be shown.


jianzs merged commit ca8007f into vllm-project:main Jul 29, 2025
24 checks passed
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Jul 30, 2025
…ect#1994)

Support the inference of the Deepseekr1-w8a8-mtp model with
statically-quantized shared_head in MTP layers.

- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@6eca337

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Jul 30, 2025
…ect#1994)

Support the inference of the Deepseekr1-w8a8-mtp model with
statically-quantized shared_head in MTP layers.

- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@6eca337

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
wangxiyuan pushed a commit that referenced this pull request Aug 6, 2025
### What this PR does / why we need it?

Fixes unable to load `qwen3_moe` quantized weights issue due to #1994

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

Add a `qwen3_moe` W8A8 quantized model in
`tests/e2e/multicard/test_qwen3_moe.py`

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@c494f96

---------

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
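For context, a minimal multicard e2e check in the spirit of the test described above might look like the sketch below. The checkpoint path, tensor-parallel size, and quantization argument are assumptions for illustration, not the actual contents of `tests/e2e/multicard/test_qwen3_moe.py`.

```python
# Illustrative sketch only: an e2e-style regression check for loading a
# W8A8-quantized Qwen3-MoE model. Names marked hypothetical are not from
# the real test added by the follow-up PR.
from vllm import LLM, SamplingParams


def test_qwen3_moe_w8a8():
    llm = LLM(
        model="path/to/Qwen3-30B-A3B-W8A8",  # hypothetical checkpoint path
        tensor_parallel_size=4,              # "multicard" setting (assumed)
        quantization="ascend",               # assumed Ascend quant backend
    )
    outputs = llm.generate(["Hello, my name is"],
                           SamplingParams(max_tokens=16))
    # The original regression was a weight-loading failure, so simply
    # constructing the LLM and generating output exercises the fix.
    assert outputs and outputs[0].outputs[0].text
```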
zzhx1 pushed a commit to lidenghui1110/vllm-ascend that referenced this pull request Aug 11, 2025
…roject#2219)

### What this PR does / why we need it?

Fixes unable to load `qwen3_moe` quantized weights issue due to vllm-project#1994

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

Add a `qwen3_moe` W8A8 quantized model in
`tests/e2e/multicard/test_qwen3_moe.py`

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@c494f96

---------

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
…ect#1994)

Support the inference of the Deepseekr1-w8a8-mtp model with
statically-quantized shared_head in MTP layers.

- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@6eca337

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
…roject#2219)

### What this PR does / why we need it?

Fixes unable to load `qwen3_moe` quantized weights issue due to vllm-project#1994

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

Add a `qwen3_moe` W8A8 quantized model in
`tests/e2e/multicard/test_qwen3_moe.py`

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@c494f96

---------

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…ect#1994)

Support the inference of the Deepseekr1-w8a8-mtp model with
statically-quantized shared_head in MTP layers.

- vLLM version: v0.9.2
- vLLM main:
vllm-project/vllm@6eca337

Signed-off-by: curryliu <120010041@link.cuhk.edu.cn>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…roject#2219)

### What this PR does / why we need it?

Fixes unable to load `qwen3_moe` quantized weights issue due to vllm-project#1994

### Does this PR introduce _any_ user-facing change?

None

### How was this patch tested?

Add a `qwen3_moe` W8A8 quantized model in
`tests/e2e/multicard/test_qwen3_moe.py`

- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@c494f96

---------

Signed-off-by: zhoux77899 <zhouxiang100@huawei.com>