[Bugfix] fix accuracy problem for quantized deepseek models #768

linfeng-yuan · 2025-05-06T11:07:18Z

What this PR does / why we need it?

The root cause of the bug is that NaNs propagate through numerical computations: arithmetic on a NaN yields another NaN, so the NaNs cannot be eliminated that way. We addressed this by using `masked_fill_` to overwrite the NaNs in place, avoiding the memory-wasting `torch.where` approach.
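The root cause can be illustrated without any tensor library. The sketch below uses plain Python floats (an assumption made for portability; the actual fix operates on tensors in `w8a8_dynamic.py`, whose code is not shown here) to show that multiplying by a 0/1 mask leaves NaNs in place, whereas overwriting the masked positions in place, which is what `masked_fill_` does for tensors, removes them. The helper names `mask_by_multiplication` and `mask_by_fill_inplace` are hypothetical and exist only for this illustration.

```python
import math


def mask_by_multiplication(values, keep):
    """Try to zero out unwanted entries arithmetically.

    This does not remove NaNs, because NaN * 0.0 is still NaN.
    """
    return [v * (1.0 if k else 0.0) for v, k in zip(values, keep)]


def mask_by_fill_inplace(values, keep, fill=0.0):
    """Overwrite unwanted entries in place.

    Plain-Python analogue of tensor.masked_fill_(~keep, fill):
    no arithmetic touches the NaN, and no second buffer is allocated.
    """
    for i, k in enumerate(keep):
        if not k:
            values[i] = fill
    return values


values = [1.5, math.nan, -2.0, math.nan]
keep = [True, False, True, False]  # hypothetical validity mask

multiplied = mask_by_multiplication(values, keep)
filled = mask_by_fill_inplace(list(values), keep)

print(any(math.isnan(v) for v in multiplied))  # True: the NaNs survive
print(filled)  # [1.5, 0.0, -2.0, 0.0]
```

The in-place variant also mirrors the memory argument in the description: an out-of-place selection such as `torch.where` returns a new result, while an in-place fill reuses the existing buffer.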

Does this PR introduce any user-facing change?

Not involved.

How was this patch tested?

This patch was tested with vllm v0.8.5 and vllm-ascend master. I ran the deepseek_v3 model with the offline inference scripts (examples/dp_offline/run_dp.sh & data_parallel.py).

vllm_ascend/quantization/w8a8_dynamic.py

Signed-off-by: linfeng-yuan <1102311262@qq.com>

@ApsarasX

I would like to nominate Wengang Chen (@ApsarasX, https://github.com/ApsarasX) as a maintainer, starting with my +1.

## Reason

Review Quality: He focuses on reviewing the vLLM Ascend core module, with 100+ high-quality reviews, such as #2326 (comment), #768 (comment), #2312 (comment), #2268 (comment), #2192 (comment), and #2156 (comment). This helped vLLM Ascend v0.9.x and v0.10.x to be released with high quality.

Sustained and Quality Contributions: He has a very good habit of sharing his design ideas, development process, and performance test results, as in #966. He has contributed [many PRs](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AApsarasX+is%3Amerged+), including valuable bugfixes and performance improvements.

Community Involvement: He is actively involved in community discussion, is collaborative, helps users solve problems, and has been involved in [120+ PRs and issues](https://github.com/vllm-project/vllm-ascend/issues?q=commenter%3AApsarasX). He is also a speaker at the [vLLM Beijing Meetup](https://mp.weixin.qq.com/s/7n8OYNrCC_I9SJaybHA_-Q).

So I think he's a great addition to the vLLM Ascend maintainer team.

- ✅ Review Quality: 108+ PRs with valuable reviews (https://github.com/vllm-project/vllm-ascend/pulls?q=commenter%3AApsarasX), such as #2326 (comment), #768 (comment), #2312 (comment), #2268 (comment), #2192 (comment), and #2156 (comment)
- ✅ Sustained and Major Contributions: https://github.com/vllm-project/vllm-ascend/pulls/ApsarasX
- ✅ Quality Contribution: https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3AApsarasX+is%3Aclosed (good quality and well documented, e.g. [Perf] Refactor tensor disposal logic to reduce memory usage #966)
- ✅ Community Involvement: 7 issues (https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aclosed%20author%3AApsarasX) and 120+ PRs and issues (https://github.com/vllm-project/vllm-ascend/issues?q=commenter%3AApsarasX)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

github-actions bot added the module:quantization label May 6, 2025

linfeng-yuan force-pushed the fix_acc branch from 6f01ed7 to e417997 Compare May 6, 2025 11:09

ApsarasX reviewed May 6, 2025

vllm_ascend/quantization/w8a8_dynamic.py (outdated, resolved)

linfeng-yuan force-pushed the fix_acc branch 5 times, most recently from 0ea7d83 to 780f5d1 Compare May 6, 2025 12:46

fix: fix accuracy problem for quantized deepseek models

1a87406

Signed-off-by: linfeng-yuan <1102311262@qq.com>

linfeng-yuan force-pushed the fix_acc branch from 780f5d1 to 1a87406 Compare May 6, 2025 12:47

wangxiyuan approved these changes May 6, 2025

Yikun merged commit 2cd036e into vllm-project:main May 6, 2025
14 checks passed

wangxiyuan mentioned this pull request Aug 18, 2025

Nominate ApsarasX as vllm-ascend maintainer #2419

Merged
