
Conversation

@pichangping
Contributor

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: pichangping <1337510399@qq.com>
@wangxiyuan
Collaborator

Thanks for your contribution. I have some questions before the deep review:

  1. How do I convert a w4a8 weight? Which version of modelslim should be used?
  2. Does DeepSeek V2 Lite work with w4a8 as well? If yes, an e2e test should be added too. @22dimensions please help upload a weight to ModelScope as well.
  3. Does w4a8 only work with DeepSeek, or do other models work too, like Qwen?

@pichangping
Contributor Author

  1. First quantize the BF16 weights to int8, then quantize the resulting int8 weights to int4. Since there is no native int4 data format, the tool packs the int4 values and outputs the quantized weights in an int8 container (see the sketch below).
  2. Whether DeepSeek V2 Lite is compatible is not yet clear.
  3. Other models are supported at the feature level, but accuracy is not guaranteed.
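
A minimal sketch, not the msmodelslim implementation, of how two signed int4 values can be packed into one int8 byte; it only illustrates why logically-int4 weights end up stored in an int8 tensor, and the NumPy layout (even index in the low nibble, odd index in the high nibble) is an assumption for illustration:

```python
# Illustrative sketch only (not msmodelslim code): two signed int4 values
# packed into one int8 byte, so "int4" weights travel in an int8 container.
import numpy as np

def pack_int4_pairs(w4: np.ndarray) -> np.ndarray:
    """Pack an even-length array of int4 values (range [-8, 7]) into int8."""
    u = w4.astype(np.uint8) & 0x0F        # keep only the low 4 bits of each value
    packed = u[0::2] | (u[1::2] << 4)     # even index -> low nibble, odd -> high nibble
    return packed.astype(np.int8)

def unpack_int4_pairs(packed: np.ndarray) -> np.ndarray:
    """Recover the original int4 values, sign-extending each nibble."""
    u = packed.astype(np.uint8)
    low = (u & 0x0F).astype(np.int16)
    high = (u >> 4).astype(np.int16)
    low = np.where(low >= 8, low - 16, low)      # sign-extend 4-bit values
    high = np.where(high >= 8, high - 16, high)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2] = low
    out[1::2] = high
    return out

vals = np.array([-8, 7, 3, -1], dtype=np.int8)
assert np.array_equal(unpack_int4_pairs(pack_int4_pairs(vals)), vals)
```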

@pichangping
Contributor Author

In the main branch of the repository https://gitee.com/ascend/msit you can find msmodelslim/example/DeepSeek/quant_deepseek_w4a8.py. Run it with: python quant_deepseek_w4a8.py --model_path DeepSeek-R1-BF16 --save_path DeepSeek-R1-W4A8 --mindie_format. If you are unfamiliar with the parameters, see msmodelslim/example/DeepSeek/README.md.

@pichangping
Contributor Author

modelslim
Use the master branch.
Command: refer to the "pre-run checks" (运行前必检) and "DeepSeek-R1 w4a8 mixed quantization" (DeepSeek-R1 w4a8 混合量化) sections of the README.md.
Reference command: python3 quant_deepseek_w4a8.py --model_path {float weight path} --save_path {W4A8 quantized weight path} --mindie_format
Since --mindie_format produces MindIE-format output, some adaptations are needed before the weights can be used with vLLM (a sketch of these steps follows the list):

  1. Rename quant_model_description_w8a8_dynamic.json to quant_model_description.json, and add the configuration "group_size": 256 to this file.
  2. Modify config.json: change model_type to deepseek_v3 and delete quantization_config.
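
A minimal sketch of the two adaptation steps above as a script; the weight directory path is a hypothetical placeholder for the --save_path output, so verify the file names against your actual msmodelslim output before running it:

```python
# Sketch of the post-processing described above; run against the directory
# produced by quant_deepseek_w4a8.py --mindie_format. Paths are assumptions.
import json
from pathlib import Path

weight_dir = Path("DeepSeek-R1-W4A8")  # hypothetical save_path

# 1. Rename quant_model_description_w8a8_dynamic.json and add "group_size": 256.
old_desc = weight_dir / "quant_model_description_w8a8_dynamic.json"
new_desc = weight_dir / "quant_model_description.json"
desc = json.loads(old_desc.read_text())
desc["group_size"] = 256
new_desc.write_text(json.dumps(desc, indent=2))
old_desc.unlink()

# 2. Modify config.json: set model_type to deepseek_v3, drop quantization_config.
config_path = weight_dir / "config.json"
config = json.loads(config_path.read_text())
config["model_type"] = "deepseek_v3"
config.pop("quantization_config", None)
config_path.write_text(json.dumps(config, indent=2))
```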

@@ -0,0 +1,60 @@
#
Collaborator

Please rebase to main, and move this file to tests/e2e/multicard.

Collaborator

The file name was also wrong....

@Yikun
Collaborator

  1. Please rebase.
  2. Please update the commit message to describe, step by step, the tests included.

After this PR is merged:

  1. Please refactor to inherit from w8a8.
  2. Please add documentation.

from vllm_ascend.utils import dispose_tensor
from vllm.config import get_current_vllm_config

VLLM_ENABLE_MC2: bool = envs_ascend.VLLM_ENABLE_MC2
Collaborator

Please remove this; it was already removed in #1229.

