
Conversation

@pichangping
Contributor

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: pichangping <1337510399@qq.com>
@wangxiyuan
Collaborator

Thanks for your contribution. I have some questions before the deep review:

  1. How do I convert a w4a8 weight? Which version of modelslim should be used?
  2. Does DeepSeek V2 Lite work with w4a8 as well? If yes, an e2e test should be added too. @22dimensions please help upload a weight to ModelScope as well.
  3. Does w4a8 only work with DeepSeek, or do other models work too, like Qwen?

@pichangping
Contributor Author

  1. First quantize the BF16 weights to int8, then quantize the resulting int8 weights to int4. Since there is no native int4 data format, the tool packs the int4 values and outputs the quantized weights in an int8 container (see the sketch below).
  2. Whether DeepSeek V2 Lite is compatible is not yet clear.
  3. Other models are supported at the feature level, but accuracy is not guaranteed.
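
A minimal sketch, not the msmodelslim implementation, of how two signed int4 values can be packed into one int8 byte; it only illustrates why logically-int4 weights end up stored in an int8 tensor, and the NumPy layout (even index in the low nibble, odd index in the high nibble) is an assumption for illustration:

```python
# Illustrative sketch only (not msmodelslim code): two signed int4 values
# packed into one int8 byte, so "int4" weights travel in an int8 container.
import numpy as np

def pack_int4_pairs(w4: np.ndarray) -> np.ndarray:
    """Pack an even-length array of int4 values (range [-8, 7]) into int8."""
    u = w4.astype(np.uint8) & 0x0F        # keep only the low 4 bits of each value
    packed = u[0::2] | (u[1::2] << 4)     # even index -> low nibble, odd -> high nibble
    return packed.astype(np.int8)

def unpack_int4_pairs(packed: np.ndarray) -> np.ndarray:
    """Recover the original int4 values, sign-extending each nibble."""
    u = packed.astype(np.uint8)
    low = (u & 0x0F).astype(np.int16)
    high = (u >> 4).astype(np.int16)
    low = np.where(low >= 8, low - 16, low)      # sign-extend 4-bit values
    high = np.where(high >= 8, high - 16, high)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2] = low
    out[1::2] = high
    return out

vals = np.array([-8, 7, 3, -1], dtype=np.int8)
assert np.array_equal(unpack_int4_pairs(pack_int4_pairs(vals)), vals)
```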

@pichangping
Contributor Author

In the main branch of the repository https://gitee.com/ascend/msit you can find msmodelslim/example/DeepSeek/quant_deepseek_w4a8.py. Run it with: python quant_deepseek_w4a8.py --model_path DeepSeek-R1-BF16 --save_path DeepSeek-R1-W4A8 --mindie_format. If you are unfamiliar with the parameters, see msmodelslim/example/DeepSeek/README.md.

@pichangping
Contributor Author

modelslim
Use the master branch.
Command: refer to the "pre-run checks" (运行前必检) and "DeepSeek-R1 w4a8 mixed quantization" (DeepSeek-R1 w4a8 混合量化) sections of the README.md.
Reference command: python3 quant_deepseek_w4a8.py --model_path {float weight path} --save_path {W4A8 quantized weight path} --mindie_format
Since --mindie_format produces MindIE-format output, some adaptations are needed before the weights can be used with vLLM (a sketch of these steps follows the list):

  1. Rename quant_model_description_w8a8_dynamic.json to quant_model_description.json, and add the configuration "group_size": 256 to this file.
  2. Modify config.json: change model_type to deepseek_v3 and delete quantization_config.
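
A minimal sketch of the two adaptation steps above as a script; the weight directory path is a hypothetical placeholder for the --save_path output, so verify the file names against your actual msmodelslim output before running it:

```python
# Sketch of the post-processing described above; run against the directory
# produced by quant_deepseek_w4a8.py --mindie_format. Paths are assumptions.
import json
from pathlib import Path

weight_dir = Path("DeepSeek-R1-W4A8")  # hypothetical save_path

# 1. Rename quant_model_description_w8a8_dynamic.json and add "group_size": 256.
old_desc = weight_dir / "quant_model_description_w8a8_dynamic.json"
new_desc = weight_dir / "quant_model_description.json"
desc = json.loads(old_desc.read_text())
desc["group_size"] = 256
new_desc.write_text(json.dumps(desc, indent=2))
old_desc.unlink()

# 2. Modify config.json: set model_type to deepseek_v3, drop quantization_config.
config_path = weight_dir / "config.json"
config = json.loads(config_path.read_text())
config["model_type"] = "deepseek_v3"
config.pop("quantization_config", None)
config_path.write_text(json.dumps(config, indent=2))
```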

@@ -0,0 +1,60 @@
#
Collaborator

Please rebase to main, and move this file to tests/e2e/multicard.

Collaborator

The file name was also wrong....

@Yikun
Collaborator

  1. Please rebase.
  2. Please update the commit message to describe, step by step, the tests included.

After this PR is merged:

  1. Please refactor to inherit from w8a8.
  2. Please add documentation.

from vllm_ascend.utils import dispose_tensor
from vllm.config import get_current_vllm_config

VLLM_ENABLE_MC2: bool = envs_ascend.VLLM_ENABLE_MC2
Collaborator

Please remove this; it was already removed in #1229.

