Support Deepseek w4a8 quantization #1182
Conversation
Signed-off-by: pichangping <1337510399@qq.com>
Thanks for your contribution. I have some questions before the deep review:
In the main branch of the repository https://gitee.com/ascend/msit, you can find the file msmodelslim/example/DeepSeek/quant_deepseek_w4a8.py. Run it with `python quant_deepseek_w4a8.py --model_path DeepSeek-R1-BF16 --save_path DeepSeek-R1-W4A8 --mindie_format`. If you are unfamiliar with the parameters, see msmodelslim/example/DeepSeek/README.md.
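For reference, the quantization step above can be sketched as a shell session. The commands and flags are the ones quoted from the msmodelslim example; the clone location and model paths are placeholders you would adjust for your environment:

```shell
# Fetch the msit repository, which ships the msmodelslim quantization examples
git clone https://gitee.com/ascend/msit.git
cd msit/msmodelslim/example/DeepSeek

# Quantize a BF16 DeepSeek-R1 checkpoint to W4A8.
#   --model_path    directory of the original BF16 weights
#   --save_path     output directory for the quantized weights
#   --mindie_format emit the weight layout expected downstream
python quant_deepseek_w4a8.py \
    --model_path DeepSeek-R1-BF16 \
    --save_path DeepSeek-R1-W4A8 \
    --mindie_format
```

See msmodelslim/example/DeepSeek/README.md for the full parameter reference.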
Please rebase to main, and move this file to tests/e2e/multicard.
The file name is also wrong.
Yikun left a comment:
- Please rebase
- Please update the commit message to describe the step-by-step testing that was done

After this PR is merged:
- Please refactor to inherit from the w8a8 implementation
- Please add docs
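To illustrate the "inherit from w8a8" refactor requested above, here is a minimal self-contained sketch. The class and method names are hypothetical placeholders, not the actual vllm-ascend API: the idea is that a w4a8 linear method subclasses the w8a8 one, reuses its dynamic int8 activation path unchanged, and overrides only the weight handling for the narrower 4-bit range.

```python
class W8A8DynamicLinearMethod:
    """Baseline sketch: int8 weights, dynamic per-tensor int8 activations."""

    weight_bits = 8

    def quantize_activations(self, x):
        # Shared activation path: symmetric dynamic int8 quantization.
        scale = (max(abs(v) for v in x) / 127.0) or 1.0
        return [round(v / scale) for v in x], scale

    def weight_scale(self, w):
        # int8 symmetric range uses 127 as the positive bound.
        return (max(abs(v) for v in w) / 127.0) or 1.0


class W4A8DynamicLinearMethod(W8A8DynamicLinearMethod):
    """W4A8 reuses the inherited w8a8 activation path and overrides
    only the weight handling (4-bit range instead of 8-bit)."""

    weight_bits = 4

    def weight_scale(self, w):
        # int4 symmetric range is [-8, 7]; use 7 as the positive bound.
        return (max(abs(v) for v in w) / 7.0) or 1.0
```

This keeps the activation-quantization logic in one place, so a fix to the w8a8 path automatically applies to w4a8 as well.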
    from vllm_ascend.utils import dispose_tensor
    from vllm.config import get_current_vllm_config

    VLLM_ENABLE_MC2: bool = envs_ascend.VLLM_ENABLE_MC2
Please remove this; it was already removed in #1229.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?
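Beyond the e2e multicard test requested above, a cheap numeric sanity check is to verify the w4a8 weight-quantization round trip: dequantizing the int4 weights should recover the originals to within half a quantization step. This is a generic sketch of that check, not the project's actual test code:

```python
def quant_int4(w):
    """Symmetric per-tensor int4 quantization to the range [-8, 7]."""
    scale = max(abs(v) for v in w) / 7.0
    q = [max(-8, min(7, round(v / scale))) for v in w]
    return q, scale

def dequant(q, scale):
    """Map int4 codes back to floats."""
    return [v * scale for v in q]

weights = [0.9, -0.31, 0.05, 0.7, -0.88]
q, scale = quant_int4(weights)
restored = dequant(q, scale)

# All codes fit the int4 range, and each element's round-trip error
# is bounded by half a quantization step.
assert all(-8 <= v <= 7 for v in q)
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

A real test would run this per quantized tensor of the converted checkpoint (and, for accuracy, compare model outputs end to end).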