[Bug]: DeepSeek-Coder-V2-Instruct-AWQ assert self.quant_method is not None #7494
AWQ is not yet supported for this MoE model. |
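For context, the assertion in the issue title fires at a check like the one below. This is a simplified, paraphrased sketch of vLLM's fused-MoE layer, not its exact code: when the active quantization config has no fused-MoE implementation, it returns `None` for this layer type and loading aborts.

```python
# Simplified sketch of the failing check; names paraphrased from vLLM,
# not copied verbatim.
class FusedMoELayer:
    def __init__(self, quant_config):
        # A quant config with no fused-MoE implementation (AWQ at the time
        # of this thread) returns None for this layer type, leaving the
        # layer without a quant method.
        self.quant_method = quant_config.get_quant_method(self)

    def forward(self, hidden_states):
        assert self.quant_method is not None  # the error in the issue title
        return self.quant_method.apply(self, hidden_states)
```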
Are there any plans to support it in the near future? |
cc @mgoin @robertgshaw2-neuralmagic |
Yes, we are almost done with it. I hope we will have it in v0.5.6. |
Once it is supported, can we deploy it on the L40? Currently, we only have L40 available. |
Any chance it'd be available sooner than that on a dev branch? |
Hi @robertgshaw2-neuralmagic, sorry to bother you, but I just wanted to check whether you can point us toward the branch, or whether there are any updates. |
Yup, support for https://huggingface.co/casperhansen/deepseek-coder-v2-instruct-awq/tree/main |
@robertgshaw2-neuralmagic Hi, now that v0.6.0 has been released, I haven't seen any PR related to AWQ. Could you let me know approximately how long we would have to wait for it? |
For DeepSeek-V2 modeling, there are two points to do:
|
I'm facing the same problem. Are there any plans to support such quantized MoE models? |
I'm currently running the AWQ-quantized DeepSeek-V2 model and encountered the same issue. I found that vLLM 0.6.3 now supports the AWQ fused MoE operator, so your first point is resolved. However, what do you mean by the second point? I'm encountering the following issue:
|
I used vLLM v0.6.3.post1; this problem still exists. |
Could you please provide your running script? |
I used vLLM 0.6.6.post2.dev58+g07064cb1.cu124; this problem still exists.
|
If I remember correctly, try: `python api_server.py --model /llm-model/deepseek-ai/deepseek-coder-v2-instruct-awq --gpu-memory-utilization 0.85 --max-model-len 512 --tensor-parallel-size 4` |
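The same flags also map onto vLLM's offline Python API. A minimal sketch follows; the model path and `quantization="awq"` are assumptions carried over from the command above, not confirmed settings.

```python
# Minimal offline-inference sketch; the path and quantization value are
# assumptions carried over from the serving command above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/llm-model/deepseek-ai/deepseek-coder-v2-instruct-awq",
    quantization="awq",
    tensor_parallel_size=4,
    gpu_memory_utilization=0.85,
    max_model_len=512,
    trust_remote_code=True,  # DeepSeek-V2 checkpoints ship custom model code
)
outputs = llm.generate(["def quicksort(arr):"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```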
Thank you. I followed your instructions and was able to load the model correctly, but after loading I encountered another error. My machine has 4 x A40 GPUs, with the following software versions:
|
Where is this model from? It seems DeepSeek doesn't provide an AWQ version. |
I downloaded the model from https://modelscope.cn/models/cycloneboy/deepseek-coder-v2-instruct-awq. If AWQ is not supported, how should I quantize and run DeepSeek-V2 now? Could you give me some advice? Thank you very much. |
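For anyone in the same spot: the checkpoints linked above appear to have been produced with AutoAWQ, so a minimal quantization sketch along those lines follows. It is untested for DeepSeek-V2 specifically, and the source model id below is an assumption, not something taken from this thread.

```python
# Minimal AutoAWQ quantization sketch; untested for DeepSeek-V2, and the
# source model id below is an assumption, not taken from this thread.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "deepseek-ai/DeepSeek-Coder-V2-Instruct"  # assumed source model
quant_path = "deepseek-coder-v2-instruct-awq"

quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)  # runs calibration
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```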