[New Model]: When will the MiniCPM3ForCausalLM (MiniCPM3-4B) model be supported? #8232
Comments
The model will be added to vLLM very soon, thanks @SUDA-HLT-ywfang!
When deploying MiniCPM3-4B with vLLM, the inference speed is much slower than that of models of the same size, such as Qwen1.5-4B. Can you check this?
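For reference, a minimal timing sketch along these lines (the model name is from this issue; the prompt, sampling parameters, and the `trust_remote_code=True` flag are assumptions, not the reporter's exact setup):

```python
# Minimal sketch: time vLLM generation for MiniCPM3-4B.
# Prompt and sampling parameters are arbitrary placeholder choices.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="openbmb/MiniCPM3-4B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(["Write a short story about a robot."], params)
elapsed = time.perf_counter() - start

# Report decode throughput in tokens per second.
generated = outputs[0].outputs[0]
print(f"{len(generated.token_ids)} tokens in {elapsed:.2f}s "
      f"({len(generated.token_ids) / elapsed:.1f} tok/s)")
```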
Can you run the HuggingFace implementation of those models and see whether you observe the same difference? It's difficult to narrow down the issue without more information.
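If it helps, here is a minimal sketch of the kind of side-by-side HuggingFace timing run being asked for; the prompt, dtype, and generation length are arbitrary choices, and the baseline model ID is shown in a comment:

```python
# Minimal sketch: time plain HuggingFace generation for comparison.
# Assumes a single CUDA GPU (e.g. the A800 mentioned below).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM3-4B"  # swap in "Qwen/Qwen1.5-4B" for the baseline
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

inputs = tokenizer("Write a short story about a robot.",
                   return_tensors="pt").to("cuda")

# Synchronize around generate() so the wall-clock time is accurate.
torch.cuda.synchronize()
start = time.perf_counter()
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output_ids.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens} new tokens in {elapsed:.2f}s "
      f"({new_tokens / elapsed:.1f} tok/s)")
```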
Running the HuggingFace implementation on an A800 without flash attention, it seems that MiniCPM3 is indeed much slower.
The model to consider.
https://huggingface.co/openbmb/MiniCPM3-4B
The closest model vllm already supports.
No response
What's your difficulty of supporting the model you want?
No response