
[New Model]: When will the MiniCPM3ForCausalLM (MiniCPM3-4B) model be supported? #8232

Closed · ML-GCN opened this issue Sep 6, 2024 · 5 comments · Fixed by #8297
Labels: new model

Comments

ML-GCN commented Sep 6, 2024

The model to consider.

https://huggingface.co/openbmb/MiniCPM3-4B

The closest model vllm already supports.

No response

What's your difficulty of supporting the model you want?

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
ML-GCN added the new model label Sep 6, 2024

DarkLight1337 (Member) commented Sep 14, 2024

The model will be added to vLLM very soon, thanks @SUDA-HLT-ywfang!

Mencuis commented Oct 10, 2024

When deploying MiniCPM3-4B with vLLM, inference is much slower than for models of the same size, such as Qwen1.5-4B. Can you look into this?
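
(A minimal sketch of how such a timing comparison might be reproduced with vLLM's offline LLM API; the prompt and sampling settings below are illustrative assumptions, not the reporter's actual setup:)

```python
import time

from vllm import LLM, SamplingParams

# Illustrative timing sketch; prompt and settings are assumptions.
# MiniCPM3-4B ships custom modeling code, so trust_remote_code is required.
llm = LLM(model="openbmb/MiniCPM3-4B", trust_remote_code=True)

# ignore_eos forces a full 128 new tokens for a fair timing comparison.
params = SamplingParams(temperature=0.0, max_tokens=128, ignore_eos=True)

start = time.perf_counter()
outputs = llm.generate(["Write a short story about a robot."], params)
print(f"128 new tokens in {time.perf_counter() - start:.1f} s")
```

Running the same script with the Qwen1.5-4B model ID would give the matching baseline number.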

DarkLight1337 (Member) commented

Can you run the HuggingFace implementation of those models and see if you observe the same differences?
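
(A rough way to run that check with the plain transformers generate API might look like the following sketch; the model IDs are the ones under discussion, while the prompt, dtype, and decoding settings are illustrative assumptions:)

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative comparison sketch; prompt, dtype, and settings are assumptions.
for model_id in ("openbmb/MiniCPM3-4B", "Qwen/Qwen1.5-4B"):
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).to("cuda")
    inputs = tokenizer(
        "Write a short story about a robot.", return_tensors="pt"
    ).to("cuda")

    torch.cuda.synchronize()
    start = time.perf_counter()
    # min_new_tokens pins the output length so both models decode 128 tokens.
    model.generate(**inputs, max_new_tokens=128, min_new_tokens=128, do_sample=False)
    torch.cuda.synchronize()
    print(f"{model_id}: {time.perf_counter() - start:.1f} s for 128 new tokens")
```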

DarkLight1337 (Member) commented

It's difficult to narrow down the issue without more information.

Mencuis commented Oct 14, 2024

> Can you run the HuggingFace implementation of those models and see if you observe the same differences?

HuggingFace implementation on an A800 without flash attention, generating 128 new tokens. Inference time:

  • MiniCPM3-4B: 11.6 s
  • Qwen1.5-4B: 6.3 s

It seems that MiniCPM3 is indeed much slower.
