[New Model]: When will the MiniCPM3ForCausalLM (MiniCPM3-4B) model be supported? #8232
Comments
The model will be added to vLLM very soon, thanks @SUDA-HLT-ywfang!
When deploying MiniCPM3-4B with vLLM, the inference speed is much slower than that of models of the same size, such as Qwen1.5-4B. Can you check this?
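For reference, a minimal timing sketch along these lines (the model name is from this issue; the prompt, sampling parameters, and the `trust_remote_code=True` flag are assumptions, not the reporter's exact setup):

```python
# Minimal sketch: time vLLM generation for MiniCPM3-4B.
# Prompt and sampling parameters are arbitrary placeholder choices.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="openbmb/MiniCPM3-4B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(["Write a short story about a robot."], params)
elapsed = time.perf_counter() - start

# Report decode throughput in tokens per second.
generated = outputs[0].outputs[0]
print(f"{len(generated.token_ids)} tokens in {elapsed:.2f}s "
      f"({len(generated.token_ids) / elapsed:.1f} tok/s)")
```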
Can you run the HuggingFace implementation of those models and see whether you observe the same difference? It's difficult to narrow down the issue without more information.
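If it helps, here is a minimal sketch of the kind of side-by-side HuggingFace timing run being asked for; the prompt, dtype, and generation length are arbitrary choices, and the baseline model ID is shown in a comment:

```python
# Minimal sketch: time plain HuggingFace generation for comparison.
# Assumes a single CUDA GPU (e.g. the A800 mentioned below).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM3-4B"  # swap in "Qwen/Qwen1.5-4B" for the baseline
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).cuda()

inputs = tokenizer("Write a short story about a robot.",
                   return_tensors="pt").to("cuda")

# Synchronize around generate() so the wall-clock time is accurate.
torch.cuda.synchronize()
start = time.perf_counter()
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = output_ids.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens} new tokens in {elapsed:.2f}s "
      f"({new_tokens / elapsed:.1f} tok/s)")
```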
Running the HuggingFace implementation on an A800 without flash attention, it seems that MiniCPM3 is indeed much slower.
The model to consider.
https://huggingface.co/openbmb/MiniCPM3-4B
The closest model vllm already supports.
No response
What's your difficulty of supporting the model you want?
No response