Add minicpm-o and qwen2-vl to the list of supported multimodal models. #1904

Open · kseyhan opened this issue Jan 24, 2025 · 9 comments


kseyhan commented Jan 24, 2025

Support for the Qwen2-VL and MiniCPM-o models would be nice. They have already been merged into the llava subproject of llama.cpp.

@lelefontaa

+1


kseyhan commented Feb 5, 2025

Hmm, just tested again. Maybe it was me, or I pulled an outdated llama.cpp last time. MiniCPM-o seems to work with the "minicpm-v-2.6" chat handler.
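For what it's worth, this is roughly what worked for me; a minimal sketch, where the file names are placeholders for whichever language-model GGUF and mmproj (vision projector) files you downloaded:

# Minimal sketch: MiniCPM-o via the registered MiniCPM-V 2.6 chat handler.
# File names below are placeholders for the GGUF files on disk.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import MiniCPMv26ChatHandler

chat_handler = MiniCPMv26ChatHandler(clip_model_path="mmproj-model-f16.gguf")
llm = Llama(
    model_path="minicpm-o-2_6-q4_k_m.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # leave room for the image embeddings plus the reply
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])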


la1ty commented Feb 9, 2025

Yes, minicpm-o-2.6 works with the minicpm-v-2.6 chat handler. But Qwen2-VL does not seem to work with any existing chat handler.

I tried to use the example chat template from llama.cpp, but it still generates random characters...
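In case it helps to reproduce: this is roughly how I wired the raw template in (a sketch from memory, with the actual template string elided; constructor arguments may differ slightly between versions). Note that a plain Jinja2 formatter only shapes the prompt text and never runs the vision projector, so images are effectively ignored with this approach anyway:

# Rough sketch: feeding a raw Jinja chat template to llama-cpp-python.
# This only controls the text layout of the prompt; it does not produce
# image embeddings, so it cannot be the whole answer for a vision model.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Jinja2ChatFormatter

QWEN2_VL_TEMPLATE = "..."  # the chat_template string from the model's tokenizer config

formatter = Jinja2ChatFormatter(
    template=QWEN2_VL_TEMPLATE,
    eos_token="<|im_end|>",
    bos_token="",
)

llm = Llama(
    model_path="Qwen2-VL-7B-Instruct-Q4_K_M.gguf",  # placeholder path
    chat_handler=formatter.to_chat_handler(),
    n_ctx=4096,
)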

@samkoesnadi

> Yes, minicpm-o-2.6 works with the minicpm-v-2.6 chat handler. But Qwen2-VL does not seem to work with any existing chat handler.
>
> I tried to use the example chat template from llama.cpp, but it still generates random characters...

This is interesting. Could you give us the GGUF model URLs you are using?


la1ty commented Feb 9, 2025

@samkoesnadi I downloaded them from HuggingFace. Hope you have some good news.


kseyhan commented Feb 9, 2025

@samkoesnadi I tried my luck with Qwen2-VL-7B-Instruct-GGUF and tried almost every registered chat handler that includes the <|im_start|> and <|im_end|> tokens in its template, and got the same result as @la1ty: random words in random languages as the reply.

I also tried to implement the chat template myself but unfortunately failed, since I didn't really understand the Jinja template:

{
"chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
}

The template expects <|vision_start|><|image_pad|><|vision_end|> (which is unique to this model and not registered in any chat handler so far), and to be honest I didn't really see where the base64-encoded string / image_url should go in this template.
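My best guess so far is something like the sketch below: subclass the existing LLaVA handler and override its CHAT_FORMAT so the image_url text lands between the vision tokens, where the handler should then splice in the embeddings. Completely untested, the class itself is hypothetical, and I may well be wrong about where <|image_pad|> belongs:

# Untested sketch of a hypothetical Qwen2-VL chat handler for llama-cpp-python.
# Assumption: like the other LLaVA-style handlers, the image_url text rendered
# into the prompt is later replaced by image embeddings, so it has to appear
# between <|vision_start|> and <|vision_end|>.
from llama_cpp.llama_chat_format import Llava15ChatHandler


class Qwen2VLChatHandler(Llava15ChatHandler):
    DEFAULT_SYSTEM_MESSAGE = "You are a helpful assistant."

    CHAT_FORMAT = (
        "{% for message in messages %}"
        "<|im_start|>{{ message.role }}\n"
        "{% if message.content is string %}"
        "{{ message.content }}"
        "{% else %}"
        "{% for content in message.content %}"
        "{% if content.type == 'text' %}"
        "{{ content.text }}"
        "{% elif content.type == 'image_url' %}"
        "<|vision_start|>{{ content.image_url.url }}<|vision_end|>"
        "{% endif %}"
        "{% endfor %}"
        "{% endif %}"
        "<|im_end|>\n"
        "{% endfor %}"
        "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
    )

Using it would then just be a matter of passing an instance (with the Qwen2-VL mmproj file as clip_model_path) as chat_handler to Llama, like the other handlers. No idea yet whether the model also needs extra handling on the llama.cpp side, the way llama-qwen2vl-cli provides.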

@samkoesnadi

> @samkoesnadi I tried my luck with Qwen2-VL-7B-Instruct-GGUF and tried almost every registered chat handler that includes the <|im_start|> and <|im_end|> tokens in its template, and got the same result as @la1ty: random words in random languages as the reply.
> […]

@la1ty could you guys try the 2B and see if it works? That's the one I tested...


kseyhan commented Feb 9, 2025

@samkoesnadi Which chat handler did you use, if I may ask? The exact URL to the model you used would be useful as well.


la1ty commented Feb 10, 2025

@kseyhan Yes, that's exactly what I experienced.

And I don't know if I made errors while compiling, but I found that text responses generated by Qwen2-VL-7B with llama-cpp-python v0.3.7 are mostly nonsense, which does not match the behavior of llama-cli.exe. Maybe I need to recompile it against the latest version of llama.cpp.

@samkoesnadi Yes, it works with llama-cli.exe and llama-qwen2vl-cli.exe in llama.cpp, though llama-qwen2vl-cli.exe seems to have an encoding problem with non-ASCII characters on Windows.
