Skip to content

Conversation

@alcoftTAO
Copy link

New template for Qwen3-VL!
This new template allows the usage of tools/functions, as well as executing tools/functions after answering the user's prompt/question (an issue previous Qwen models had because of their chat template).

Also added some quality-of-life improvements, such as extra_template_arguments where derivatives of the Llava15ChatHandler class can add/overwrite arguments to the Jinja2 template.

Also added a thinking_budget parameter in the Qwen3VLChatHandler class for future updates, the model right now seems to ignore it.

Removed the use_thinking_prompt parameter because the new template doesn't need them; works with both Qwen3-VL-Instruct and Qwen3-VL-Thinking.

I only tested it with Qwen3-VL-2B (both Thinking and Instruct versions) and seems to work fine.


Previously, the Thinking version of the models didn't generate the <think> XML tag because it was written in the template, so I fixed that. Now the Thinking models generate the <think> tag.

@alcoftTAO
Copy link
Author

@JamePeng Please let me know if this project supports video inference for multimodal models, since I'd also like to implement it in the template if supported.

@JamePeng
Copy link
Owner

JamePeng commented Nov 6, 2025

@JamePeng Please let me know if this project supports video inference for multimodal models, since I'd also like to implement it in the template if supported.

You can follow the progress of this implementation; I will adapt it when merging it into the main project: ngxson/llama.cpp#32

Previously, the Thinking version of the models didn't generate the <think> XML tag because it was written in the template, so I fixed that. Now the Thinking models generate the <think> tag.

The chat_template in the Qwen3VL-thinking series contains the tag. It's best to keep it consistent with the official version. Disabling it won't affect usage, but without forced thinking, there's a possibility that some users won't think at all.

See: https://huggingface.co/Qwen/Qwen3-VL-8B-Thinking/blob/main/chat_template.json

@alcoftTAO
Copy link
Author

The chat_template in the Qwen3VL-thinking series contains the tag. It's best to keep it consistent with the official version. Disabling it won't affect usage, but without forced thinking, there's a possibility that some users won't think at all.

I tested it and had no issues, but I'll add the tags back if you'd like!

@alcoftTAO
Copy link
Author

Done, I added the <think> tag back.

@JamePeng
Copy link
Owner

JamePeng commented Nov 8, 2025

I see that thinking_budget hasn't been implemented yet. Should we not pass it as a parameter for now? That's about it.

"{%- for content in message.content -%}"
"{%- if 'image_url' in content -%}"
"{%- set image_count.value = image_count.value + 1 -%}"
"{%- if add_vision_id -%}"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there's no way to pass the add_vision_id tag? Without a counter, multi-image recognition can easily lead to misinterpretations.

@JamePeng
Copy link
Owner

JamePeng commented Nov 8, 2025

LGTM

@JamePeng JamePeng merged commit 17ba24f into JamePeng:main Nov 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants