Add Qwen2.5VL support #12402

HimariO · 2025-03-15T19:16:30Z

Original issue: #11483

Changes

- Added new gguf keys for the clip model to support:
  - GLU MLP
  - window attention
  - RMS norm
- Updated the clip.cpp vision model to incorporate these new components.
- Modified qwen2_vl_surgery.py and convert_hf_to_gguf.py to support the Qwen2.5VL model.
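
For readers unfamiliar with the three components listed above, here is a minimal pure-Python sketch of what each one computes. It is not the clip.cpp implementation. The function names are hypothetical, the SiLU gate activation is an assumption, bias terms are omitted, and the window mask only illustrates restricting attention to patches that share a window id.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then apply a per-channel weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation, v * sigmoid(v) (assumed gate activation)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def glu_mlp(x, w_gate, w_up, w_down):
    """Gated MLP: down(act(gate(x)) * up(x)), biases omitted for brevity."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def window_mask(window_ids):
    """True where two patches share a window and may attend to each other."""
    return [[a == b for b in window_ids] for a in window_ids]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(glu_mlp([1.0, 2.0], [[0.5, 0.0]], [[0.0, 1.0]], [[1.0], [2.0]]))
    print(window_mask([0, 0, 1]))
```

Unlike LayerNorm, RMS norm does not subtract the mean or add a bias, which is presumably why the converter needs a separate key to tell the vision graph which normalization to build.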

Model Conversion

The only change in the conversion process compared to Qwen2VL is the addition of the model_type parameter when creating the vision encoder GGUF file. (For the rest of the process and how to build llama-qwen2vl-cli, refer to #10361.)

PYTHONPATH=$PYTHONPATH:$(pwd)/gguf-py python3 examples/llava/qwen2_vl_surgery.py "/path/to/model" --data_type fp16 --model_type "qwen2.5vl"
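
If the conversion is driven from a script rather than typed by hand, the same invocation can be assembled in Python. This is only a sketch: `surgery_command` and `surgery_env` are hypothetical helper names, and the only flags used are the ones shown in the command above.

```python
import os
import shlex


def surgery_command(model_path, data_type="fp16", model_type="qwen2.5vl"):
    """Build the argv for qwen2_vl_surgery.py, mirroring the command above."""
    return [
        "python3",
        "examples/llava/qwen2_vl_surgery.py",
        model_path,
        "--data_type", data_type,
        "--model_type", model_type,
    ]


def surgery_env(repo_root):
    """Copy the current environment and append <repo_root>/gguf-py to PYTHONPATH."""
    env = dict(os.environ)
    extra = os.path.join(repo_root, "gguf-py")
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = f"{existing}{os.pathsep}{extra}" if existing else extra
    return env


if __name__ == "__main__":
    cmd = surgery_command("/path/to/model")
    # Print the command for inspection; pass cmd and surgery_env(os.getcwd())
    # to subprocess.run(..., check=True) from the repository root to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the model path contains spaces.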

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

HimariO added 6 commits March 16, 2025 02:29

implment vision model architecture, gguf convertor

6a8bae0

handle window attention inputs

cfc78c8

add support for Qwen2_5_VLForConditionalGeneration

8d69c2f

add debug utils

6047054

fix few incorrect tensor memory layout

50d0b69

move position id remap out of ggml to avoid int32 cuda operations

bc4163b

github-actions bot added the examples and python (python script changes) labels Mar 15, 2025

HimariO mentioned this pull request Mar 16, 2025

Feature Request: Qwen 2.5 VL #11483 (Open, 4 tasks)
