Feature Request: Qwen 2.5 VL #11483

Open · 4 tasks done
bold84 opened this issue Jan 29, 2025 · 43 comments

Labels: enhancement (New feature or request)

@bold84

bold84 commented Jan 29, 2025

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Is anybody implementing this?

If not, I may give it a go. But it will take some time as I am new to the source side of llama.cpp/ggml.

Motivation

Well, it's not currently working. :-)

Possible Implementation

Based on the existing Qwen 2 VL implementation.

bold84 added the enhancement (New feature or request) label on Jan 29, 2025
@HimariO
Contributor

HimariO commented Jan 29, 2025

I'm currently looking into Transformers' Qwen2.5VL implementation and waiting for the paper to drop so I can better assess the differences between Qwen2VL and Qwen2.5VL. 👀

@3unnycheung

cool

@samkoesnadi
Contributor

I support this!

@Shyryp

Shyryp commented Feb 2, 2025

Our world definitely needs this!

@peter-ch

Any progress on this? Who added support for Qwen 2 VL?

@pszemraj

pszemraj commented Feb 20, 2025

qwen2.5-vl report is up! https://huggingface.co/papers/2502.13923

edit: official codebase here: https://github.com/QwenLM/Qwen2.5-VL

@vladislavdonchev

I can start working on this if no one else is already.

@vladislavdonchev

vladislavdonchev commented Feb 22, 2025

OK then!

First order of business is to build the GGUF file(s). There seems to be an issue with that when using the latest official Transformers:

python convert_hf_to_gguf.py .\build\bin\Release\Qwen2.5-VL-7B-Instruct\
INFO:hf-to-gguf:Loading model: Qwen2.5-VL-7B-Instruct
ERROR:hf-to-gguf:Model Qwen2_5_VLForConditionalGeneration is not supported

This is being actively discussed upstream:
huggingface/transformers#36292
QwenLM/Qwen2.5-VL#723

It appears a temporary workaround is to use the old Qwen2 config templates. People are reporting that this works, so I'll post an update in a bit.

@vladislavdonchev

vladislavdonchev commented Feb 22, 2025

Right, so this one is a bit of a rabbit hole...

I. Reverting the Qwen2.5 config files to:

"processor_class": "Qwen2VLProcessor"

and

  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ]

Produces a (seemingly) working model! We've started testing and quantizing it here:
https://huggingface.co/IAILabs/Qwen2.5-VL-7b-Instruct-GGUF/tree/main

Image

II. In order to get a usable experience, you need to make sure CLIP is running with hardware acceleration. This currently requires you to revert this commit:
#10896

For more information refer to:
#11322

The following PR seems to correct (at least) some of the issues that led to disabling hardware acceleration in the first place:
#11902

So, it is now up to us to prove that everything is working properly.

I'll start a stress / perf eval test alongside the quantization process, so we have a better idea about what's going on.
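
For anyone who wants to script step I, here is a rough Python sketch that applies those two edits to a local HF snapshot (the directory path is a placeholder, and which JSON file actually carries processor_class varies between snapshots, so it simply patches whichever files contain the keys):

import json
import pathlib

model_dir = pathlib.Path("Qwen2.5-VL-7B-Instruct")  # placeholder: local HF snapshot directory

for cfg_path in model_dir.glob("*.json"):
    cfg = json.loads(cfg_path.read_text())
    changed = False
    if "architectures" in cfg:
        cfg["architectures"] = ["Qwen2VLForConditionalGeneration"]  # revert to the Qwen2-VL class name
        changed = True
    if "processor_class" in cfg:
        cfg["processor_class"] = "Qwen2VLProcessor"  # revert to the Qwen2-VL processor
        changed = True
    if changed:
        cfg_path.write_text(json.dumps(cfg, indent=2))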

@vladislavdonchev

vladislavdonchev commented Feb 23, 2025

UPDATE: A few 4-bit quants have been uploaded, including two that support online auto-repacking.

The latest main looks stable with Vulkan CLIP and any model thrown at it so far. Some preliminary insights:

  • 1200x1200 is the maximum you can encode with 16GB of VRAM. clip.cpp does not seem to support multi-GPU Vulkan yet.
  • A 4060Ti-class GPU delivers 20-30 t/s with the Q8_0 and double that on Q4 @ 16-32K context.
  • Batching (multiple images) in a single cli call seems to be working fine:
    llama-qwen2vl-cli --ctx-size 16000 -n 16000 -m ~/gguf/Qwen2.5-VL-7B-Instruct-Q4_0.gguf --mmproj ~/gguf/mmproj-Qwen2.5-VL-7B-Instruct-f32.gguf --n_gpu_layers 9999 -p "Describe the image in detail. Extract all textual information from it. Output as detailed JSON." -p "Analyze the image." --image ~/Pictures/test_small.png --image ~/Pictures/test_small.png

Output quality looks very promising! We'll release all of the benchmark code when ready, so the process can be streamlined for other models.

@hvico

hvico commented Feb 24, 2025

Hi! Excellent news, thank you very much for this!

I was able to run the model using code from git main on a 4x Radeon 7900 XTX 24 GB workstation, but with CLIP running on the CPU. I tried to enable Vulkan acceleration for CLIP by uncommenting the lines in clip.cpp under examples, but in that case I get an OOM. I tried this with the FP16, Q4_K_M and IQ4_XS models. Telling the CLI to use just one Vulkan device does not help with the OOM / CLIP GPU issue either.

@vladislavdonchev

vladislavdonchev commented Feb 24, 2025

I tried to enable Vulkan acceleration for CLIP by uncommenting the lines in clip.cpp under examples, but in that case I get an OOM. […]

Hi, could you please confirm what the resolution of your input images is?

EDIT: As per Qwen2.5 docs:
min_pixels = 256x28x28
max_pixels = 1280x28x28

A RTFM moment for me...
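
For anyone hitting OOM on large inputs, here is a rough pre-processing sketch that downscales an image to fit that budget before passing it to the CLI (uses Pillow; the 28-pixel factor and max_pixels come from the Qwen2.5-VL defaults quoted above, and the official processor's exact rounding may differ):

from PIL import Image

FACTOR = 28                    # Qwen2.5-VL works on 28-pixel-aligned patches
MAX_PIXELS = 1280 * 28 * 28    # default upper pixel budget from the Qwen2.5 docs

def shrink_to_budget(path_in: str, path_out: str, max_pixels: int = MAX_PIXELS) -> None:
    img = Image.open(path_in)
    w, h = img.size
    if w * h > max_pixels:
        scale = (max_pixels / (w * h)) ** 0.5
        # round down to multiples of 28 so the patch grid stays valid
        w = max(FACTOR, int(w * scale) // FACTOR * FACTOR)
        h = max(FACTOR, int(h * scale) // FACTOR * FACTOR)
        img = img.resize((w, h), Image.LANCZOS)
    img.save(path_out)

shrink_to_budget("test.png", "test_small.png")  # example usage; file names are placeholders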

@hvico

hvico commented Feb 24, 2025

Hi, could you please confirm what the resolution of your input images is? With 24G VRAM, you can expect an OOM with images >1400x1400 pixels, so you need to make sure the files are pre-processed correctly.

Thanks.

My image was 1475x1062. I was able to run inference successfully using a 1077x671 sample, without OOM. Would it be possible to run CLIP and the VL model on separate GPUs? Thanks again.

@zrrraa

zrrraa commented Feb 25, 2025

Right, so this one is a bit of a rabbit hole... […]

Thank you very much for your research and sharing! I would like to ask how to get the mmproj from the Qwen2.5-VL model? The original qwen2_vl_surgery.py used for Qwen2-VL doesn't seem to work; could you share your method? Thank you very much!

@vladislavdonchev

Thank you very much for your research and sharing! I would like to ask how to get the mmproj from the Qwen2.5-VL model? […]

Get it from our HF:
https://huggingface.co/IAILabs/Qwen2.5-VL-7b-Instruct-GGUF

@ChmHsm

ChmHsm commented Feb 27, 2025

Thank you for the effort, a lot of people really need this.

Any updates on the progress? Will this still take a few days, or is it more like a few weeks or months?

Thanks a lot again, we appreciate you guys a lot!

@samkoesnadi
Contributor

@vladislavdonchev Great work! Have you done the 3B version? I can also do it myself if you provide the conversion script :)

@vladislavdonchev

@vladislavdonchev Great work! Have you done the 3B version? I can also do it myself if you provide the conversion script :)

Working on it as we speak, along with a quantization tool:

Image

https://github.com/Independent-AI-Labs/local-super-agents/tree/feat/additional-output-formats/quantbench

@vladislavdonchev

UPDATE:

Opened a draft PR here: #12119

Long story short, I'll need some help debugging the vision models and llama-qwen2vl-cli, as we're unable to reliably produce a working vision model.

In addition, this still isn't resolved:
#11322

I've also asked the Qwen folks for help:
QwenLM/Qwen2.5-VL#869

@ChmHsm

ChmHsm commented Feb 28, 2025

Thanks @vladislavdonchev for the effort and the update.

I took a look at the issue you opened with the Qwen team. Does it only affect the 3B model? Can we at least expect progress to continue with the 7B?

Thank you!

@vladislavdonchev

vladislavdonchev commented Feb 28, 2025

Thanks @vladislavdonchev for the effort and the update.

I took a look at the issue you opened with the Qwen team. Does it only affect the 3B model? Can we at least expect progress to continue with the 7B?

Thank you!

Unfortunately, we're unable to reliably produce a working vision model from either the 7B or the 3B. I am not sure how the one in the repo was exported, but it seems to be working, so it's either some weird coincidence or a mistake. I've verified the LM part, including the quants, and it also appears to match what you'd expect from Qwen2.5 (the parameters in the .gguf seem correct, and the responses are OK).

@David33706

Right, so this one is a bit of a rabbit hole... […]

I am getting the following error while trying to use Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf on Apple Silicon:

./llama-qwen2vl-cli -m "Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf" --mmproj "Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf" --n_gpu_layers 0 --image "wilma-7_oval.jpg" --image "wilma-7_oval.jpg" -p "Describe the image."

key general.description not found in file
libc++abi: terminating due to uncaught exception of type std::runtime_error: Missing required key: general.description
zsh: abort      ./llama-qwen2vl-cli -m  --mmproj  --n_gpu_layers 0 --image  --image  -p

Could somebody please help out?

@tomjpalamattam

I am getting the following error while trying to use Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf on Apple Silicon: […] Missing required key: general.description

Did you figure this out?

@David33706

Did you figure this out?

Nope

@vladislavdonchev

vladislavdonchev commented Mar 3, 2025

Please stop spamming this thread. Qwen2.5 is still a WIP!

Regarding the issue above:
./llama-qwen2vl-cli -m "Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf" --mmproj "Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf" --n_gpu_layers 0 --image "wilma-7_oval.jpg" --image "wilma-7_oval.jpg" -p "Describe the image."
You cannot use the language model GGUF as the vision model (mmproj): in your command you are specifying the same file for both.

Please wait until the implementation has been finalized.

@sinkingsugar
Contributor

Please wait until the implementation has been finalized. Most up-to-date news here: https://huggingface.co/IAILabs/Qwen2.5-VL-7B-Instruct-GGUF-WIP

hmm fyi:
404
Sorry, we can't find the page you are looking for.

@vladislavdonchev

hmm fyi: 404 Sorry, we can't find the page you are looking for.

I've temporarily disabled the page, as too many people are trying to run the models with incorrect versions of llama.cpp.

There will be an update soon.

@ehartford

I am available to help with testing - anything you need.

@euberdeveloper

any news?

@HimariO mentioned this issue Mar 15, 2025
@HimariO
Contributor

HimariO commented Mar 16, 2025

I've just completed the first working implementation of Qwen2.5VL on top of my previous Qwen2VL work, incorporating new components such as window attention and GLU MLP in the vision encoder.

Before I refine the code and request a PR review, anyone interested in testing it and providing feedback can find the latest version of llama.cpp Qwen2.5VL here.

Instructions for building llama-qwen2vl-cli and model conversion are available in the draft PR. Alternatively, you can try the pre-converted 3B model available on Hugging Face.

Some Results

Image The image shows a bustling urban street scene. On the left side, there is a prominent statue of a man standing on a pedestal. The statue, dressed in a long coat and holding a document, is a focal point of the scene. The statue appears to be made of bronze and is situated in front of a building with classical architectural elements, such as columns and a cornice.
In the background, there are several tall buildings, including a prominent skyscraper. The buildings are adorned with American flags, indicating a sense of national pride and possibly a location in the United States, such as New York City. The flags are flying high from the rooftops and are also visible from various windows and balconies on the buildings.
The street is busy with pedestrians and vehicles. People are walking along the sidewalk, and there are streetlights and street signs, indicating a well-maintained urban area. There are also food trucks parked along the street, suggesting a lively and commercial area.
The overall atmosphere of the image is vibrant and dynamic, typical of a busy downtown area in a major city. The combination of historical and contemporary elements, such as the statue and modern skyscrapers, creates a rich tapestry of urban life.
Image The image depicts a serene beach scene featuring a person and a dog. The person, a woman, is sitting on the sandy beach facing the ocean. She is wearing a plaid shirt and appears to be smiling warmly. The dog, a light-colored Labrador Retriever, is sitting on the sand facing the woman. The dog's front paws are extended towards the woman, as if it is reaching out or offering something. The background shows a calm ocean with gentle waves crashing onto the shore, and the sky is clear with a soft light suggesting either early morning or late afternoon. The overall atmosphere of the image is peaceful and intimate, capturing a moment of connection between the person and the dog.

All captions were created with the following CLI command:

./llama-qwen2vl-cli -m qwen25vl-3b-instruct.gguf --mmproj qwen25vl-vision.gguf -p "Describe this image." --image demo.jpg
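
Side note on the GLU MLP mentioned at the top: it is a SwiGLU-style feed-forward in the vision blocks. A minimal PyTorch sketch of the idea (layer names, bias choice and dimensions here are illustrative, not the exact Transformers or clip.cpp code):

import torch
import torch.nn as nn

class VisionGLUMLP(nn.Module):
    """SwiGLU-style MLP: down_proj(silu(gate_proj(x)) * up_proj(x))."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim)
        self.up_proj = nn.Linear(dim, hidden_dim)
        self.down_proj = nn.Linear(hidden_dim, dim)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))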

@smellslikeml

I've just completed the first working implementation of Qwen2.5VL on top of my previous Qwen2VL work, incorporating new components such as window attention and GLU MLP in the vision encoder.

Before I refine the code and request a PR review, anyone interested in testing it and providing feedback can find the latest version of llama.cpp Qwen2.5VL here.

Hi @HimariO

Thanks for taking up the challenge!
I did try testing the conversion flow on a fine-tune of the 3B Qwen2.5-VL-Instruct here: https://huggingface.co/remyxai/SpaceQwen2.5-VL-3B-Instruct

Below is a screenshot of the results
Image

Then I tried using the [base model](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) w/o fine-tuning and found similar output.

Hope this helps, rooting for you!

@HimariO
Contributor

HimariO commented Mar 16, 2025

Hi @smellslikeml

The results from the fine-tuned model you provided seem reasonable. (I'm using the sample image from the VQASynth README page for testing.)

Image Output
Image Based on the image provided, the man in the blue shirt appears to be standing and working near the wooden pallet with boxes on the floor. Since he is in an upright position and the wooden pallet is on the floor, the man seems to be at a slightly higher elevation compared to the wooden pallet. However, the exact height difference may vary depending on the specific dimensions of the man and the wooden pallet, as well as the perspective of the image.
From a general visual interpretation, it seems the man is standing at a height greater than the wooden pallet with boxes on the floor.

Here is the convert/inference process I went through:

PYTHONPATH=$PYTHONPATH:$(pwd)/gguf-py python3 examples/llava/qwen2_vl_surgery.py "remyxai/SpaceQwen2.5-VL-3B-Instruct" --data_type fp32 --model_type "qwen2.5vl"

python3 convert_hf_to_gguf.py /home/username/.cache/huggingface/hub/models--remyxai--SpaceQwen2.5-VL-3B-Instruct/snapshots/0214012e4f48f6adc263d9372984e8aa164v6a21 --outtype f16

./llama-qwen2vl-cli  -m SpaceQwen25-VL-3B-Instruct-F16.gguf --mmproj remyxai-spaceqwen2.5-vl-3b-instruct-vision.gguf -p "Does the man in blue shirt working have a greater height compared to the wooden pallet with boxes on floor?" --image ~/warehouse_sample_3.jpeg --threads 24 -ngl 99

NOTE: Remember to do a clean rebuild of llama-qwen2vl-cli using the llama.cpp branch from here.

@smellslikeml

The results from the fine-tuned model you provided seem reasonable. […]

Confirming your conversion worked using the FP32 flags 🎊
Thanks again!

@jfernandrezj

What is the right way to get a full GGUF of a LoRA trained for Qwen2.5-VL?
Is there a gist / script example? Is it just a matter of merging the LoRA and running the surgery?

@HimariO
Contributor

HimariO commented Mar 20, 2025

@jfernandrezj Merging the LoRA parameters into the base model is the way to do it, since neither qwen2_vl_surgery.py nor clip.cpp recognizes LoRA parameter names.
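
A rough merge sketch along those lines, using PEFT's merge_and_unload (the model class assumes a recent Transformers release with Qwen2.5-VL support; paths and IDs are placeholders):

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

base_id = "Qwen/Qwen2.5-VL-3B-Instruct"   # base model the LoRA was trained on
adapter_dir = "path/to/lora-adapter"      # placeholder: LoRA checkpoint directory
out_dir = "qwen2.5-vl-merged"

base = Qwen2_5_VLForConditionalGeneration.from_pretrained(base_id, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained(out_dir)
AutoProcessor.from_pretrained(base_id).save_pretrained(out_dir)
# then point qwen2_vl_surgery.py and convert_hf_to_gguf.py at out_dir as usual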

@jfernandrezj

jfernandrezj commented Mar 20, 2025

Thank you very much @HimariO for your response. I merged the base and the LoRA, and ran the surgery to get a ~2.5 GB GGUF.

@Hrayo712

@HimariO Thanks for the amazing work! I will try this shortly.

Can you comment on the status of the 7B variant? Also, have you tested any quantization of the model?

@green-s

green-s commented Mar 20, 2025

@HimariO Thanks for the amazing work! I will try this shortly.

Can you comment on the status of the 7B variant? Also, have you tested any quantization of the model?

I haven't tried the 7B, but I uploaded a conversion/quant of the 72B to Hugging Face here. Only a couple of quant levels right now, since they take a while to upload, but I can make others if you want. It seems to work well.

@Hrayo712

Thanks for the prompt response @green-s !

Would you foresee any issues with applying the same procedure you followed for the 72B, but for the 7B (e.g., at 4 bits)?

It would be great if you could share pointers for running the conversion and quantization.

@green-s

green-s commented Mar 20, 2025

Would you foresee any issues with applying the same procedure you followed for the 72B, but for the 7B (e.g., at 4 bits)? […]

I used the exact same commands provided by @HimariO here. I'm not sure if I was supposed to do something differently, but it seemed to work fine, so I presume the 7B should too. I could convert 7B also if you like.

@Hrayo712

I could convert 7B also if you like.

That would be amazing! Thanks :)

@green-s

green-s commented Mar 21, 2025

@Hrayo712 Just uploaded Q4_K_M here. Uploading a few more variants now.

@RicksThread

RicksThread commented Mar 21, 2025

Is it possible to make sure the model gets loaded only once? Would an implementation similar to llama-server improve inference time?
