Eval bug: Qwen2-VL Hallucinates image content on Vulkan backend #10843
Comments
Could you do a quick test and see if it works with an F16 vision projector:
.\build\bin\Release\llama-quantize.exe .\models\mmproj-Qwen2-VL-7B-Instruct-f32.gguf .\models\mmproj-Qwen2-VL-7B-Instruct-f16.gguf f16
.\build\bin\Release\llama-qwen2vl-cli.exe -m .\models\Qwen2-VL-7B-Instruct-IQ4_NL.gguf --mmproj .\models\mmproj-Qwen2-VL-7B-Instruct-f16.gguf -p 'What could be the context of this image.' --image '.\Pictures\Untitled.png' --seed 0 --temp 0 -ngl 99
It's not working :(
stable-diffusion.cpp's CLI does allow me to convert it to f16, but I think it strips off important metadata:
Ah, I think you have to use the surgery script:
python ./examples/llava/qwen2_vl_surgery.py Qwen/Qwen2-VL-2B-Instruct --data_type fp16
Is it the same mmproj for the 2B and the 7B model?
It seems not.
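If not, regenerating the 7B projector the same way should work; a sketch using the same surgery script (the Hugging Face model ID is assumed):
python ./examples/llava/qwen2_vl_surgery.py Qwen/Qwen2-VL-7B-Instruct --data_type fp16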
CPU:
Vulkan (ngl 99):
Still not working.
Can you try enabling GGML_VULKAN_CHECK_RESULTS and see if it identifies the broken op? You might need to manually add the cpu backend source files to ggml-vulkan (I think this broke when the backends were refactored).
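For reference, a rebuild with result checking enabled might look like this (a sketch assuming the usual llama.cpp CMake workflow; adjust the generator and extra flags to your setup):
cmake -B build -DGGML_VULKAN=ON -DGGML_VULKAN_CHECK_RESULTS=ON
cmake --build build --config Release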
To fix those linker issues you need to add the ggml-cpu sources to ggml-vulkan. |
Building with GGML_VULKAN_CHECK_RESULTS:
I can confirm this issue happens even with no layers offloaded. On CPU backend it works fine. Model is BF16, projector F16. Same assert as above. |
It’s a slightly different model, but it works well with MobileVLM, which uses CLIP. It doesn’t seem to be an issue with CLIP itself.
"The image features a black background with white text. The text reads "readable text", indicating a focus on the readability of the text. The text is written in an all-caps format, suggesting that it may be written in a non-traditional or serif font, which is sometimes seen in more modern digital writing. The text is centered, making it the main point of interest in the image. The image does not contain any other objects or elements, and the text is the only source of information. The overall impression is one of simplicity and focus on the text itself." |
Running clip on CPU solves this issue; the main model can still be kept on the GPU. Possibly related to #10896 (that is a workaround, not a fix).
FWIW this works for me at top of tree, on RTX 4070/Windows. |
Looks like it's working now. |
If you have the time, it might be interesting to bisect the repo to figure out which commit fixed it, because I don't think it was intentional.
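Since we're looking for the commit that fixed the bug rather than one that broke it, a bisect session could look roughly like this; it's a sketch, and the broken commit (89d604f, from the original report) and the swapped terms are illustrative:
git bisect start --term-old broken --term-new fixed
git bisect fixed HEAD
git bisect broken 89d604f
# rebuild and run the repro at each step, then mark the commit:
git bisect fixed    # or: git bisect broken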
How do I re-enable it? I'm on a fresh build (-DGGML_VULKAN=ON -DBUILD_SHARED_LIBS=OFF -DGGML_RPC=ON -DGGML_VULKAN_CHECK_RESULTS=OFF) at commit e6e7c75. Edit: never mind, I'm stupid.
Ok so it's indeed still broken with clip on Vulkan. Should I reopen? |
Yeah, keep it open while it's not fixed. |
I verified I can repro with that other PR reverted. Looks like the clip code always executes the graph on the Vulkan backend even for ops that aren't supported. I guess that's why the GPU backends were disabled? |
This was fixed by #11902. |
Name and Version
.\build\bin\Release\llama-cli.exe --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
version: 4329 (89d604f)
built with MSVC 19.41.34120.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
Ryzen 5900X +RX 5700 XT
Models
Qwen2-VL-7B-Instruct-IQ4_NL + mmproj-Qwen2-VL-7B-Instruct-f32
Problem description & steps to reproduce
When I run it on the Vulkan build, the description given by the model has nothing to do with the image given as argument (no matter the -ngl value, even -ngl 0 is broken). The exact same setup works perfectly fine on the CPU backend. I know the Vulkan backend doesn't support Qwen2-VL yet, but according to #10361 (comment), this should only cause slowdowns, not invalid outputs.
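For reference, a repro invocation along these lines (paths and prompt mirror the commands earlier in the thread; adjust to your local files):
.\build\bin\Release\llama-qwen2vl-cli.exe -m .\models\Qwen2-VL-7B-Instruct-IQ4_NL.gguf --mmproj .\models\mmproj-Qwen2-VL-7B-Instruct-f32.gguf -p 'What could be the context of this image.' --image '.\Pictures\Untitled.png' --seed 0 --temp 0 -ngl 0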
Relevant log output
Image input:
-ngl 0
-ngl 99
CPU backend for comparison