Eval bug: clip.cpp has no GPU support - a lot of work is at risk #11322
Comments
I'm working on the vision API refactoring, and I can confirm that enabling GPU (I'm using the Metal backend) makes it crash.
Great to hear you are on it :)
OK, thanks for the info. I don't know who disabled it in the first place, but I will look into it after I finish my initial refactoring.
I can confirm MiniCPM-o 2.6 works with CUDA support in CLIP using their fork of llama.cpp: modify examples/llava/clip.cpp, uncomment the CUDA-related comments, and rebuild with cmake.
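For reference, a minimal sketch of what that looks like. The exact code in clip.cpp varies between versions, so treat this as illustrative rather than a verbatim patch; `GGML_USE_CUDA` and `ggml_backend_cuda_init` are real ggml symbols, but the surrounding lines are an approximation:

```cpp
// Sketch only: the shape of the CUDA init block in examples/llava/clip.cpp
// that PR #10896 left commented out (exact code differs by version).
#ifdef GGML_USE_CUDA
    new_clip->backend = ggml_backend_cuda_init(0);  // initialize device 0
    LOG_INF("%s: CLIP using CUDA backend\n", __func__);
#endif
// After uncommenting, rebuild with CUDA enabled:
//   cmake -B build -DGGML_CUDA=ON
//   cmake --build build --config Release
```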
ggerganov removed it here
Yeah, it appears there were lots of issue reports and gg got annoyed by the state of the llava-like examples. I think the problem was the refactor of the ggml backend system: during that refactor all backends were added into clip.cpp, whereas before that clip.cpp ONLY had CUDA enabled. So my recommendation is to enable CUDA again, and other acceleration backends only if someone has tested them well.
Then open a PR and discuss it with gg. I think it should also still be working on Vulkan, but I'd have to test it.
It is still working on Vulkan and Metal for most architectures, except Qwen2VL. On CUDA it's fine on all architectures.
Vulkan should work on Qwen2VL now after #11902; the issue was some missing rope variants. If somebody makes a PR to re-enable it for CUDA, please also re-enable it for Vulkan.
Could someone please test PR #12322 on your backend? I've tested it on Metal and it works fine even though the Conv2D op is not supported by Metal (the ggml scheduler automatically uses the CPU for unsupported ops; see the sketch below). It would be nice if someone could test on CUDA, SYCL, and Vulkan.
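The CPU fallback comes from ggml_backend_sched: if the GPU backend is listed first and the CPU backend last, ops the GPU backend rejects (Conv2D on Metal here) get assigned to the CPU automatically. A minimal sketch, assuming the sched API as of early-2025 ggml (the signature has shifted across versions):

```cpp
// Minimal sketch of backend-sched fallback (API details vary by ggml version).
ggml_backend_t backends[2] = {
    ggml_backend_metal_init(),   // preferred GPU backend, tried first
    ggml_backend_cpu_init(),     // CPU fallback, must come last
};
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends, /*bufts=*/NULL, /*n_backends=*/2,
    /*graph_size=*/GGML_DEFAULT_GRAPH_SIZE, /*parallel=*/false);

// Ops unsupported by Metal (e.g. Conv2D) are scheduled on the CPU backend;
// everything else runs on Metal.
ggml_backend_sched_graph_compute(sched, graph);
```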
I've tested with Qwen2 and a few other models using Vulkan. Everything seems to be working fine. I am trying to gather all of the data in a document.
Name and Version
all versions 2025
Operating systems
Linux
GGML backends
CUDA, Metal
Hardware
any
Models
No response
Problem description & steps to reproduce
In PR #10896 the GPU support of clip.cpp was removed; it's basically just a few comments around fully functional code.
Hundreds of hours went into CLIP, and GPU support was a major feat for the vision capabilities of llama.cpp; it also led to large SOTA models being implemented in llama.cpp, with dedicated people working on those patches.
I agree that the vision implementation was not great, but we've had SOTA support from models like MiniCPM-V 2.6, and even the new MiniCPM-o 2.6 was implemented in llama.cpp at launch, with dedicated people working on it and a PR currently waiting for merge.
This change renders those models useless for anyone who is not aware of how to hack the llama.cpp code. Image encoding now takes minutes instead of milliseconds.
I strongly recommend adding GPU support back into clip.cpp while it is still compatible with the core (currently it is), so people can use llama.cpp with vision capabilities again.
To prevent people from posting unwelcome issues, add a warning message instead.
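A minimal sketch of such a warning, assuming the same disabled init block in clip.cpp; the message text and surrounding code are hypothetical:

```cpp
// Hypothetical sketch: if GPU init stays disabled, warn instead of silently
// falling back, so users know why image encoding is suddenly slow.
#ifdef GGML_USE_CUDA
    LOG_WRN("%s: GPU support for CLIP is temporarily disabled; "
            "falling back to CPU, image encoding will be slow\n", __func__);
#endif
    new_clip->backend = ggml_backend_cpu_init();
```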
Right now we see issues being created saying that vision support no longer works, that GPU support is failing, and so on. Long term, this will cause developers to stop supporting llama.cpp as an engine for vision.
First Bad Commit
#10896
Relevant log output