
Eval bug: clip.cpp has no GPU support - a lot of work is at risk #11322

Closed
cmp-nct opened this issue Jan 21, 2025 · 11 comments · Fixed by #12322

Comments

@cmp-nct
Contributor

cmp-nct commented Jan 21, 2025

Name and Version

all versions 2025

Operating systems

Linux

GGML backends

CUDA, Metal

Hardware

any

Models

No response

Problem description & steps to reproduce

In PR #10896 the GPU support of clip.cpp was removed; the change is essentially just a few comment markers around fully functional code.

Hundreds of hours went into CLIP, and having it run on the GPU was a major feat for the vision capabilities of llama.cpp. It also led to large SOTA models being implemented in llama.cpp, with people working dedicatedly on those patches.

I agree that the vision implementation was not great, but we had SOTA support through models like minicpm-v-2.6, and even the new minicpm-o-2.6 was implemented in llama.cpp at launch, with dedicated people working on it and a PR currently waiting for merge.

This change renders those models useless for anyone who does not know how to hack the llama.cpp code. Inference now takes minutes instead of milliseconds.

I strongly recommend adding GPU support back into clip.cpp while it is still compatible with the core (currently it is), so people can use llama.cpp with vision capabilities again.
To prevent people from posting unwelcome issues, add a warning message instead.
Right now we see issues being created reporting that vision support no longer works, that GPU support is failing, etc. Long term, it will cause developers to stop supporting llama.cpp as an engine for vision.

First Bad Commit

#10896

Relevant log output

-
@ngxson
Collaborator

ngxson commented Jan 21, 2025

I'm working on the vision API refactoring, and I can confirm that enabling GPU (I'm using the Metal backend) makes it crash.

@cmp-nct
Contributor Author

cmp-nct commented Jan 21, 2025

> I'm working on the vision API refactoring, and I can confirm that enabling GPU (I'm using the Metal backend) makes it crash.

Great to hear you are on it :)
On CUDA the backend works; minicpm takes 500 ms on GPU where it takes 1-2 minutes on CPU.
Maybe we can just enable the backends that do work. CUDA alone would be a huge step given the popularity of NVIDIA cards.

@ngxson
Collaborator

ngxson commented Jan 21, 2025

OK, thanks for the info. I don't know who disabled it in the first place, but I will look into it after I finish my initial refactoring.

@ekk1

ekk1 commented Jan 29, 2025

I can confirm that MiniCPM-o 2.6 with their fork of llama.cpp works with CUDA support in CLIP,

following https://github.com/OpenBMB/llama.cpp/blob/minicpmv-main/examples/llava/README-minicpmv2.6.md

and modifying examples/llava/clip.cpp: uncomment the CUDA-related code and rebuild with cmake (see the sketch below).
The speedup is significant: from 16,000+ ms down to 40+ ms (16-thread Intel 8352V -> RTX 4090).
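
For reference, the disabled block looked roughly like the following. This is a sketch from memory of the pre-#10896 backend selection in clip_model_load(), so exact names and logging calls may differ:

```cpp
// clip.cpp, backend selection: uncommenting this block re-enables CUDA
#ifdef GGML_USE_CUDA
    new_clip->backend = ggml_backend_cuda_init(0); // use CUDA device 0
    LOG_INF("%s: CLIP using CUDA backend\n", __func__);
#endif

    if (!new_clip->backend) {
        // no GPU backend compiled in or initialized: fall back to the CPU
        new_clip->backend = ggml_backend_cpu_init();
        LOG_INF("%s: CLIP using CPU backend\n", __func__);
    }
```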

Note:
examples/llava/minicpmv-cli.cpp may also need to be modified; change the value 256 to whatever you need, otherwise inference will be prematurely interrupted.
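
The 256 mentioned here is presumably the default generation cap in minicpmv-cli.cpp, something along these lines (a sketch; the exact variable name may differ):

```cpp
// examples/llava/minicpmv-cli.cpp: when --n-predict is not set on the CLI,
// generation stops after 256 tokens; raise this fallback (or pass -n)
// to avoid prematurely truncated answers
const int max_tgt_len = params->n_predict < 0 ? 256 : params->n_predict;
```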

@nick-pape

> OK, thanks for the info. I don't know who disabled it in the first place, but I will look into it after I finish my initial refactoring.

ggerganov removed it here

@cmp-nct
Contributor Author

cmp-nct commented Feb 14, 2025

Yeah, it appears there were lots of issue reports and gg got annoyed by the state of the llava-like examples.
I have no idea what those issues were about, but I have used CLIP on CUDA with many vision models in extensive tests, very successfully.

I think the problem was the refactor of the ggml backend system: during it, all backends were added to clip.cpp, whereas before that clip.cpp ONLY had CUDA enabled.

So my recommendation is to enable CUDA again, and other acceleration backends only if someone has tested them well.
It's just 3 lines of code to make CLIP work with CUDA.

@0cc4m
Collaborator

0cc4m commented Feb 14, 2025

Then open a PR and discuss it with gg. I think it should also still be working on Vulkan, but I'd have to test it.

@LostRuins
Collaborator

LostRuins commented Feb 14, 2025

It is still working on Vulkan and Metal for most architectures except Qwen2VL. On CUDA it's fine on all architectures.

@jeffbolznv
Collaborator

Vulkan should work with Qwen2VL now, after #11902. The issue was some missing RoPE variants. If somebody makes a PR to re-enable this for CUDA, please also re-enable it for Vulkan.

@ngxson
Collaborator

ngxson commented Mar 10, 2025

Could someone please test this PR on your backend: #12322

I've tested on Metal and it works fine even though the Conv2D op is not supported by Metal (the ggml scheduler automatically assigns unsupported ops to the CPU; see the sketch below).

It would be nice if someone could test on CUDA, SYCL, and Vulkan.
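
For anyone wondering how the fallback works: the scheduler is created with an ordered list of backends, CPU last, and assigns each op to the first backend that supports it. Below is a minimal sketch using the public ggml-backend API; header locations and the exact ggml_backend_sched_new signature vary a bit across ggml versions:

```cpp
#include "ggml.h"
#include "ggml-backend.h"
#include "ggml-metal.h" // backend header locations differ across ggml versions

// Build a scheduler that prefers Metal but falls back to the CPU for any op
// the GPU backend does not support (e.g. Conv2D at the time of this thread).
static ggml_backend_sched_t make_clip_sched(void) {
    ggml_backend_t backends[2] = {
        ggml_backend_metal_init(), // preferred backend, tried first
        ggml_backend_cpu_init(),   // catch-all: the CPU backend supports every op
    };
    // passing NULL for bufts means each backend uses its default buffer type
    return ggml_backend_sched_new(backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE, false);
}
```

Computing a graph through ggml_backend_sched_graph_compute() then splits it across the two backends automatically, which is why the Metal run works even though Conv2D executes on the CPU.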

@vladislavdonchev

I've tested with Qwen2 and a few other models using Vulkan. Everything seems to be working fine.

I am trying to gather all of the data in a document.
