clip : offload to GPU #4061
Comments
It seems minor, but I believe supporting CLIP is a major step ahead; it's such a fundamental model.
Ideally, CLIP should be supported as a separate model arch in llama.cpp. We should do it at some point in the future.
Maybe we can start with porting the full text and vision encoder parts from my clip.cpp to llama.cpp.
I'd love to see full clip support in llama.cpp soon.
@ggerganov I have implemented broadcast for the […]
Great! Would be great to PR them in.
See #4205. I think that, for now, we shouldn't merge that pull request until the changes I made to ggml are applied in the main project. This way, we'll also have a more comprehensive implementation, eliminating the repeats and all that.
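For reference, the repeat-elimination being discussed looks roughly like this in ggml's C API. This is a minimal sketch, not code from #4205; `scale_repeat`, `scale_broadcast`, `x`, and `w` are hypothetical names, and it assumes each dimension of `w` divides the corresponding dimension of `x` so broadcasting applies:

```c
#include "ggml.h"

// Old pattern: materialize w at x's shape with ggml_repeat, then multiply.
struct ggml_tensor * scale_repeat(struct ggml_context * ctx,
                                  struct ggml_tensor * x,
                                  struct ggml_tensor * w) {
    return ggml_mul(ctx, x, ggml_repeat(ctx, w, x));
}

// With broadcasting in ggml_mul, the intermediate repeat (an extra
// tensor allocation and copy) disappears from the graph entirely.
struct ggml_tensor * scale_broadcast(struct ggml_context * ctx,
                                     struct ggml_tensor * x,
                                     struct ggml_tensor * w) {
    return ggml_mul(ctx, x, w);
}
```

Dropping the `ggml_repeat` node removes a tensor and a copy per affected op, which matters for full offload since every extra graph node is another GPU kernel launch.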
Do we have any updates on this feature? I am eager to use it!
@ggerganov @FSSRepo
@cmp-nct It seems that the architecture of the CLIP vision model differs from the Llama implementation here. The truth is that there will be a lot of work to do if we want to have it here.
You are certainly right about the work required; it's likely about as much as the entire clip.cpp has taken. At this point it's the best thing we have in open source for vision, right at eye level with GPT4-Vision. For "simple vision", llava-1.5 (ShareGPT4V at the moment) is working great with clip.cpp. The only high-level alternative is QwenVL, which is significantly worse than CogVLM and about the same work to integrate here.
With the recent support for running convolutions on the GPU (#4060) we should be able to offload CLIP to run fully on the GPU.
https://github.com/ggerganov/llama.cpp/blob/3d68f364f15778dc326f5024f2e5af1ad6dfddef/examples/llava/clip.cpp#L231-L236
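For orientation, the lines linked above build the vision graph around the patch-embedding convolution. Below is a rough sketch of that op, not the actual clip.cpp code; the shapes are assumed for a ViT-L/14 CLIP encoder (224x224 RGB input, 14x14 patches, 1024-dim embeddings), and `patch_embed` is a hypothetical name:

```c
#include "ggml.h"

// Sketch: the patch-embedding convolution that #4060 makes offloadable.
static struct ggml_tensor * patch_embed(struct ggml_context * ctx,
                                        struct ggml_tensor * image,    // [224, 224, 3, 1]
                                        struct ggml_tensor * weight) { // [14, 14, 3, 1024]
    // stride 14, no padding, no dilation -> a 16x16 grid of 1024-dim patch embeddings
    return ggml_conv_2d(ctx, weight, image, 14, 14, 0, 0, 1, 1);
}
```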
- Implement `ggml_acc` CUDA / Metal kernels
- Avoid `ggml_repeat` where possible using broadcast
- Utilize the `ggml-backend` API (see https://github.com/ggerganov/ggml/blob/master/examples/gpt-2/main-backend.cpp); a sketch of this pattern follows below
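For the `ggml-backend` item, the referenced gpt-2 main-backend.cpp follows roughly the pattern below. This is a hedged, self-contained sketch: the backend API has shifted across ggml versions, so these function names reflect one snapshot of it and may not match the version current when this issue was filed.

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"
#ifdef GGML_USE_CUBLAS
#include "ggml-cuda.h"
#endif

int main(void) {
    // pick a backend: CUDA if compiled in, CPU otherwise
    ggml_backend_t backend = NULL;
#ifdef GGML_USE_CUBLAS
    backend = ggml_backend_cuda_init(0); // GPU device 0
#endif
    if (!backend) {
        backend = ggml_backend_cpu_init();
    }

    // this context holds only tensor/graph metadata; data lives in backend buffers
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead()*8 + ggml_graph_overhead(),
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8);
    struct ggml_tensor * c = ggml_mul(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // allocate every graph tensor in a backend buffer, upload inputs, compute
    ggml_gallocr_t alloc = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_alloc_graph(alloc, gf);

    float av[8], bv[8];
    for (int i = 0; i < 8; ++i) { av[i] = (float) i; bv[i] = 2.0f; }
    ggml_backend_tensor_set(a, av, 0, sizeof(av));
    ggml_backend_tensor_set(b, bv, 0, sizeof(bv));

    ggml_backend_graph_compute(backend, gf);

    float cv[8];
    ggml_backend_tensor_get(c, cv, 0, sizeof(cv)); // results copied back from the backend

    ggml_gallocr_free(alloc);
    ggml_free(ctx);
    ggml_backend_free(backend);
    return 0;
}
```

The design point for CLIP offload is the same as in the gpt-2 example: build the graph once against a no-alloc context, let the allocator place all intermediate tensors in a backend buffer, and the whole forward pass then runs on the GPU without per-op host round-trips.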