
docs : add Moondream2 pre-quantized link #13745


Merged · 2 commits into ggml-org:master on May 25, 2025

Conversation

@ddpasa (Contributor) commented May 24, 2025

The Moondream2 GGUF at https://huggingface.co/vikhyatk/moondream2 has been updated to the latest version, and it works with llama.cpp. However, the model vikhyat published does not have a default chat template. The version at https://huggingface.co/Hahasb/moondream2-20250414-GGUF has been updated with tokenizer.chat_template=vicuna, which seems to work OK, but I am not sure whether this is the optimal setup.

Fixes #13332
Fixes vikhyat/moondream#96

Moondream2 is a remarkably good model given its tiny size. After this is merged, I'll start experimenting with quantizations, but even the fp16 version is tiny (less than 3 GB for the text model, less than 1 GB for the mmproj).
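For context, the thread never shows the exact Jinja template that was written into the GGUF. The sketch below only illustrates the turn layout that tokenizer.chat_template=vicuna conventionally refers to; the system text and the helper name are assumptions for illustration, not taken from the model.

```python
# Rough sketch of Vicuna-style prompt construction (illustration only;
# the actual tokenizer.chat_template in the GGUF is a Jinja template
# whose exact wording is not shown in this thread).
def format_vicuna(messages, system="A chat between a curious user and an artificial intelligence assistant."):
    parts = [system]
    for msg in messages:
        if msg["role"] == "user":
            parts.append("USER: " + msg["content"])
        elif msg["role"] == "assistant":
            # Vicuna-style templates typically close assistant turns with an EOS token
            parts.append("ASSISTANT: " + msg["content"] + "</s>")
    parts.append("ASSISTANT:")  # trailing cue so the model generates the reply
    return " ".join(parts)

prompt = format_vicuna([{"role": "user", "content": "Describe this image."}])
print(prompt)
```

Whether this exact layout matches what the Hahasb repo embedded would need to be checked against the GGUF metadata itself.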

@ddpasa (Contributor, Author) commented May 24, 2025

@ngxson for visibility. It might be good to move the model GGUFs from a private repo to the official ggml-org repo.

@github-actions bot added the documentation (Improvements or additions to documentation) label on May 24, 2025
@ngxson (Collaborator) commented May 24, 2025

Can you also share the steps and commands you used to generate the mmproj GGUF?

It would be nice if we could add llava support to convert_hf_to_gguf, but I don't have time yet. A guide specifically for moondream could be a temporary solution.

@ddpasa (Contributor, Author) commented May 24, 2025

> Can you also share the steps and commands you used to generate the mmproj GGUF?
>
> It would be nice if we could add llava support to convert_hf_to_gguf, but I don't have time yet. A guide specifically for moondream could be a temporary solution.

Hello @ngxson, I didn't create the mmproj. The author updated it on Hugging Face a few days ago. However, that text model didn't have a chat template in it, so I just edited the GGUF to add that field.

There is a create_gguf.py script in one of the branches of the moondream repo; I expect it came from there: https://github.com/vikhyat/moondream/blob/moondream-ggml/create_gguf.py

@ngxson changed the title from "Multimodal: Added Moondream2 model and fixed ggml.org link" to "docs : add Moondream2 pre-quantized link" on May 25, 2025
@ngxson ngxson merged commit a08c1d2 into ggml-org:master May 25, 2025
2 checks passed
@kth8 commented May 25, 2025

I saw this model on /r/locallama the other day and the benchmarks looked impressive, so I ran it through a few tests with Gemini 2.5 as judge: https://gist.github.com/kth8/195bfe61e8c3b2ef8cce4bf263808e2d

@lus105 commented May 28, 2025

Hello, is it possible to use the detect or point methods with llama.cpp?

@vikhyat (Contributor) commented May 30, 2025

> I saw this model on /r/locallama the other day and the benchmarks looked impressive, so I ran it through a few tests with Gemini 2.5 as judge: https://gist.github.com/kth8/195bfe61e8c3b2ef8cce4bf263808e2d

This is cool, mind sharing the prompt you used for this?

Also, just to clarify: the .gguf files in the HF repository are a year old. We've made a bunch of architectural changes since then, so it's no longer possible to run the latest versions (which include the detect, point, etc. capabilities) using llama.cpp.

@kth8 commented May 30, 2025

For the test model, just a simple "Provide a very detailed description of this image."

and the prompt for Gemini:

prompt = "You are a sophisticated, advanced multimodal language model. Your primary function in this task is to act as an expert evaluator. You will be provided with an image and corresponding description of that image generated by a smaller, potentially less capable, vision-language model {model}. Your task is to first generate an expert analysis of provided image then conduct a thorough and critical analysis of the smaller {model} VLM's generated description of the same image for inaccuracies and hallucinations. Tally up all the inaccuracies at the end and provide an overall conclusion.".format(model=test_model)
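As a quick sanity check of the template mechanics (not taken from the gist; the model name below is a stand-in), the {model} placeholder in a prompt like the one above is filled in with str.format:

```python
# Trimmed stand-in for the judge prompt above, demonstrating how the
# {model} placeholder is substituted via str.format. "moondream2" is a
# placeholder name, not taken from the gist.
test_model = "moondream2"
prompt = (
    "You will be provided with an image and a corresponding description "
    "generated by a smaller vision-language model {model}. Analyse the "
    "{model} description for inaccuracies and hallucinations."
).format(model=test_model)
print(prompt)
```

Note that .format replaces every occurrence of the placeholder, so the model name appears in both spots of the judge instructions.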


Successfully merging this pull request may close these issues:

How to run on llama.cpp
Feature Request: moondream2 vlm support in mtmd