docs : add Moondream2 pre-quantized link #13745
Conversation
@ngxson for visibility. It might be good to move the model GGUFs from a private repo to the official ggml-org repo.
Can you also share the steps and commands you used to generate the mmproj GGUF? It would be nice if we could add llava support to convert_hf_to_gguf, but I don't have time yet. A guide specifically for moondream could be a temporary solution.
Hello @ngxson, I didn't create the mmproj. The author updated the files on Hugging Face a few days ago. However, the text model didn't have a chat template in it, so I just edited the GGUF to add that field. There is a create_gguf.py script in one of the branches of the moondream repo; I expect the mmproj came from there: https://github.com/vikhyat/moondream/blob/moondream-ggml/create_gguf.py
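For reference, a rough sketch of how an edit like that can be done with the gguf-py tooling that ships with llama.cpp. File names are hypothetical, and the script's options here are from memory, so verify with `--help` first:

```python
# Rough sketch of the chat-template edit described above. File names are
# hypothetical, and the script options are from memory -- check
# `python gguf-py/scripts/gguf_new_metadata.py --help` before relying on this.
# gguf_new_metadata.py (shipped with llama.cpp's gguf-py) copies a GGUF while
# adding or overriding metadata such as tokenizer.chat_template.
import subprocess

subprocess.run(
    [
        "python", "gguf-py/scripts/gguf_new_metadata.py",
        "moondream2-text-model-f16.gguf",       # original, missing the template
        "moondream2-text-model-f16-chat.gguf",  # output with the template added
        "--chat-template-config", "tokenizer_config.json",
    ],
    check=True,
)
```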
I saw this model on /r/locallama the other day and the benchmarks looked impressive, so I ran it through a few tests with Gemini 2.5 as judge: https://gist.github.com/kth8/195bfe61e8c3b2ef8cce4bf263808e2d
Hello, is it possible to use it with the detect or point methods in llama.cpp?
This is cool, mind sharing the prompt you used for this? Also just to clarify, the .gguf files in the HF repository are a year old. We've made a bunch of architectural changes since then, so it's no longer possible to run the latest versions (the ones that include the detect, point, etc. capabilities) using llama.cpp.
For the test model, just a simple prompt; the prompt for Gemini was:

prompt = "You are a sophisticated, advanced multimodal language model. Your primary function in this task is to act as an expert evaluator. You will be provided with an image and corresponding description of that image generated by a smaller, potentially less capable, vision-language model {model}. Your task is to first generate an expert analysis of the provided image, then conduct a thorough and critical analysis of the smaller {model} VLM's generated description of the same image for inaccuracies and hallucinations. Tally up all the inaccuracies at the end and provide an overall conclusion.".format(model=test_model)
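The judging harness was roughly this shape (a sketch only, not the exact gist code; the Gemini model id, API key handling, and file paths are assumptions):

```python
# Sketch of the judging flow described above -- not the exact gist code.
# The Gemini model id, API key handling, and file paths are assumptions.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")          # placeholder
judge = genai.GenerativeModel("gemini-2.5-pro")  # assumed model id

test_model = "moondream2"
prompt = "...".format(model=test_model)  # the evaluator prompt shown above

image = Image.open("test.jpg")                     # image given to both models
description = open("moondream_output.txt").read()  # the small VLM's description

# generate_content accepts a mixed list of text parts and PIL images
response = judge.generate_content(
    [prompt, image, "Description to evaluate:\n" + description]
)
print(response.text)
```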
The Moondream2 GGUF at https://huggingface.co/vikhyatk/moondream2 has been updated to the latest version, and it works with llama.cpp. However, the model vikhyatk published does not include a default chat template. The version at https://huggingface.co/Hahasb/moondream2-20250414-GGUF has been updated with tokenizer.chat_template set to the vicuna template, which seems to work OK, though I'm not sure it's the optimal setup.
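For anyone who wants to try the pair of GGUFs from Python, a rough sketch using the llama-cpp-python bindings (not part of this PR; file names and paths are placeholders):

```python
# Illustrative only: running the text + mmproj GGUF pair through the
# llama-cpp-python bindings, which include a Moondream chat handler.
# File names and paths are placeholders.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import MoondreamChatHandler

chat_handler = MoondreamChatHandler(clip_model_path="mmproj-moondream2-f16.gguf")
llm = Llama(
    model_path="moondream2-text-model-f16.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,  # leave room for the image embeddings plus the reply
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```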
Fixes #13332
Fixes vikhyat/moondream#96
Moondream2 is a crazy good model for its tiny size. After this is merged, I'll start experimenting with quantizations, but even the fp16 version is small (less than 3 GB for the text model, less than 1 GB for the mmproj).
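For reference, a sketch of the usual llama.cpp quantization step (the binary location and file names are assumptions; the mmproj is normally left at f16):

```python
# Sketch of the usual llama.cpp quantization step (binary and file paths are
# assumptions): quantize the text model to Q4_K_M; the mmproj stays at f16.
import subprocess

subprocess.run(
    [
        "./build/bin/llama-quantize",
        "moondream2-text-model-f16.gguf",
        "moondream2-text-model-Q4_K_M.gguf",
        "Q4_K_M",
    ],
    check=True,
)
```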