Support llama.cpp directly, bypassing ollama #233

Given the close relationship between ollama and llama.cpp, would it be possible to support llama-server? It exposes an OpenAI-compatible HTTP endpoint on localhost.
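For reference, a minimal sketch of what talking to llama-server's OpenAI-compatible endpoint looks like. The port is llama-server's default and the request details are assumptions, not anything from this thread; adjust them to your own setup.

```ts
// Assumes llama-server is running locally with its default port (8080).
const BASE_URL = "http://localhost:8080/v1";

async function listModels(): Promise<void> {
  // llama-server mirrors OpenAI's GET /v1/models endpoint.
  const res = await fetch(`${BASE_URL}/models`);
  const body = await res.json();
  console.log(body.data.map((m: { id: string }) => m.id));
}

async function chat(prompt: string): Promise<void> {
  // ...and OpenAI's POST /v1/chat/completions endpoint.
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // llama-server serves the single GGUF it was started with, so the
      // model field is typically not meaningful here (assumption).
      model: "default",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const body = await res.json();
  console.log(body.choices[0].message.content);
}

listModels().then(() => chat("Hello!"));
```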
We recently added support for OpenAI servers; you can find the configuration in the Settings view. Can you configure it with your llama-server and let me know if it works?
Tested with v0.20.1; connectivity reports as working. However, model parsing fails and a model can't be selected in the "Sessions" tab. Manually querying the models endpoint returns:

```json
{
  "object": "list",
  "data": [
    {
      "id": "/home/user/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf",
      "object": "model",
      "created": 1731505790,
      "owned_by": "llamacpp",
      "meta": {
        "vocab_type": 2,
        "n_vocab": 152064,
        "n_ctx_train": 32768,
        "n_embd": 5120,
        "n_params": 32763876352,
        "size": 18778431488
      }
    }
  ]
}
```

EDIT: I have noticed that the OpenAI endpoint can't be saved without an API key; the "refresh" button in the UI is inactive unless the key field is non-empty. Providing one does not make any difference, though.
Thanks for the detailed report; I'll need to take a closer look to see where it might be going wrong.
Yeah, since this feature was designed specifically for OpenAI, it wouldn't work without an API key, so that's why we made it "mandatory", but we should probably document this better. When we connect to Ollama via the OpenAI-compatible API we just enter a random API key, which gets ignored anyway.
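A sketch of the pattern described above (not Hollama's actual code): an OpenAI-compatible server such as Ollama or llama-server does not validate the key, so any non-empty placeholder satisfies a client that insists on one. The base URL shown is Ollama's OpenAI-compatible endpoint.

```ts
import OpenAI from "openai";

// The server ignores the key, but the client requires a non-empty value.
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1", // Ollama's OpenAI-compatible endpoint
  apiKey: "placeholder-key",
});

async function main() {
  // List whatever models the local server reports.
  const models = await client.models.list();
  console.log(models.data.map((m) => m.id));
}

main();
```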
Not a problem. Also, I've checked the console, but there is no output at any level apart from the benign preload warnings. The "Network" tab shows 200s to the endpoint.
Found the cause of the problem. Our current implementation filters out any models whose id doesn't include a particular substring; removing the filter makes it work. This is because when we get the models from OpenAI it also sends back a list of non-LLM models that are incompatible with Hollama.
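To illustrate the behaviour being described (the function, the flag, and the exact filter string are hypothetical, not Hollama's actual code):

```ts
interface OpenAIModel {
  id: string;
}

// Hypothetical illustration: a substring filter that is useful against
// api.openai.com (which also lists embedding, audio, and image models)
// silently drops everything a llama.cpp server reports, because its model
// ids are GGUF file paths rather than "gpt-*" names.
function listChatModels(models: OpenAIModel[], isOfficialOpenAI: boolean): OpenAIModel[] {
  if (!isOfficialOpenAI) {
    // Third-party OpenAI-compatible servers (llama-server, Ollama, ...) only
    // expose models they can actually run, so no filtering is needed.
    return models;
  }
  // For api.openai.com, keep only chat-capable models
  // (the "gpt" criterion is an assumption for illustration).
  return models.filter((m) => m.id.includes("gpt"));
}
```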
@savchenko here's a work-in-progress demo if you want to check it out. To set it up with your llama.cpp server, you'll need to add an "OpenAI compatible" connection type.
@fmaclen, the interface is slightly broken. Latest Firefox ESR, v128.3.1.
@savchenko "slightly broken" is quite the understatement 😅
Fresh container build from 00f5862:

Firefox: Clicking on the SL links yields no UI changes, while the dev console shows:
Chromium: The interface works in Chromium; however, attempting to query fails, and I do not observe any new messages in llama's stdout after clicking "Run" in Hollama.
Thanks for the update. I was able to replicate the issue you are seeing with Firefox, and I'm pretty sure it's caused by some hacky code I wrote just to quickly try things out. That being said, it works fine for me in Chromium. If you were using the most recent release of Hollama in the same browser (with the same URL/hostname), it's possible it might have conflicting settings stored locally. A couple of questions, if you don't mind:
@savchenko thanks for the clarification. Try building the latest work-in-progress changes again.
Glad to hear it's working!
No, the issue will be closed automatically once the feature is released.
🎉 This issue has been resolved in version 0.22.0 🎉

The release is available on GitHub releases.

Your semantic-release bot 📦🚀