Support llama.cpp directly, bypassing ollama #233

Given the close relationship between ollama and llama.cpp, would it be possible to support llama-server? It exposes an OpenAI-compatible HTTP endpoint on localhost.
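For reference, a minimal sketch of what talking to llama-server's OpenAI-compatible endpoint looks like. The port is llama-server's default and the request details are assumptions, not anything from this thread; adjust them to your own setup.

```ts
// Assumes llama-server is running locally with its default port (8080).
const BASE_URL = "http://localhost:8080/v1";

async function listModels(): Promise<void> {
  // llama-server mirrors OpenAI's GET /v1/models endpoint.
  const res = await fetch(`${BASE_URL}/models`);
  const body = await res.json();
  console.log(body.data.map((m: { id: string }) => m.id));
}

async function chat(prompt: string): Promise<void> {
  // ...and OpenAI's POST /v1/chat/completions endpoint.
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // llama-server serves the single GGUF it was started with, so the
      // model field is typically not meaningful here (assumption).
      model: "default",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const body = await res.json();
  console.log(body.choices[0].message.content);
}

listModels().then(() => chat("Hello!"));
```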
We recently added support for OpenAI servers; you can find the configuration in the Settings view. Can you configure it with your llama-server and let me know if it works?
Tested with v0.20.1; connectivity reports as working. However, model parsing fails and a model can't be selected in the "Sessions" tab. Manually querying the models endpoint returns:

```json
{
  "object": "list",
  "data": [
    {
      "id": "/home/user/Qwen2.5-Coder-32B-Instruct-Q4_K_S.gguf",
      "object": "model",
      "created": 1731505790,
      "owned_by": "llamacpp",
      "meta": {
        "vocab_type": 2,
        "n_vocab": 152064,
        "n_ctx_train": 32768,
        "n_embd": 5120,
        "n_params": 32763876352,
        "size": 18778431488
      }
    }
  ]
}
```

EDIT: I have noticed that the OpenAI endpoint can't be saved without an API key; the "refresh" button in the UI is inactive unless the key field is non-empty. Providing one does not make any difference, though.
Thanks for the detailed report; I'll need to take a closer look to see where it might be going wrong.
Yeah, since this feature was designed specifically for OpenAI, it wouldn't work without an API key, so that's why we made it "mandatory", but we should probably document this better. When we connect to Ollama via the OpenAI-compatible API we just enter a random API key, which gets ignored anyway.
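A sketch of the pattern described above (not Hollama's actual code): an OpenAI-compatible server such as Ollama or llama-server does not validate the key, so any non-empty placeholder satisfies a client that insists on one. The base URL shown is Ollama's OpenAI-compatible endpoint.

```ts
import OpenAI from "openai";

// The server ignores the key, but the client requires a non-empty value.
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1", // Ollama's OpenAI-compatible endpoint
  apiKey: "placeholder-key",
});

async function main() {
  // List whatever models the local server reports.
  const models = await client.models.list();
  console.log(models.data.map((m) => m.id));
}

main();
```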
Not a problem. Also, I've checked the console, but there is no output at any level apart from the benign preload warnings. The "Network" tab shows 200s to the endpoint.
Found the cause of the problem. Our current implementation filters out any models whose id doesn't include a particular substring; removing the filter makes it work. This is because when we get the models from OpenAI it also sends back a list of non-LLM models that are incompatible with Hollama.
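To illustrate the behaviour being described (the function, the flag, and the exact filter string are hypothetical, not Hollama's actual code):

```ts
interface OpenAIModel {
  id: string;
}

// Hypothetical illustration: a substring filter that is useful against
// api.openai.com (which also lists embedding, audio, and image models)
// silently drops everything a llama.cpp server reports, because its model
// ids are GGUF file paths rather than "gpt-*" names.
function listChatModels(models: OpenAIModel[], isOfficialOpenAI: boolean): OpenAIModel[] {
  if (!isOfficialOpenAI) {
    // Third-party OpenAI-compatible servers (llama-server, Ollama, ...) only
    // expose models they can actually run, so no filtering is needed.
    return models;
  }
  // For api.openai.com, keep only chat-capable models
  // (the "gpt" criterion is an assumption for illustration).
  return models.filter((m) => m.id.includes("gpt"));
}
```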
@savchenko here's a work-in-progress demo if you want to check it out. To set it up with your llama.cpp server, you'll need to add an "OpenAI compatible" connection type.
@fmaclen, the interface is slightly broken. Latest Firefox ESR, v128.3.1.
@savchenko "slightly broken" is quite the understatement 😅
Fresh container build from 00f5862:

Firefox: Clicking on the SL links yields no UI changes, while the dev console shows:
Chromium: The interface works in Chromium; however, attempting to query fails, and I do not observe any new messages in llama's stdout after clicking "Run" in Hollama.
Thanks for the update. I was able to replicate the issue you are seeing with Firefox, and I'm pretty sure it's caused by some hacky code I wrote just to quickly try things out. That being said, it works fine for me in Chromium. If you were using the most recent release of Hollama in the same browser (with the same URL/hostname), it's possible it might have conflicting settings stored locally. A couple of questions, if you don't mind:
@savchenko thanks for the clarification. Try building the latest work-in-progress changes again.
Glad to hear it's working!
No, the issue will be closed automatically once the feature is released.
🎉 This issue has been resolved in version 0.22.0 🎉

The release is available on GitHub releases.

Your semantic-release bot 📦🚀