use with llama.cpp #8
You can use it with local LLMs by just setting the base_url. E.g. I use it with ollama.
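A sketch of what that looks like (the optillm --base_url flag name and the dummy key below are assumptions, not the exact commands from this thread):

```bash
# ollama's OpenAI compatible endpoint lives under /v1 on its default port 11434
ollama serve

# point optillm at it; the --base_url flag name and the placeholder key are
# assumptions here, the point is just where base_url has to point
OPENAI_API_KEY=anything python optillm.py --base_url http://localhost:11434/v1
```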
It also works with llama_cpp.server, which starts an OpenAI API chat compatible server (here at port 8080).
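Something along these lines (the model path is a placeholder):

```bash
# llama-cpp-python's OpenAI compatible server, here on port 8080
python -m llama_cpp.server --model ./models/your-model.gguf --port 8080
```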
To interact with the resulting proxy you need to use something that allows you to chat with an OpenAI API compatible endpoint. E.g. it would work with https://github.com/oobabooga/text-generation-webui ; someone on Reddit was able to set it up with oobabooga easily (see here). I checked llama.cpp's … It should not be hard to actually create a GUI to enable comparing different approaches; I have added it as an item in #9.
What if I run llama-server instead, at port 8080? When I start that and then run optillm, I get an API key error: openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable. It's probably obvious, but what do I do to fix that? Thanks.
When using a local model, just put any value as the key.
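For example, something like this before starting optillm (the exact value does not matter, it just has to be non-empty):

```bash
# any placeholder satisfies the OpenAI client; a local llama-server ignores it
export OPENAI_API_KEY="sk-no-key-needed"
# then start optillm as usual, still pointing its base_url at the local server
python optillm.py
```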
Thanks. It runs now but doesn't do anything, as you mentioned above, since llama-server ignores it. Looking forward to further development.
Thanks for trying it out. Once we set up the llama.cpp server, we still need to pass its base_url to optillm. We can use the llama-server binary to expose an OpenAI compatible chat completions endpoint.
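For example, a sketch along these lines (the model path and context size are placeholders):

```bash
# llama.cpp's built-in server; exposes /v1/chat/completions on port 8080
./llama-server -m ./models/your-model.gguf --port 8080 -c 4096
```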
Above, we are creating an OpenAI compatible chat completions endpoint for the model. Now, to use optillm, you need to call it from your code as shown in the README. Use the approach name as a prefix on the model name (e.g. moa-<model> for mixture of agents).
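For example, with curl (the model name is a placeholder; the moa- prefix selects the mixture-of-agents approach):

```bash
# talk to the optillm proxy on port 8000, not to llama-server directly;
# optillm reads the approach prefix and forwards the calls to the local model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-no-key-needed" \
  -d '{
        "model": "moa-your-local-model",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```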
This request will first hit the optillm proxy at http://localhost:8000, where we will parse the approach from the model name prefix. Then we will apply the approach, making the underlying calls to the llama.cpp server, and return the final response. Does that help? If you are still unable to get it working, I can do a detailed guide with screenshots or a video. Let me know.
I was able to get optillm working with the llama.cpp server and SillyTavern on my M1 MacBook Pro. Not sure if I did it correctly though, as the results were... meh. But here's what I did:
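Roughly, the pieces fit together like this (the model path and flag names below are placeholders, not my exact commands):

```bash
# 1. llama.cpp server on port 8080 (placeholder model path)
./llama-server -m ./models/your-model.gguf --port 8080

# 2. optillm proxy on port 8000, pointed at the llama.cpp server
#    (the --base_url flag name is an assumption)
OPENAI_API_KEY=anything python optillm.py --base_url http://localhost:8080/v1

# 3. in SillyTavern, use a Chat Completion connection with a custom
#    OpenAI compatible endpoint set to http://localhost:8000/v1 and an
#    approach-prefixed model name such as moa-your-local-model
```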
By that point it all works, but again, for my tests the results were worse than just using the model directly.
@0xcoolio Glad that you were able to get it running. Regarding the error: the llama.cpp server does not support returning multiple completions in a single request (the n parameter), which some of the approaches rely on. You can try some of the other approaches, the ones that only need a single completion per request, and they should work. Or, you could try using ollama to run the model locally; their OpenAI API compatible local endpoint does allow sampling multiple responses using the n parameter.
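The difference shows up with requests like this (the model name is a placeholder):

```bash
# ask for two candidate completions in a single request; approaches that
# sample multiple responses depend on the server honoring "n"
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "your-local-model",
        "messages": [{"role": "user", "content": "Hello"}],
        "n": 2
      }'
# per the discussion above, llama-server will not return multiple choices
# for this, while ollama's endpoint should
```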
I've summarized this discussion into a small section in the README (#27) so that other people can try out this project easily. I hope it's all correct.
I'm trying to understand if this could be used with a local LLM via llama.cpp in interactive mode. Is this possible? Would very much like to try this out.