
Create json api service #88

Closed
wizd opened this issue Mar 13, 2023 · 8 comments
Labels
need more info The OP should provide more details about the issue

Comments


wizd commented Mar 13, 2023

So we can integrate apps/UIs.

@ggerganov ggerganov added the need more info The OP should provide more details about the issue label Mar 13, 2023
wizd (Author) commented Mar 13, 2023

Emulate the OpenAI text completion API, so that tons of existing apps could support llama without changes.
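To make the idea concrete, here is a minimal sketch of what such an emulation layer could look like, using only the Python standard library. The `generate()` stub and the exact response fields are assumptions modeled on the OpenAI completions response shape; a real server would call into llama.cpp instead of echoing the prompt:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def generate(prompt: str, max_tokens: int) -> str:
    # Placeholder: a real server would run llama.cpp inference here.
    return f"(echo) {prompt}"[: max_tokens * 4]


class CompletionsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        req = json.loads(self.rfile.read(length))
        text = generate(req.get("prompt", ""), req.get("max_tokens", 16))
        # Response shape loosely mirrors OpenAI's text_completion objects.
        body = json.dumps({
            "object": "text_completion",
            "model": req.get("model", "llama"),
            "choices": [{"text": text, "index": 0, "finish_reason": "length"}],
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Serving it is one line: `HTTPServer(("127.0.0.1", 8080), CompletionsHandler).serve_forever()`. Any OpenAI-style client pointed at that base URL would then get back a plausible completion object.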

henk717 commented Mar 14, 2023

+1 on this; people would love to have this in KoboldAI, but we have no good way of implementing it at the moment. We already have OpenAI support, so that would work. We also have a different basic JSON API that just sends the desired values over JSON and handles the output string.

Whatever way works, but doing JSON over HTTP is going to be ideal for cross-language implementations such as Python or (in-browser) JavaScript.

MLTQ commented Mar 14, 2023

Sounds like the ideal structure would be to load the model into memory in interactive mode, listen for input on some port, wait for the initial prompt & reverse prompt, then post the JSON response to that same port.
Because it outputs word by word, maybe a WebSocket implementation?

This seems like a viable option too: #23 (comment)
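A WebSocket is one way to get word-by-word output to a client; chunked HTTP is another that keeps the cross-language story simple, since every HTTP client already decodes it. A stdlib-only sketch of the idea, where `fake_tokens()` is a stand-in for llama.cpp's token stream (not its real API):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_tokens(prompt: str):
    # Stand-in for llama.cpp emitting tokens one at a time.
    for word in ("Hello", " from", " a", " streamed", " reply."):
        yield word


class StreamHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.send_header("Connection", "close")
        self.end_headers()
        for tok in fake_tokens(self.path):
            chunk = tok.encode()
            # Chunk framing: hex length, CRLF, payload, CRLF.
            self.wfile.write(b"%x\r\n%s\r\n" % (len(chunk), chunk))
            self.wfile.flush()  # push each token out immediately
        self.wfile.write(b"0\r\n\r\n")  # terminating chunk
```

A browser `fetch()` reading the response body incrementally, or a Python client iterating over the socket, would see each token as it is flushed rather than waiting for the full generation.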

i-am-neo commented Mar 17, 2023

WebSocket is an option, but would you be willing to pay whoever will host the backend?

LostRuins (Collaborator) commented Mar 18, 2023

Hi @henk717, I've gone ahead and created https://github.com/LostRuins/llamacpp-for-kobold, which emulates a KoboldAI HTTP server, allowing it to be used as a custom API endpoint from within Kobold.

I wrote my own Python ctypes bindings, and it requires zero other dependencies (no Flask, no pybind11) except for llamalib.dll and Python itself. Windows binaries are included, but you can also rebuild the library from the makefile.
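For anyone curious what zero-dependency ctypes bindings look like in general, this is the pattern. It is illustrated here against the C standard library's `strlen`, since llamalib's actual exported symbols aren't shown in this thread; for llama you would load llamalib.dll / a libllama shared object instead and declare its functions the same way:

```python
import ctypes
import ctypes.util

# Load a shared library. For llama bindings this would be the path to
# llamalib.dll or libllama.so rather than libc.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature up front so ctypes converts arguments and
# return values correctly instead of defaulting everything to int.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(s: str) -> int:
    """Thin Python wrapper over the raw C call."""
    return libc.strlen(s.encode())
```

The appeal of this approach is exactly what the comment above describes: no Flask, no pybind11, no build step on the Python side, just a compiled library and `ctypes`.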

I also went ahead and added left square brackets to the banned tokens.

Unfortunately, it's not ideal due to a fundamental flaw in llama.cpp where generation delay scales linearly with prompt length, unlike Huggingface Transformers. See this discussion for details.

avilum commented Mar 19, 2023

Hey guys, if anyone is seeking a working client/server implementation:
I wrote a minimal realtime Go server and Python client with live inference streaming, based on this awesome repo.
See https://github.com/avilum/llama-saas

thomasantony (Collaborator) commented

I have a proof of concept working with an existing web UI here:

oobabooga/text-generation-webui#447

It is very unpolished, but getting somewhere.

dranger003 (Contributor) commented

Hi there, I recently worked on C# bindings and a basic .NET Core project. There are two sample projects included (CLI/Web + API). It could easily be expanded with a more extensive JSON interface. Hope this is helpful.

https://github.com/dranger003/llama.cpp-dotnet

dmahurin pushed a commit to dmahurin/llama.cpp that referenced this issue May 31, 2023
dmahurin pushed a commit to dmahurin/llama.cpp that referenced this issue Jun 1, 2023
Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023
9 participants