Refactor most code in main.cpp into a separate module (preparing to implement TCP mode) #267

Closed
wants to merge 5 commits into from

Conversation


@tarruda tarruda commented Mar 18, 2023

The goal of this refactor is to allow reusing the model execution logic while using streams other than stdin/stdout for interaction.

In my case, I'd like to implement a simple TCP server (which is enabled as a command-line option) that will run llama_main for each new connection, which will be handled in a child process via fork(). This would bring a few benefits:

  • Loading model weights can be very slow, so in TCP mode we can load them once before listening. Each new connection is handled in a forked process, which inherits the parent's memory (so it doesn't have to reload the model).
  • We can quickly start a new context by opening a new TCP socket. New connections will also be able to specify some new parameters such as seed and prompt.
  • It becomes easier to wrap this into a REST/HTTP server
  • Can be more convenient in a LAN where you have a powerful computer as the model server.

If this PR is accepted, I will follow up with a PR that implements the TCP server command-line option.

This PR is simpler to review than it appears. Just look at the commits individually (most of the additions/deletions happen in the first commit, where main.cpp is simply renamed as llama.cpp).

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
@tarruda tarruda force-pushed the refactor-llama-main branch from 9132124 to 934840d on March 18, 2023 15:40
tarruda added 4 commits March 18, 2023 12:40
Signed-off-by: Thiago Padilha <thiago@padilha.cc>
Signed-off-by: Thiago Padilha <thiago@padilha.cc>
Signed-off-by: Thiago Padilha <thiago@padilha.cc>
The goal is to allow running llama_main while connected to other
streams, such as TCP sockets.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
@tarruda tarruda force-pushed the refactor-llama-main branch from 934840d to edc17cf on March 18, 2023 15:41
@Green-Sky
Collaborator

How does this PR tie into the current active refactor here #77 ?

@tarruda
Author

tarruda commented Mar 18, 2023

How does this PR tie into the current active refactor here #77 ?

I was not aware of that PR; I should have searched first. The only reason I created this PR is that I had a clear vision of how to implement a TCP server mode in llama.cpp. Honestly, I'm not sure what to do. Should I close this PR?

@Green-Sky
Collaborator

Honestly not sure what to do, should I close this PR?

Not my call, but you could review the other PR with your insight :)

@tarruda
Author

tarruda commented Mar 18, 2023

Not my call, but you could review the other PR with your insight :)

I had a quick look, and it seems the goal of #77 is to make llama.cpp embeddable as a library, which requires more extensive refactoring than what I do here.

This PR has no such goals and makes almost no changes to existing code. It can be summarized as:

  • Most code in main.cpp is moved to llama.cpp. No existing functions are split; a new llama_main function contains most of the old main function's body.
  • llama_main now accepts the following as arguments:
    • parsed parameters
    • preloaded model
    • input/output/error streams which are now used instead of hardcoded stdin/stdout/stderr

@ggerganov
Owner

ggerganov commented Mar 18, 2023

@tarruda
Adding a TCP server would be awesome!
Please keep working on this - for now, do it on a branch in this repo as you see fit. I just invited you as a collaborator.
I will review #77 very soon and merge it first. After that, we will update your changes to fit the C-style API

@tarruda
Author

tarruda commented Mar 19, 2023

Closing in favor of #278

@tarruda tarruda closed this Mar 19, 2023
@tarruda tarruda deleted the refactor-llama-main branch March 20, 2023 09:57