Refactor most code in main.cpp into a separate module (preparing to implement TCP mode) #267

Closed
wants to merge 5 commits into from

Conversation


@tarruda tarruda commented Mar 18, 2023

The goal of this refactor is to allow reusing the model execution logic while using streams other than stdin/stdout for interaction.

In my case, I'd like to implement a simple TCP server (which is enabled as a command-line option) that will run llama_main for each new connection, which will be handled in a child process via fork(). This would bring a few benefits:

  • Loading model weights can be very slow, so in TCP mode we can load them once before listening. Each new connection is handled in a forked process, which inherits the parent's memory (so it doesn't have to reload the model).
  • We can quickly start a new context by opening a new TCP socket. New connections will also be able to specify some new parameters such as seed and prompt.
  • It becomes easier to wrap this into a REST/HTTP server
  • Can be more convenient in a LAN where you have a powerful computer as the model server.

If this PR is accepted, I will follow up with a PR that implements the TCP server command-line option.

This PR is simpler to review than it appears. Just look at the commits individually (most of the additions/deletions happen in the first commit, where main.cpp is simply renamed as llama.cpp).

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
@tarruda tarruda force-pushed the refactor-llama-main branch from 9132124 to 934840d on March 18, 2023 15:40
tarruda added 4 commits March 18, 2023 12:40
Signed-off-by: Thiago Padilha <thiago@padilha.cc>
Signed-off-by: Thiago Padilha <thiago@padilha.cc>
Signed-off-by: Thiago Padilha <thiago@padilha.cc>
The goal is to allow running llama_main while connected to other
streams, such as TCP sockets.

Signed-off-by: Thiago Padilha <thiago@padilha.cc>
@tarruda tarruda force-pushed the refactor-llama-main branch from 934840d to edc17cf on March 18, 2023 15:41
@Green-Sky
Collaborator

How does this PR tie into the current active refactor here #77 ?

@tarruda
Author

tarruda commented Mar 18, 2023

How does this PR tie into the current active refactor here #77 ?

I was not aware of that PR; I should have searched first. The only reason I created this PR is that I had a clear vision of how to implement a TCP server mode in llama.cpp. Honestly, I'm not sure what to do. Should I close this PR?

@Green-Sky
Collaborator

Honestly not sure what to do, should I close this PR?

Not my call, but you could review the other PR with your insight :)

@tarruda
Author

tarruda commented Mar 18, 2023

Not my call, but you could review the other PR with your insight :)

I had a quick look, and it seems the goal of #77 is to make llama.cpp embeddable as a library, which requires more extensive refactoring than what I do here.

This PR has no such goals and makes almost no changes to existing code. It can be summarized as:

  • Most code in main.cpp is moved to llama.cpp. No existing functions are split; a new llama_main function contains most of the old main function's body.
  • llama_main now accepts the following as arguments:
    • parsed parameters
    • preloaded model
    • input/output/error streams which are now used instead of hardcoded stdin/stdout/stderr

@ggerganov
Owner

ggerganov commented Mar 18, 2023

@tarruda
Adding a TCP server would be awesome!
Please keep working on this - for now, do it on a branch in this repo as you see fit. I just invited you as a collaborator.
I will review #77 very soon and merge it first. After that, we will update your changes to fit the C-style API

@tarruda
Author

tarruda commented Mar 19, 2023

Closing in favor of #278

@tarruda tarruda closed this Mar 19, 2023
@tarruda tarruda deleted the refactor-llama-main branch March 20, 2023 09:57