llama : (proposal) implement cpp-first library llama-cpp.h
#10583
Conversation
I like the proposal overall, but I am concerned that if we do this, first, we would need to maintain two different APIs and commit to avoiding breaking changes unless absolutely necessary. This would increase the maintenance cost. Second, there is a significant risk that the C API would become a second-class citizen, which is something that we must absolutely avoid because it is the only way to have a stable ABI that can be used from other languages.
What I'm thinking is that, looking at the big picture, it doesn't add too much maintenance cost, because:
In the long term, I think it will be cleaner to separate the library into 2 layers:
This will effectively make the C library a second-class citizen, but I think this is somewhat acceptable. My point is that:
Let me take an example. Say we have:

```cpp
std::vector<llama_token> tokenize(
    const llama_cpp::model & model,
    const std::string & raw_text,
    bool add_special,
    bool parse_special = false);
```

which will then be wrapped into C:

```cpp
int32_t llama_tokenize(
    const struct llama_model * model,
    const char * text,
    int32_t text_len,
    llama_token * tokens,
    int32_t n_tokens_max,
    bool add_special,
    bool parse_special);
```

But when exposed to a high-level language like Python, the end user expects the function to take this shape:

```python
tokenize(
    model: LlamaModel,
    raw_text: str,
    add_special: bool,
    parse_special: bool) -> List[LlamaToken]
```

So having the cpp function as a reference point should give a more "universal" experience in this case.
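To make the wrapping direction concrete, here is a minimal sketch of how the C function could delegate to the C++ one, assuming the two signatures above. `llama_cpp::tokenize()` and the `to_cpp_model()` accessor are hypothetical names that do not exist in the current codebase; the negative-return convention mirrors how the existing llama_tokenize() is documented.

```cpp
// Hypothetical sketch only: a C-style llama_tokenize() implemented as a thin
// wrapper over the proposed C++ function.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

#include "llama.h"      // llama_model, llama_token (existing C types)
#include "llama-cpp.h"  // proposed llama_cpp::model, llama_cpp::tokenize (hypothetical)

int32_t llama_tokenize(
        const struct llama_model * model,
        const char * text,
        int32_t text_len,
        llama_token * tokens,
        int32_t n_tokens_max,
        bool add_special,
        bool parse_special) {
    // convert the C inputs into the C++ types used by the cpp-first function
    const std::string raw_text(text, (size_t) text_len);
    const llama_cpp::model & cpp_model = to_cpp_model(model); // hypothetical accessor

    const std::vector<llama_token> result =
        llama_cpp::tokenize(cpp_model, raw_text, add_special, parse_special);

    // keep the existing llama.h convention: if the caller's buffer is too small,
    // return the negative of the number of tokens that would have been written
    if ((int32_t) result.size() > n_tokens_max) {
        return -((int32_t) result.size());
    }
    std::copy(result.begin(), result.end(), tokens);
    return (int32_t) result.size();
}
```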
The C++ API is only usable by C++ and nothing else. The C-style API is universal and can be used by any language, so it has to be the first-class citizen. The examples must use the C-style interface in order to exercise it, even if it were a very thin layer on top of a C++ interface.
We cannot draw conclusions based just on this discussion, so I'm not convinced that this is the main challenge atm. We can keep extending There are various inefficiencies in the existing C-style API, like the tokenization that you pointed out, but we can improve those. Also, another advantage of not having the internal For
Hmm ok, that makes sense, so I think for now we can keep the C-style public API as first-class, and keep the cpp functions as internal-only. The same can be done with llamax. Although, I'm a bit concerned that because llamax is built on top of llama.h, it should ideally use the cpp functions provided by llama.cpp directly, instead of doing cpp (llama.cpp) --> c (llama.h) --> cpp (llamax.cpp) --> c (llamax.h).
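As a rough illustration of that concern (llamax does not exist yet, so every llamax name below is invented), a C++ llamax helper built only on the C-style llama.h would have to flatten std::string into raw pointers and manage the output buffer by hand, even though a C++ implementation already sits one layer below:

```cpp
// Invented example: a hypothetical llamax C++ helper forced to go through the
// C-style llama_tokenize() from llama.h (signature as quoted above), instead of
// calling the underlying C++ implementation directly.
#include <cstdint>
#include <string>
#include <vector>

#include "llama.h" // llama_model, llama_token, llama_tokenize

namespace llamax {

// minimal stand-in for a llamax model wrapper (hypothetical)
struct model {
    const llama_model * handle = nullptr;
    const llama_model * c_model() const { return handle; }
};

std::vector<llama_token> tokenize(const model & m, const std::string & text) {
    // cpp (llamax) -> c (llama.h): the string is flattened to pointer + length
    // and the output buffer has to be sized manually ...
    std::vector<llama_token> tokens(text.size() + 2); // rough initial guess
    int32_t n = llama_tokenize(m.c_model(), text.c_str(), (int32_t) text.size(),
                               tokens.data(), (int32_t) tokens.size(),
                               /*add_special=*/true, /*parse_special=*/false);
    if (n < 0) {
        // ... including a retry when the first guess was too small
        tokens.resize((size_t) -n);
        n = llama_tokenize(m.c_model(), text.c_str(), (int32_t) text.size(),
                           tokens.data(), (int32_t) tokens.size(),
                           /*add_special=*/true, /*parse_special=*/false);
    }
    tokens.resize(n > 0 ? (size_t) n : 0);
    return tokens;
}

} // namespace llamax
```

A direct call into the C++ layer would avoid the pointer round-trip and the two-call sizing pattern.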
I believe that extending In either way, I think having cpp and c-style wrapper separated in
Motivation
I was rethinking the idea of llamax (#5215) today. While we're still far from having llamax, to get there we need a step-by-step plan.
Currently, one of the main challenges for developers who want to adopt llama.cpp is that we don't have cpp functions exposed in llama.h (ref: this discussion). I think it would be nice to take this as the first step toward llamax. We should first have a library that uses cpp types like std::string and std::vector, instead of passing C pointers.
Implementation
The existing llama-cpp.h seems to be a good starting point. It's already used by a single example (llama-run), so it could serve as a WIP for now. My implementation in this PR is a very rough draft, mostly for demo purposes.
The final goal is to have all functions be "cpp-first", with the C-only llama.h as their wrapper (NOTE: this is the reverse of what we do with common; currently, the cpp common is a wrapper around the C functions in llama.h).
I'm opening this proposal for discussion and kindly invite @slaren and @ggerganov to join in.
Thank you.
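For illustration, here is a rough sketch of the layering this proposal aims for: a cpp-first llama-cpp.h as the primary interface, with llama.h kept as a thin C wrapper over it. The names and declarations below are hypothetical, not the actual API.

```cpp
// Illustrative sketch only - what a cpp-first llama-cpp.h could grow into.
// The llama_cpp namespace and these declarations are hypothetical.

// --- llama-cpp.h (cpp-first, primary layer) ---
#pragma once

#include <string>
#include <vector>

#include "llama.h" // plain C types such as llama_token remain the stable ABI surface

namespace llama_cpp {

class model; // RAII handle owning a llama_model (hypothetical)

std::vector<llama_token> tokenize(
        const model & model,
        const std::string & raw_text,
        bool add_special,
        bool parse_special = false);

std::string detokenize(
        const model & model,
        const std::vector<llama_token> & tokens,
        bool special = false);

} // namespace llama_cpp

// --- llama.h (C-only, public ABI) ---
// llama_tokenize(), llama_detokenize(), ... keep their current C signatures and
// are implemented as thin wrappers that convert to/from the C++ types above
// (the reverse of today's common, where the C++ helpers wrap the C functions).
```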