
llama : (proposal) implement cpp-first library llama-cpp.h #10583

Draft · wants to merge 1 commit into master

Conversation

@ngxson ngxson (Collaborator) commented Nov 29, 2024

Motivation

I was rethinking the idea of llamax (#5215) today. While we're still far from having llamax, we need a step-by-step plan to get there.

Currently, one of the main challenges for developers who want to adopt llama.cpp is that we don't expose C++ functions in llama.h (ref: this discussion).

I think it would be nice to take this as the first step toward llamax: we should first have a library that uses C++ types like std::string and std::vector instead of passing raw C pointers.

Implementation

llama-cpp.h seems to be a good starting point. It's already used by a single example (llama-run), so it can serve as the work-in-progress home for now.

My implementation in this PR is a rough draft, mostly for demo purposes.

The final goal is to have all functions be "cpp-first", with the C-only llama.h acting as a wrapper around them. (Note: this is the reverse of what we do with common today, where the C++ common code is a wrapper around the C functions in llama.h.)

I'm opening this proposal for discussion, and I kindly invite @slaren and @ggerganov to join in.

Thank you.


@slaren slaren (Member) commented Nov 29, 2024

I like the proposal overall, but I have two concerns. First, if we do this, we would need to maintain two different APIs and commit to avoiding breaking changes unless absolutely necessary, which would increase the maintenance cost. Second, there is a significant risk that the C API would become a second-class citizen, which is something we must absolutely avoid, because it is the only way to have a stable ABI that can be used from other languages.

@ngxson ngxson (Collaborator, Author) commented Nov 29, 2024

What I'm thinking is that, looking at the big picture, this doesn't add much maintenance cost, because:

  • The underlying code of most functions (i.e. llama_*_impl and llama_*_internal) already uses C++ types, and most public API/ABI functions are in fact a layer on top that converts C++ --> C.
  • We are already doing C++ <--> C conversion in the common library, which is itself somewhat costly (in terms of both maintenance and performance). By exposing these C++ definitions directly, we can eliminate the need to maintain functions that go C++ --> C --> C++ (see the sketch after this list).
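
For context, here is a minimal sketch of the kind of round trip meant above, modeled on the grow-and-retry pattern used by common's tokenize helper (simplified, not the verbatim source): a C++-typed helper wraps the C-style llama_tokenize, which itself wraps the internal C++ implementation.

#include <cstdint>
#include <string>
#include <vector>

#include "llama.h"

// C++ --> C --> C++: C++ types in, C API in the middle, C++ types out
static std::vector<llama_token> common_tokenize_sketch(
        const struct llama_model * model,
               const std::string & text,
                            bool   add_special,
                            bool   parse_special) {
    // upper bound: roughly one token per byte, plus room for special tokens
    int32_t n_tokens = (int32_t) text.length() + 2 * add_special;
    std::vector<llama_token> result(n_tokens);
    n_tokens = llama_tokenize(model, text.data(), (int32_t) text.length(),
                              result.data(), (int32_t) result.size(),
                              add_special, parse_special);
    if (n_tokens < 0) {
        // negative return value = required buffer size; grow and retry
        result.resize(-n_tokens);
        n_tokens = llama_tokenize(model, text.data(), (int32_t) text.length(),
                                  result.data(), (int32_t) result.size(),
                                  add_special, parse_special);
    }
    result.resize(n_tokens);
    return result;
}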

In the long term, I think it will be cleaner to separate the library into two layers:

  1. a core that provides C++ functions
  2. a small outer wrapper that converts C++ --> C, to be used for FFI bindings

This will effectively make the C library a second-class citizen, but I think that's somewhat acceptable. My point is that:

  • The C++ --> C conversion must be just a thin wrapper, so the cost of maintaining it should be small (we could also use macros to make it even simpler to maintain).
  • Even though the ABI is C-style, each function will take the same shape as its counterpart in llama-cpp.h.

Let me take an example. Say we have a tokenize function in C++:

std::vector<llama_token> tokenize(
        const llama_cpp::model & model,
             const std::string & raw_text,
                          bool   add_special,
                          bool   parse_special = false);

This would then be wrapped into C:

int32_t llama_tokenize(
    const struct llama_model * model,
                  const char * text,
                     int32_t   text_len,
                 llama_token * tokens,
                     int32_t   n_tokens_max,
                        bool   add_special,
                        bool   parse_special);

But when exposed to a high-level language like Python, end users expect the function to take this shape:

tokenize(
    model: LlamaModel,
    raw_text: str,
    add_special: bool,
    parse_special: bool) -> List[LlamaToken]

So having the C++ function as the reference point should give a more "universal" experience in this case.
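
To make the "thin wrapper" claim above concrete, here is a minimal sketch of what the C shim around the C++ tokenize could look like. The unwrap helper that maps the opaque C handle to the C++ object is an assumption for illustration; only the wrapping pattern is the point.

#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// hypothetical accessor mapping the opaque C handle to the C++ model object
const llama_cpp::model & unwrap(const struct llama_model * model);

int32_t llama_tokenize(
        const struct llama_model * model,
                      const char * text,
                         int32_t   text_len,
                     llama_token * tokens,
                         int32_t   n_tokens_max,
                            bool   add_special,
                            bool   parse_special) {
    // convert C inputs into C++ types and call the C++-first implementation
    const std::string raw_text(text, text_len);
    const std::vector<llama_token> res =
        llama_cpp::tokenize(unwrap(model), raw_text, add_special, parse_special);

    // keep the existing llama.h convention: return the negated required size
    // when the caller's buffer is too small
    if ((int32_t) res.size() > n_tokens_max) {
        return -((int32_t) res.size());
    }
    std::copy(res.begin(), res.end(), tokens);
    return (int32_t) res.size();
}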

@ggerganov ggerganov (Member) commented:

The C++ API is only usable by C++ and nothing else. The C-style API is universal and can be used by any language, so it has to be the first-class citizen. The examples must use the C-style interface in order to exercise it, even if it were a very thin layer on top of a C++ interface.

> Currently, one of the main challenges for developers who want to adopt llama.cpp is that we don't expose C++ functions in llama.h (ref: #8074 (reply in thread))

We cannot draw conclusions based just on this discussion, so I'm not convinced that this is the main challenge atm.

We can keep extending llama-cpp.h to make it friendlier for writing C++. Ideally, common would not exist and everything would use llama.h (+ llama-cpp.h). The reason to have common is to reduce boilerplate. I also see it as a staging ground for new features (like the grammar, samplers, speculative decoding, etc.) that we are not sure how to put directly into llama and need to iterate on a bit before doing so.

There are various inefficiencies in the existing C-style API, like the tokenization that you pointed out, but we can improve those.

Also, another advantage of not exposing the internal llama.cpp C++ API is that we can refactor it heavily at any time and improve the internal implementation (which I agree is needed) without any breaking changes.

For llamax, I'm thinking it would also expose a C-style interface. It would be higher-level in its logic, not in the language of its implementation. Although I admit I haven't thought deeply yet about what it would look like.

@ngxson ngxson (Collaborator, Author) commented Dec 2, 2024

Hmm, OK, that makes sense. I think for now we can keep the C-style public API as first-class and keep the C++ functions internal-only.

The same can be done with llamax. Although, I'm a bit concerned that, because llamax is built on top of llama.h, it would end up doing C++ (llama.cpp) --> C (llama.h) --> C++ (llamax.cpp) --> C (llamax.h); ideally it should use the C++ functions provided by llama.cpp directly.

> We can keep extending llama-cpp.h to make it friendlier for writing C++. Ideally, common would not exist and everything would use llama.h (+ llama-cpp.h).

I believe that extending llama-cpp.h will greatly reduce the development complexity of llamax. Even better, we can engineer llama-cpp.h so that it calls the internal C++ functions directly, eliminating the C++ (llama.cpp) --> C (llama.h) conversion.

Either way, I think keeping the C++ core and the C-style wrapper separated in llama.cpp will be cleaner. Currently, some C++ functions like llama_token_to_piece_impl do not take advantage of C++ style (i.e. the output buffer is passed as a raw char * instead of a std::string or std::vector<char>). Do you think we should refactor them?
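
For illustration, a minimal sketch of what such a refactor could look like, using the grow-and-retry idiom on top of a stand-in for the current buffer-filling signature (the declaration below is a placeholder, not the exact internal API):

#include <cstdint>
#include <string>

#include "llama.h"

// stand-in for the existing buffer-filling implementation: writes up to
// `length` bytes into `buf` and returns the number of bytes written, or the
// negated required size when the buffer is too small
int32_t token_to_piece_raw(llama_token token, char * buf, int32_t length, bool special);

// hypothetical C++-style variant: the output buffer becomes the return value
std::string token_to_piece(llama_token token, bool special) {
    std::string piece(8, '\0');    // small initial guess
    int32_t n = token_to_piece_raw(token, piece.data(), (int32_t) piece.size(), special);
    if (n < 0) {
        piece.resize(-n);          // grow to the required size and retry
        n = token_to_piece_raw(token, piece.data(), (int32_t) piece.size(), special);
    }
    piece.resize((size_t) n);
    return piece;
}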
