
Create a C-style API similar to whisper.cpp #77

Closed

Conversation

@thomasantony commented on Mar 13, 2023

This change makes it easier to use this code as a library, for example to build Python bindings on top of it. It extracts the following functions into llama.cpp:

  • llama_model_load
  • llama_eval
  • llama_model_quantize

It also moves the relevant struct definitions to llama.h. This, for example, helps avoid the redefinition of llama_hparams in quantize.cpp. Please let me know if you have any suggestions to improve this.

See here for an example of this library structure in use.
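
For illustration, the extracted interface might have looked roughly like this; the parameter lists below are a sketch, not copied from the PR:

// llama.h (sketch) -- declarations extracted from main.cpp, still C++-flavored
#include <string>
#include <vector>

struct llama_model;   // hyperparameters, weights, ggml buffers
struct gpt_vocab;     // token <-> string mappings

// Load the model weights and vocabulary from a ggml model file.
bool llama_model_load(const std::string & fname, llama_model & model,
                      gpt_vocab & vocab, int n_ctx);

// Run the transformer on a batch of tokens and produce the logits.
bool llama_eval(const llama_model & model, int n_threads, int n_past,
                const std::vector<int> & tokens, std::vector<float> & logits,
                size_t & mem_per_token);

// Rewrite a model file with quantized weights.
bool llama_model_quantize(const std::string & fname_inp,
                          const std::string & fname_out, int itype);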

@thomasantony force-pushed the refactor_for_library branch from fb6a512 to bb0600c on March 13, 2023 at 04:58
@j-f1 (Collaborator) commented on Mar 13, 2023

In my fork I added this struct to bundle up all the relevant data:

struct llama_state {
    gpt_vocab vocab;        // token <-> string mappings
    llama_model model;      // hyperparameters and weights
    struct {
        // elapsed times in microseconds; -1 = not yet measured
        int64_t t_load_us    = -1;
        int64_t t_sample_us  = -1;
        int64_t t_predict_us = -1;
    } timing;
};

@ggerganov (Owner) left a comment

Yes, this is a step in the right direction, but the exposed things are not the right ones.

llama_layer and llama_model should not be publicly visible. You have to wrap them in a llama_context or llama_state that is only forward-declared in the llama.h file and defined in llama.cpp.

See the whisper.cpp C-style API for doing it the correct way:
https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h

If you give it another try, make sure to start from the latest master, since things are changing there.
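
For reference, the shape of that pattern (a sketch with illustrative function names, not this PR's actual API): the public header only forward-declares the context, and every function passes an opaque pointer across the C boundary.

#ifdef __cplusplus
extern "C" {
#endif

// llama.h: incomplete type -- callers never see the layout
struct llama_context;

struct llama_context * llama_init_from_file(const char * path_model);
void                   llama_free(struct llama_context * ctx);

#ifdef __cplusplus
}
#endif

// llama.cpp: the full definition stays in the implementation file, e.g.
// struct llama_context { gpt_vocab vocab; llama_model model; /* ... */ };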

@bakkot mentioned this pull request on Mar 15, 2023
@thomasantony force-pushed the refactor_for_library branch 3 times, most recently from e463b4f to 3a561bb on March 16, 2023 at 03:41
@thomasantony (Author) commented:

@ggerganov I have made the changes. Please let me know what you think.

@thomasantony force-pushed the refactor_for_library branch 3 times, most recently from c9904e5 to 6ff3e64 on March 16, 2023 at 03:51
CMakeLists.txt Outdated
add_executable(llama
    main.cpp
    llama.cpp
A collaborator commented:
llamalib already contains llama.cpp, utils.cpp and utils.h
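
In other words (a sketch of the suggested fix, not the PR's actual diff), the executable would link against the library instead of compiling llama.cpp a second time:

# sketch: reuse the objects already built into llamalib
add_executable(llama main.cpp)
target_link_libraries(llama PRIVATE llamalib)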

The author replied:
Updated.

CMakeLists.txt Outdated
@@ -114,6 +119,9 @@ add_executable(quantize
    utils.cpp
    utils.h)

add_library(llamalib
    llama.cpp)
A collaborator commented:
missing llama.h, utils.cpp and utils.h
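
A version incorporating that feedback might look like this (sketch):

# sketch: the library target lists all shared sources and headers
add_library(llamalib
    llama.cpp
    llama.h
    utils.cpp
    utils.h)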

The author replied:
Updated CMakeLists.

@j-f1 (Collaborator) left a comment:

Some feedback on the API:

llama.h Outdated
bool llama_init_context_with_prompt(llama_context& ctx, const std::string& text, bool clear_existing = true);

// Various functions for loading a ggml LLaMA model.
llama_context* llama_init_from_params(const gpt_params& params);
A collaborator commented:

Why does llama_init_context_with_prompt take a llama_context& while llama_init_from_params returns a llama_context*? Can you give these a consistent API, or rename them to clarify how they differ?
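
One way to remove the asymmetry (a hypothetical sketch, not necessarily what the PR settled on) would be for both functions to traffic in the same pointer type:

// both functions now deal in llama_context*
llama_context * llama_init_from_params(const gpt_params & params);
bool llama_init_context_with_prompt(llama_context * ctx,
                                    const std::string & text,
                                    bool clear_existing);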

The author replied:

Removed this confusion in my second pass of refactoring. I feel it is a lot cleaner now. Please take a look.

@thomasantony (Author) commented on Mar 18, 2023

@j-f1 @Green-Sky @ggerganov I have done another pass at refactoring and also fixed a few logical bugs that had left interactive mode broken in my original version (among other things). I have verified that interactive mode now works as intended and that inference remains just as fast as before.

I have also rebased onto the latest master branch. Please take another look. Thanks!

@thomasantony force-pushed the refactor_for_library branch 2 times, most recently from 41b6af6 to 71f75c1 on March 18, 2023 at 03:52
@ggerganov (Owner) commented:
@thomasantony
We want to have a C-style API in llama.h. We cannot expose C++ constructs.

For now, leave it like this and let me apply the necessary changes on top of yours to demonstrate what I have in mind, probably tomorrow or the day after.
Thanks for contributing!
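
Concretely, a signature like the one quoted in the review thread above cannot cross a C boundary: the reference, the std::string, and the default argument are all C++-only constructs. A C-callable equivalent (an illustrative sketch) drops all three:

// C++-only:
//   bool llama_init_context_with_prompt(llama_context & ctx,
//                                       const std::string & text,
//                                       bool clear_existing = true);
// C-callable sketch:
bool llama_init_context_with_prompt(struct llama_context * ctx,
                                    const char * text,
                                    bool clear_existing);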

@ggerganov changed the title from "Refactors out some of the functions into llama.cpp" to "Create a C-style API similar to whisper.cpp" on Mar 18, 2023
@thomasantony (Author) replied:

> @thomasantony We want to have a C-style API in llama.h. We cannot expose C++ constructs.
>
> For now, leave it like this and let me apply the necessary changes on top of yours to demonstrate what I have in mind, probably tomorrow or the day after. Thanks for contributing!

Okay, thanks. In the meantime, I will rebase the new changes on the master branch onto this branch.

@thomasantony force-pushed the refactor_for_library branch 4 times, most recently from f609ff4 to 5a5d552 on March 19, 2023 at 18:59
@thomasantony force-pushed the refactor_for_library branch from 5a5d552 to 5b9577f on March 19, 2023 at 19:05
@thomasantony force-pushed the refactor_for_library branch from 1cb574c to f0aea33 on March 19, 2023 at 20:21
@thomasantony force-pushed the refactor_for_library branch from f0aea33 to 5195fed on March 19, 2023 at 20:39
@ggerganov (Owner) commented:

Superseded by #370

@ggerganov closed this on Mar 21, 2023
rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request on Aug 2, 2023: "Update README.md: formate output samples"
Labels: enhancement (New feature or request), high priority (Very important issue)
5 participants