Create a C-style API similar to whisper.cpp #77
Conversation
Force-pushed fb6a512 to bb0600c
In my fork I added this struct to bundle up all the relevant data:

```cpp
struct llama_state {
    gpt_vocab vocab;
    llama_model model;

    struct {
        int64_t t_load_us    = -1;
        int64_t t_sample_us  = -1;
        int64_t t_predict_us = -1;
    } timing;
};
```
Yes, this is a step in the right direction, but the exposed things are not the right ones. llama_layer and llama_model should not be publicly visible. You have to wrap them in llama_context or llama_state, which is just forward declared in the llama.h file and defined in llama.cpp.

See the whisper.cpp C-style API for doing it the correct way:
https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h

If you give it another try, make sure to start from the latest master since things are changing there.
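For illustration, here is a minimal sketch of that opaque-handle pattern, assuming hypothetical llama_init/llama_free names rather than the exact API of this PR; the whisper.h header linked above is the reference to follow:

```cpp
// llama.h -- public header: the context is only forward declared,
// so its layout is invisible to users of the library.
struct llama_context;   // opaque handle

struct llama_context * llama_init(const char * path_model);   // hypothetical name
void                   llama_free(struct llama_context * ctx); // hypothetical name

// llama.cpp -- the full definition lives only in the implementation file.
struct llama_context {
    gpt_vocab   vocab;     // internal types (gpt_vocab, llama_model, llama_layer)
    llama_model model;     // never appear in the public header

    struct {
        int64_t t_load_us    = -1;
        int64_t t_sample_us  = -1;
        int64_t t_predict_us = -1;
    } timing;
};
```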
Force-pushed e463b4f to 3a561bb
@ggerganov I have made the changes. Please let me know what you think.
Force-pushed c9904e5 to 6ff3e64
CMakeLists.txt (Outdated)

```cmake
add_executable(llama
    main.cpp
    llama.cpp
```
llamalib already contains llama.cpp, utils.cpp and utils.h
Updated.
CMakeLists.txt (Outdated)

```
@@ -114,6 +119,9 @@ add_executable(quantize
    utils.cpp
    utils.h)

add_library(llamalib
    llama.cpp)
```
missing llama.h, utils.cpp and utils.h
Updated CMakeLists.
Some feedback on the API:
llama.h (Outdated)

```cpp
bool llama_init_context_with_prompt(llama_context& ctx, const std::string& text, bool clear_existing = true);

// Various functions for loading a ggml LLaMA model.
llama_context* llama_init_from_params(const gpt_params& params);
```
Why does llama_init_context_with_prompt take a llama_context& while llama_init_from_params returns a llama_context*? Can you make these have a similar API, or rename them to clarify how they differ?
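For example, one way to make the pair read consistently (a hypothetical sketch, not the signatures that ended up in the PR) is to have every function take or return the same llama_context pointer:

```cpp
// Hypothetical, consistent style: one constructor-like call returns the
// context, and every other call takes that same pointer.
llama_context * llama_init_from_params(const gpt_params & params);
bool            llama_feed_prompt(llama_context * ctx, const std::string & text,
                                  bool clear_existing = true);   // hypothetical name
void            llama_free(llama_context * ctx);                  // hypothetical name
```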
Removed this confusion in my second pass of refactoring. I feel it is a lot cleaner now. Please take a look.
Force-pushed 6ff3e64 to bd4476d
@j-f1 @Green-Sky @ggerganov I have done another pass at refactoring and also fixed a few logical bugs that left interactive mode broken in my original version (among other things). I have verified that interactive mode now works as intended and inference remains just as fast as before. I have also rebased onto the latest master branch. Please take another look. Thanks!
Force-pushed 41b6af6 to 71f75c1
@thomasantony For now, leave it like this and let me apply the necessary changes on top of yours to demonstrate what I have in mind - probably tomorrow or the day after.
Okay, thanks. In the meantime, I will rebase the new changes on the master branch onto this branch.
Force-pushed f609ff4 to 5a5d552
Force-pushed 5a5d552 to 5b9577f
Force-pushed 1cb574c to f0aea33
- Also add a single-token converter
- The executable is now "main" and the library is "llama"
Force-pushed f0aea33 to 5195fed
Superseded by #370
Update README.md: format output samples
This change makes it easier to use this code as a library, for example to build Python bindings on top of it. It extracts the following functions into llama.cpp:

- llama_model_load
- llama_eval
- llama_model_quantize

It also moves the relevant struct definitions to llama.h. This, for example, helps avoid the redefinition of llama_hparams in quantize.cpp. Please let me know if you have any suggestions to improve this. See here for an example of this library structure in use.
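As a rough sketch of the extraction described above (the parameter lists are assumptions for illustration and may not match the PR exactly), llama.h ends up declaring something along these lines:

```cpp
// llama.h -- sketch of the extracted interface; parameter lists are assumed
// for illustration and may differ from the actual PR.
#include <string>
#include <vector>

struct llama_hparams;   // moved here so quantize.cpp no longer redefines it
struct llama_model;
struct gpt_vocab;

bool llama_model_load(const std::string & fname, llama_model & model,
                      gpt_vocab & vocab, int n_ctx);

bool llama_eval(const llama_model & model, int n_threads, int n_past,
                const std::vector<int> & embd_inp,
                std::vector<float> & embd_w, size_t & mem_per_token);

bool llama_model_quantize(const std::string & fname_inp,
                          const std::string & fname_out, int itype);
```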