Add llama compatibility with new ggml quantization #642
Conversation
Closes #541 once merged.
Nice! Thank you! :) I looked at the code and it looks like the temperature sampling is the same. You added …
That's why I put them into the …
Looks like ggerganov made another breaking change.
Thanks to @imaami for helping with the commit history :-)
I did something like this, or at least I hope so. If you're interested: imaami@fea4747
Please feel free to PR this once this whole thing is merged into master! We should add as few unrelated things as possible to the …
This is great! I like it. :)
Sure thing, let's do things in order.
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
Please address comments...
@@ -71,18 +73,32 @@ foreach(BUILD_VARIANT IN LISTS BUILD_VARIANTS)
    PROPERTY INTERPROCEDURAL_OPTIMIZATION ${IPO_SUPPORTED})
endfunction()

# Add each individual implementation
add_library(llamamodel-${BUILD_VARIANT} SHARED
# Add each individual implementations
nitpick, you don't want the plural here
I noticed that as well, but decided to leave it as is since it's not worth a commit. Will batch this with further things that may come up.
url = https://github.com/manyoso/llama.cpp.git
[submodule "llama.cpp-mainline"]
path = gpt4all-backend/llama.cpp-mainline
url = https://github.com/ggerganov/llama.cpp.git
Ok, ok, I get ya, but this isn't actually pinning them. Also, I think I still want all of them to use the 'manyoso' fork as this gives us further control, right?
Not sure what you mean, the manyoso fork hasn't been updated to the latest llama.cpp, it's 132 commits behind...
Also, that fork only adds alibi, which is only needed for MPT.
I mean we should update that fork and point to it, I believe. Lemme do that now.
llamamodel.cpp)
prepare_target(llamamodel llama)
target_compile_definitions(llamamodel-mainline-${BUILD_VARIANT} PRIVATE
    LLAMA_VERSIONS=>=3 LLAMA_DATE=999999)
=>= oh man cmake.. you're killing me
Haha, yup. Looks confusing, is confusing, but does what we need quite flexibly.
That conditional should probably be changed to a slightly less cursed variant:
#if LLAMA_VERSION <= 123456
// ...
#elif LLAMA_VERSION >= 654321
// ...
#endif
At least then it would be a readily recognizable pattern of tragic stylistic compromise instead of a confusing, entirely new way to crush one's hopes and dreams. Would also shrink the cmake side a little.
Pardon the gallows humour, can't help it whenever pre-processor macros seem necessary. ;)
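To unpack the =>= for future readers: the value of the compile definition is itself a comparison operator plus a number, so the preprocessor can paste it straight after a version constant. Below is a minimal, self-contained sketch of that mechanism; the FILE_VERSION name is hypothetical and not taken from this PR, and the fallback defines only stand in for what CMake would pass on the compiler command line.
// Hypothetical sketch of the =>= trick. In the real build, CMake passes
// -DLLAMA_VERSIONS=>=3 and -DLLAMA_DATE=999999; the fallback defines here
// just let the snippet compile on its own.
#ifndef LLAMA_VERSIONS
#define LLAMA_VERSIONS >=3      // a token sequence (">= 3"), not a plain value
#endif
#ifndef LLAMA_DATE
#define LLAMA_DATE 999999       // "newer than any dated snapshot"
#endif

#define FILE_VERSION 3          // made-up stand-in for a per-submodule constant

#if FILE_VERSION LLAMA_VERSIONS // expands to: #if 3 >=3
// compiled only against llama.cpp file-format version 3 and newer
#endif

#if LLAMA_DATE >= 230511        // plain numeric comparison against a date stamp
// compiled only against llama.cpp snapshots from 2023-05-11 onward
#endif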
add_library(gptj-${BUILD_VARIANT} SHARED
    gptj.cpp)
prepare_target(gptj ggml)
prepare_target(gptj ggml-230511)
wait, where are you tagging the actual ggml with this?
llama.cpp.cmake adds the given suffix to ggml as well.
int32_t n_keep = 0; // number of tokens to keep from initial prompt
#if LLAMA_DATE <= 230511
int32_t n_parts = -1; // amount of model parts (-1 = determine from model dimensions)
#endif
The crux of it. We're going to use macros...
Our other option would be to have an extensive collection of almost-identical llamamodel.cpp files for different llama.cpp versions.
No, I think this is the right choice out of a bunch of bad choices.
There's also CRTP and C++ template magic, but I agree it's not the time to go there yet.
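Purely to illustrate that remark, here is a hypothetical CRTP sketch (none of these class names exist in the PR): the shared flow lives in the base template and each pinned llama.cpp snapshot supplies only what differs, built in its own translation unit instead of behind #if blocks.
#include <cstdint>
#include <cstdio>

// Base template: shared logic, with a hook into the version-specific part.
template <typename VersionPolicy>
struct LlamaModelBase {
    void configure() {
        // ... shared setup would go here ...
        static_cast<VersionPolicy *>(this)->applyVersionSpecificParams();
    }
};

// One of these per pinned llama.cpp snapshot, each compiled separately.
struct LlamaModel230511 : LlamaModelBase<LlamaModel230511> {
    int32_t n_parts = -1;   // field that only the older API still has
    void applyVersionSpecificParams() { std::printf("230511 params, n_parts=%d\n", n_parts); }
};

struct LlamaModelMainline : LlamaModelBase<LlamaModelMainline> {
    void applyVersionSpecificParams() { std::printf("mainline params\n"); }
};

int main() {
    LlamaModel230511 old_model;
    LlamaModelMainline new_model;
    old_model.configure();
    new_model.configure();
}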
add_library(llama${SUFFIX}
    ${DIRECTORY}/llama.cpp
    ${DIRECTORY}/llama.h
    ${DIRECTORY}/llama_util.h)
    ${DIRECTORY}/${LLAMA_UTIL_SOURCE_FILE})
This branch doesn't actually introduce this file, right? It exists upstream in one of the pinned submodules?
The filename was changed.
llama_sample_top_p(ctx, &candidates_p, top_p, 1);
llama_sample_temperature(ctx, &candidates_p, temp);
return llama_sample_token(ctx, &candidates_p);
}
Going to assume this is giving you sane results? Have you made sure to go through and test models with each of the pinned variants and file formats? Man, we almost want regression or unit tests here...
Yup! I did. Man, was my hard drive full...
This is also how it's done in the llama.cpp main example.
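For reference, a hedged sketch of the full sampling sequence as it appears in the llama.cpp main example of that period; this is not a copy of the PR's code, and the function name and parameter handling here are illustrative only.
#include <vector>
#include "llama.h"  // llama.cpp C API as of mid-2023

// Sketch: build the candidate list from the last logits, then apply
// top-k, top-p and temperature before picking a token.
static llama_token sample_next_token(llama_context *ctx,
                                     int top_k, float top_p, float temp) {
    const int n_vocab = llama_n_vocab(ctx);
    const float *logits = llama_get_logits(ctx);

    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token token_id = 0; token_id < n_vocab; ++token_id) {
        candidates.push_back({token_id, logits[token_id], 0.0f});
    }
    llama_token_data_array candidates_p = {candidates.data(), candidates.size(), false};

    llama_sample_top_k(ctx, &candidates_p, top_k, 1);
    llama_sample_top_p(ctx, &candidates_p, top_p, 1);
    llama_sample_temperature(ctx, &candidates_p, temp);
    return llama_sample_token(ctx, &candidates_p);
}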
Thank god someone is handling this migration the sane way.
Describe your changes
This change introduces full compatibility with the new ggml quantization without killing the old one (which is renamed to {llama,ggml}-old). The API is kept unchanged and these changes are completely invisible to it.
Issue ticket number and link
Every single one that complains about new llama models not working :-)