Add llama compatibility with new ggml quantization #642
Conversation
Closes #541 once merged.
Nice! Thank you! :) I looked at the code and it looks like the temperature sampling is the same. You added …
That's why I put them into the …
Looks like ggerganov made another breaking change.
Thanks to @imaami for helping with the commit history :-)
I did something like this, or at least I hope so. If you're interested: imaami@fea4747
Please feel free to PR this once this whole thing is merged into master! We should add as few unrelated things as possible to the …
This is great! I like it. :)
Sure thing, let's do things in order.
Signed-off-by: niansa/tuxifan <tuxifan@posteo.de>
Please address comments...
@@ -71,18 +73,32 @@ foreach(BUILD_VARIANT IN LISTS BUILD_VARIANTS)
    PROPERTY INTERPROCEDURAL_OPTIMIZATION ${IPO_SUPPORTED})
endfunction()

# Add each individual implementation
add_library(llamamodel-${BUILD_VARIANT} SHARED
# Add each individual implementations
nitpick, you don't want the plural here
I noticed that as well, but decided to leave it as is since it's not worth a commit. Will batch this with further things that may come up.
url = https://github.com/manyoso/llama.cpp.git
[submodule "llama.cpp-mainline"]
path = gpt4all-backend/llama.cpp-mainline
url = https://github.com/ggerganov/llama.cpp.git
Ok, ok, I get ya, but this isn't actually pinning them. Also, I think I still want all of them to use the 'manyoso' fork as this gives us further control, right?
Not sure what you mean, the manyoso fork hasn't been updated to the latest llama.cpp, it's 132 commits behind...
Also, that fork only adds alibi, which is only needed for MPT.
I mean we should update that fork and point to it, I believe. Lemme do that now.
llamamodel.cpp)
prepare_target(llamamodel llama)
target_compile_definitions(llamamodel-mainline-${BUILD_VARIANT} PRIVATE
    LLAMA_VERSIONS=>=3 LLAMA_DATE=999999)
=>= oh man cmake.. you're killing me
Haha, yup. Looks confusing, is confusing, but does what we need quite flexibly.
That conditional should probably be changed to a slightly less cursed variant:
#if LLAMA_VERSION <= 123456
// ...
#elif LLAMA_VERSION >= 654321
// ...
#endif
At least then it would be a readily recognizable pattern of tragic stylistic compromise instead of a confusing, entirely new way to crush one's hopes and dreams. Would also shrink the cmake side a little.
Pardon the gallows humour, can't help it whenever pre-processor macros seem necessary. ;)
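To unpack the =>= for future readers: the value of the compile definition is itself a comparison operator plus a number, so the preprocessor can paste it straight after a version constant. Below is a minimal, self-contained sketch of that mechanism; the FILE_VERSION name is hypothetical and not taken from this PR, and the fallback defines only stand in for what CMake would pass on the compiler command line.
// Hypothetical sketch of the =>= trick. In the real build, CMake passes
// -DLLAMA_VERSIONS=>=3 and -DLLAMA_DATE=999999; the fallback defines here
// just let the snippet compile on its own.
#ifndef LLAMA_VERSIONS
#define LLAMA_VERSIONS >=3      // a token sequence (">= 3"), not a plain value
#endif
#ifndef LLAMA_DATE
#define LLAMA_DATE 999999       // "newer than any dated snapshot"
#endif

#define FILE_VERSION 3          // made-up stand-in for a per-submodule constant

#if FILE_VERSION LLAMA_VERSIONS // expands to: #if 3 >=3
// compiled only against llama.cpp file-format version 3 and newer
#endif

#if LLAMA_DATE >= 230511        // plain numeric comparison against a date stamp
// compiled only against llama.cpp snapshots from 2023-05-11 onward
#endif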
add_library(gptj-${BUILD_VARIANT} SHARED
    gptj.cpp)
prepare_target(gptj ggml)
prepare_target(gptj ggml-230511)
wait, where are you tagging the actual ggml with this?
llama.cpp.cmake adds the given suffix to ggml as well.
int32_t n_keep = 0; // number of tokens to keep from initial prompt
#if LLAMA_DATE <= 230511
int32_t n_parts = -1; // amount of model parts (-1 = determine from model dimensions)
#endif
The crux of it. We're going to use macros...
Our other option would be to have an extensive collection of almost-identical llamamodel.cpp files for different llama.cpp versions.
No, I think this is the right choice out of a bunch of bad choices.
There's also CRTP and C++ template magic, but I agree it's not the time to go there yet.
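Purely to illustrate that remark, here is a hypothetical CRTP sketch (none of these class names exist in the PR): the shared flow lives in the base template and each pinned llama.cpp snapshot supplies only what differs, built in its own translation unit instead of behind #if blocks.
#include <cstdint>
#include <cstdio>

// Base template: shared logic, with a hook into the version-specific part.
template <typename VersionPolicy>
struct LlamaModelBase {
    void configure() {
        // ... shared setup would go here ...
        static_cast<VersionPolicy *>(this)->applyVersionSpecificParams();
    }
};

// One of these per pinned llama.cpp snapshot, each compiled separately.
struct LlamaModel230511 : LlamaModelBase<LlamaModel230511> {
    int32_t n_parts = -1;   // field that only the older API still has
    void applyVersionSpecificParams() { std::printf("230511 params, n_parts=%d\n", n_parts); }
};

struct LlamaModelMainline : LlamaModelBase<LlamaModelMainline> {
    void applyVersionSpecificParams() { std::printf("mainline params\n"); }
};

int main() {
    LlamaModel230511 old_model;
    LlamaModelMainline new_model;
    old_model.configure();
    new_model.configure();
}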
add_library(llama${SUFFIX}
    ${DIRECTORY}/llama.cpp
    ${DIRECTORY}/llama.h
    ${DIRECTORY}/llama_util.h)
    ${DIRECTORY}/${LLAMA_UTIL_SOURCE_FILE})
This branch doesn't actually introduce this file, right? It exists upstream in one of the pinned submodules?
The filename was changed.
llama_sample_top_p(ctx, &candidates_p, top_p, 1);
llama_sample_temperature(ctx, &candidates_p, temp);
return llama_sample_token(ctx, &candidates_p);
}
Going to assume this is giving you sane results? Have you made sure to go through and test models with each of the pinned variants and file formats? Man, we almost want regression or unit tests here...
Yup! I did. Man, was my hard drive full...
This is also how it's done in the llama.cpp main example.
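For reference, a hedged sketch of the full sampling sequence as it appears in the llama.cpp main example of that period; this is not a copy of the PR's code, and the function name and parameter handling here are illustrative only.
#include <vector>
#include "llama.h"  // llama.cpp C API as of mid-2023

// Sketch: build the candidate list from the last logits, then apply
// top-k, top-p and temperature before picking a token.
static llama_token sample_next_token(llama_context *ctx,
                                     int top_k, float top_p, float temp) {
    const int n_vocab = llama_n_vocab(ctx);
    const float *logits = llama_get_logits(ctx);

    std::vector<llama_token_data> candidates;
    candidates.reserve(n_vocab);
    for (llama_token token_id = 0; token_id < n_vocab; ++token_id) {
        candidates.push_back({token_id, logits[token_id], 0.0f});
    }
    llama_token_data_array candidates_p = {candidates.data(), candidates.size(), false};

    llama_sample_top_k(ctx, &candidates_p, top_k, 1);
    llama_sample_top_p(ctx, &candidates_p, top_p, 1);
    llama_sample_temperature(ctx, &candidates_p, temp);
    return llama_sample_token(ctx, &candidates_p);
}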
Thank god someone is handling this migration the sane way.
Describe your changes
This change introduces full compatibility with the new ggml quantization without killing the old one (which is renamed to {llama,ggml}-old). The API is kept unchanged and these changes are completely invisible to it.
Issue ticket number and link
Every single one that complains about new llama models not working :-)