
Initial support for llama.cpp #447

Merged: 13 commits merged into oobabooga:main from thomasantony:feature/llamacpp on Mar 31, 2023
Conversation

@thomasantony (Contributor) commented Mar 20, 2023

This is my proof of concept for adding support for llama.cpp. It requires my experimental Python bindings (v0.1.9 and up), which have no dependencies of their own (with some caveats mentioned below).

This is what is in models/llamacpp-7B right now

> ls -al  models/llamacpp-7B/
total 70746800
drwxr-xr-x@ 9 thomas  staff          288 Mar 19 17:59 .
drwxr-xr-x@ 9 thomas  staff          288 Mar  10 22:04 ..
-rw-r--r--@ 1 thomas  staff          100 Mar  10 22:04 checklist.chk
-rw-r--r--  1 thomas  staff          118 Mar 19 18:00 config.json
-rw-r--r--@ 1 thomas  staff  13476939516 Mar  10 22:35 consolidated.00.pth
-rw-r--r--  1 thomas  staff  13477682665 Mar 11 13:40 ggml-model-f16.bin
-rw-r--r--  1 thomas  staff   4212727273 Mar 12 18:39 ggml-model-q4_0.bin
-rw-r--r--  1 thomas  staff   5054995945 Mar 12 19:22 ggml-model-q4_1.bin
-rw-r--r--@ 1 thomas  staff          101 Mar  10 22:03 params.json
  • Only the ggml-model-q4_0.bin file is required right now (and its filename is hardcoded). The bigger models should also work as long as the folder names start with llamacpp- or alpaca-cpp-. A rough sketch of this detection/loading flow is shown after this list.

  • The model files can be created from the PyTorch model using the llamacpp-convert and llamacpp-quantize commands that are installed along with the llamacpp package. Using these commands requires that torch and sentencepiece also be installed.

  • There is currently no option to change parameters such as top_p and top_k other than hardcoding them in llamacpp_model.py. This is on my todo list of things to fix.
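
For illustration, here is a minimal sketch of the detection and loading flow described in the list above. The helper names are hypothetical rather than the PR's actual code; the only call taken from this thread is llamacpp.LlamaInference(params), and how the model path and sampling settings are carried in params depends on the bindings' real API.

```python
from pathlib import Path

import llamacpp  # experimental Python bindings, v0.1.9 and up


def is_llamacpp_model(model_name: str) -> bool:
    # Folders whose names start with these prefixes are treated as llama.cpp models.
    return model_name.startswith(("llamacpp-", "alpaca-cpp-"))


def load_llamacpp_model(model_name: str, params):
    # Only the 4-bit weights are looked for; the filename is hardcoded for now.
    weights = Path("models") / model_name / "ggml-model-q4_0.bin"
    if not weights.exists():
        raise FileNotFoundError(f"expected quantized weights at {weights}")
    # Sampling settings (top_p, top_k, ...) are currently hardcoded in
    # llamacpp_model.py; `params` is assumed to already reference `weights`.
    return llamacpp.LlamaInference(params)
```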

@oobabooga (Owner)

This is really promising.

@madmads11

I see that llama.cpp has added a C-style API, exciting stuff!

@thomasantony (Contributor, Author)

> I see that llama.cpp has added a C-style API, exciting stuff!

Yeah. My bindings were based on my own C++ API (#77, which is now closed). Georgi decided that it was too much C++ and wanted a C-style API. I might migrate my Python bindings to the new API once it is merged in.

@madmads11

> Yeah. My bindings were based on my own C++ API (#77, which is now closed). Georgi decided that it was too much C++ and wanted a C-style API. I might migrate my Python bindings to the new API once it is merged in.

I hope that isn't a setback to your work. I appreciate all the time you are putting into this project!

@TheTerrasque (Contributor)

There's also this project that might be useful: https://github.com/PotatoSpudowski/fastLLaMa

@thomasantony (Contributor, Author)

> I hope that isn't a setback to your work. I appreciate all the time you are putting into this project!

His new API is quite a bit cleaner than my previous work, which was put together rather quickly. However, the new one is a bit more minimal, so I need some additional work around it to get it back to where it was before. I am currently blocked on a segfault that I can hopefully get to later today or tomorrow.

@BadisG (Contributor) commented Mar 27, 2023

@thomasantony Will llama.cpp be placed in the "repositories" folder, similar to "GPTQ-for-LLaMa"? If so, that's great, since updating the web UI would then also update the llama.cpp repository.

thomasantony force-pushed the feature/llamacpp branch 2 times, most recently from 1c91078 to 6a0fff1, on March 29, 2023 20:28
thomasantony marked this pull request as ready for review on March 29, 2023 20:28
thomasantony changed the title from "Draft: Add support for llama.cpp" to "Initial support for llama.cpp" on Mar 29, 2023
@oobabooga (Owner) commented Mar 30, 2023

I like the code so far and appreciate that it adheres to the style/structure of the project.

@oobabooga (Owner) commented Mar 31, 2023

@thomasantony I have made some changes that made this functional for me. The main parameters are all used: temperature, top_k, top_p, and repetition_penalty.

These were the steps to get it working:

  1. Install version 0.1.10 of llamacpp: pip install llamacpp==0.1.10
  2. Create the folder models/llamacpp-7b
  3. Put this file in it: ggml-model-q4_0.bin
  4. Start the web UI with python server.py --model llamacpp-7b

After that it worked.

@thomasantony (Contributor, Author) commented Mar 31, 2023

Thanks for the changes. I just released v0.1.11. It includes the new memory-mapped I/O feature and requires updating the weight files, but it makes loading the models a whole lot faster (and may allow running models bigger than your RAM, though I have not tried that yet and may be wrong about it).

The API should be consistent and work with the web UI without any changes.

@oobabooga (Owner)

Is it possible to find the new weights on Hugging Face somewhere?

@thomasantony (Contributor, Author) commented Mar 31, 2023

You can use the updated "llamacpp-convert" script with the original LLaMA weights (PyTorch format) to generate the new ggml weights. Another option is to use the "migrate" script from https://github.com/ggerganov/llama.cpp, which can convert existing "ggml" weights into the new format.

@madmads11

Would this support using and interacting with alpaca and llama models of all sizes?

@oobabooga (Owner)

@thomasantony I did the conversion from the base LLaMA files and that worked.

This was the performance of llama-7b int4 on my i5-12400F:

Output generated in 44.10 seconds (4.53 tokens/s, 200 tokens)

@thomasantony (Contributor, Author) commented Mar 31, 2023

Well, feel free to merge it! I am glad that I was able to contribute. :)

@oobabooga (Owner)

Thank you so much for this brilliant PR, @thomasantony!

The new documentation is here: https://github.com/oobabooga/text-generation-webui/wiki/llama.cpp-models

oobabooga merged commit 6fd70d0 into oobabooga:main on Mar 31, 2023
thomasantony deleted the feature/llamacpp branch on March 31, 2023 18:30
@oobabooga (Owner) commented Mar 31, 2023

@thomasantony I have just noticed that the parameters are not really being used. Assigning to the params variable here doesn't change the parameters inside the model: https://github.com/oobabooga/text-generation-webui/blob/main/modules/llamacpp_model.py#L43

Is the only way to change the model parameters to reload it from scratch like this?

_model = llamacpp.LlamaInference(params)

@thomasantony (Contributor, Author)

@oobabooga That is a side effect of how the underlying Python bindings work right now. Adding support for changing those parameters when sampling from the logits is on my to-do list. Right now, it is only possible with the lower-level LlamaContext class; the higher-level LlamaInference does not allow changing the parameters after initialization. This is probably the next thing I will update in the library. I will post back here once that is done, or open a separate PR with the changes.
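
Until then, here is a minimal sketch of the reload-from-scratch workaround discussed above. The wrapper class and the equality check on the params object are illustrative assumptions; only llamacpp.LlamaInference(params) is taken from this thread, and how params is built and compared depends on the bindings' real API.

```python
import llamacpp  # experimental Python bindings


class ReloadingLlama:
    """Rebuild the LlamaInference object whenever sampling parameters change.

    LlamaInference does not currently expose a way to change parameters after
    construction, so the model is reconstructed from scratch instead.
    """

    def __init__(self, params):
        self._params = params
        self._model = llamacpp.LlamaInference(params)

    def with_params(self, new_params):
        # Reload only when something actually changed; reloading is costly,
        # although v0.1.11's memory-mapped I/O makes it much faster.
        if new_params != self._params:
            self._params = new_params
            self._model = llamacpp.LlamaInference(new_params)
        return self._model
```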

@niizam commented Apr 1, 2023

Is it possible to use VRAM and RAM like this? https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model

@thomasantony (Contributor, Author)

@niizam 4-bit quantized models are already supported. You just need to use the appropriate weight files.
