
4bit version of gpt4all-alpaca-oa-codealpaca-Lora-13b? #1037

Closed
sussyboiiii opened this issue Apr 18, 2023 · 22 comments

Comments

@sussyboiiii

Hello,
to reduce my brain usage even more I thought it'd be nice to run an AI that is specifically trained to code and thus hopefully produces better code than language models trained for, e.g., natural language.

So I found this: https://huggingface.co/jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b

I of course wanted to try to run it, but there's a problem: there aren't any pytorch_model files, and no 4-bit variants are listed here: https://github.com/underlines/awesome-marketing-datascience/blob/master/awesome-ai.md

Thank you for your support!

@SlyEcho
Collaborator

SlyEcho commented Apr 18, 2023

llama.cpp can now load LoRA adapters. You need to convert the LoRA model to ggml using convert-lora-to-ggml.py, then load the original LLaMA 13B as the base model with your LoRA model on top of it when launching: ./main -m llama-13b.bin --lora lora-model.bin. Something like that.
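
A rough sketch of that workflow (paths and the prompt are placeholders, and script names may vary between llama.cpp revisions):

    # convert the LoRA adapter weights into a ggml adapter file
    python3 convert-lora-to-ggml.py /path/to/gpt4all-alpaca-oa-codealpaca-lora-13b

    # run the original LLaMA 13B model with the adapter applied on top
    ./main -m llama-13b.bin --lora lora-model.bin -p "Write a C function that reverses a string."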

@execveat

--lora partially addresses the question, but https://huggingface.co/jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b also mentions a few embeddings that are needed to support the custom tokens they use:

<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>

Is there a way to work around this with existing llama.cpp options or would it require a PR?

@SlyEcho
Collaborator

SlyEcho commented Apr 18, 2023

You're right, --lora doesn't support extending the tokenizer yet. In that case the model should be saved as a .pth checkpoint and converted from that; llama.cpp itself can then load the tokens from the model file.

@NoNamedCat

Can anyone convert this so the model can be loaded? I'm particularly interested in using these models to write and work with code.

@SlyEcho
Collaborator

SlyEcho commented Apr 18, 2023

I created this script to merge the models: https://gist.github.com/SlyEcho/477554916bfc1a9e338240eee6396fbd

It creates a HF checkpoint that can be converted using convert.py to ggml f16 format and then later to q4_0 with quantize.

However, I'm not sure that the extra tokens are being used for tokenization.

EDIT:

It seems to work even with the text versions of <|prompter|> <|assistant|>...
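
For reference, once the merged model has been converted and quantized, prompting it with the plain-text tokens looks roughly like this (the model file name and prompt are only placeholders):

    ./main -m ggml-merged-13b-q4_0.bin --color -n 256 \
        -p "<|prompter|>Write a Python function that reverses a string.</s><|assistant|>"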

@sussyboiiii
Author

sussyboiiii commented Apr 18, 2023

So by converting the files with the ggml Python script we can use gpt4all-alpaca-oa-codealpaca-Lora-13b, but not as one file. But your script, @SlyEcho, can do that?

Edit:
For LLaMA I have only got consolidated.00.pth and consolidated.01.pth.

@SlyEcho
Collaborator

SlyEcho commented Apr 18, 2023

The script should download 13b from huggingface.co/decapoda-research/llama-13b-hf automatically.

I also tried the --lora adapter and it technically works, but the tokens don't work and it is slower.

@sussyboiiii
Author

Thank you,
your script worked and I now have the .bin shards of the 13B model merged. The thing to do now is to get it to f16 and then to 4-bit, but which convert.py script do you use? There are different ones.

@SlyEcho
Collaborator

SlyEcho commented Apr 19, 2023

convert.py from the master branch of this repo can handle HF format models now. You can specify the output format, but for some reason it didn't let me use q4_0, so I used f16 and then ran ./quantize on it to get it down to q4_0.
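
Roughly, with placeholder paths (the --outtype flag and default output name may differ slightly depending on your convert.py version, and older quantize builds take a number such as 2 instead of the name q4_0):

    # merged HF checkpoint -> ggml f16
    python3 convert.py /path/to/merged-13b --outtype f16

    # ggml f16 -> q4_0
    ./quantize /path/to/merged-13b/ggml-model-f16.bin /path/to/merged-13b/ggml-model-q4_0.bin q4_0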

@NoNamedCat

Can anyone upload the .bin file of this model for use with llama.cpp?

@SlyEcho
Collaborator

SlyEcho commented Apr 19, 2023

I could, but there is no point because it doesn't work well.

@NoNamedCat

Tks anyway :)

@sussyboiiii
Author

Where can I find this? I can only find the conversion scripts for gpt4all etc.
Thanks

> convert.py from the master branch of this repo can handle HF format models now. You can specify the output format, but for some reason it didn't let me use q4_0, so I used f16 and then I ran ./quantize on it to get it down to q4_0

@SlyEcho
Collaborator

SlyEcho commented Apr 19, 2023

This one: convert.py

edit: if you are seeing gpt4all conversion scripts, then you may need to do a git pull
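
In other words, something like:

    git pull               # update the checkout to current master
    python3 convert.py -h  # the unified converter should now be at the repo root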

@sussyboiiii
Author

sussyboiiii commented Apr 19, 2023

Thank you, don't know how I didn't see that.

@sussyboiiii
Author

sussyboiiii commented Apr 19, 2023

> I created this script to merge the models: https://gist.github.com/SlyEcho/477554916bfc1a9e338240eee6396fbd
>
> It creates a HF checkpoint that can be converted using convert.py to ggml f16 format and then later to q4_0 with quantize.
>
> However, I'm not sure that the extra tokens are being used for tokenization.
>
> EDIT:
>
> It seems to work even with the text versions of <|prompter|> <|assistant|>...

I have gotten a vocab size mismatch, how can I fix that?

@SlyEcho
Collaborator

SlyEcho commented Apr 19, 2023

You need to use the vocab files from the jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b repo. convert.py can also read vocab files from another directory, so you can point it to wherever the HF downloader wrote the files on your disk, or just download them.

But there are some weird things going on: there are embeddings for 16 new tokens in there, but the JSON only specifies 5. My script also cuts it down to 5, but you may want to hack on this because I don't understand how it's supposed to work.
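
Assuming your convert.py has the --vocab-dir option, that would look something like (paths are placeholders):

    python3 convert.py /path/to/merged-13b \
        --vocab-dir /path/to/gpt4all-alpaca-oa-codealpaca-lora-13b \
        --outtype f16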

@sussyboiiii
Author

I had forgotten to put added_tokens.json in the directory.
Thanks, it works now!

@SlyEcho
Collaborator

SlyEcho commented Apr 19, 2023

If you change main.cpp around line 173 to this, it should use the tokens for -ins mode:

    // prefix & suffix for instruct mode
    const auto inp_pfx = std::vector<llama_token> { 32002 }; // <|prompter|>
    const auto inp_sfx = std::vector<llama_token> { 32004 }; // <|assistant|>

Edit: I think the </s> or EOS token is not needed after all; it works better without it.
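
After that change, rebuilding and running in instruct mode would look roughly like this (the model file name is a placeholder):

    make
    ./main -m ggml-merged-13b-q4_0.bin -ins --color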

@sussyboiiii
Author

sussyboiiii commented Apr 19, 2023

The output I get is also a bit weird; it doesn't want to write code. It wanted me to visit a GitHub repo that doesn't exist.

@SlyEcho
Collaborator

SlyEcho commented Apr 19, 2023

I can recommend other good models that are not LoRA:

These can be converted directly with convert.py and used with the instruct mode since they use the same Alpaca prompts.

@sussyboiiii
Author

I believe this has been answered!
