Merged LoRA model forgets LoRA when converted to GGML (with llama-cpp-python, DOES NOT repro with ./main) #1631
Comments
I'd guess it's not actually merging the LoRA into the existing tensors, but instead saving them under a different name, or just saving the LoRA weights in the model to be merged when it gets loaded. I don't think there's another way the GGML conversion could produce the result you're describing. The conversion scripts here would probably need to be adapted to look for the correct tensor names in the model (and/or merge if necessary), or you'd need to merge it in a different way so that it actually ends up looking like a normal non-LoRA model as far as tensor names go.
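For anyone wanting to check this, a minimal sketch that lists the tensor names in a "merged" checkpoint and flags anything that still looks like an unmerged LoRA tensor. Paths are placeholders, and it assumes the merge was saved as a single-file checkpoint via save_pretrained():

```python
# Sketch: inspect the tensor names in a "merged" HF checkpoint and flag any
# leftover LoRA keys. Paths are placeholders; assumes a single-file checkpoint.
import os
import torch

merged_dir = "./merged-model"  # placeholder

st_path = os.path.join(merged_dir, "model.safetensors")
if os.path.exists(st_path):
    from safetensors.torch import load_file
    state_dict = load_file(st_path)
else:
    state_dict = torch.load(os.path.join(merged_dir, "pytorch_model.bin"), map_location="cpu")

# A properly merged model should only contain plain base-model names like
# "model.layers.0.self_attn.q_proj.weight" -- no "lora_A"/"lora_B" tensors and
# no "base_model.model." prefix left over from peft.
leftovers = [k for k in state_dict if "lora_" in k or k.startswith("base_model.model.")]
print(f"{len(state_dict)} tensors, {len(leftovers)} look like unmerged LoRA keys")
for name in leftovers[:20]:
    print(" ", name, tuple(state_dict[name].shape))
```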
That's my best guess as well, but I have no idea why this would happen, or how to make sense of what I'm looking at when I inspect the model's tensors. (I think others have managed to successfully merge LoRAs into base models and convert to GGML.) I'm hoping someone already has that knowledge and can chime in as to what I need to do, or what to look for.
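One concrete way to make sense of the tensors is to compare a LoRA-targeted weight between the base and the merged checkpoint; if they are identical, the delta was never folded in. A sketch with hypothetical paths and a hypothetical tensor name:

```python
# Sketch: compare one LoRA-targeted weight between the base model and the
# merged checkpoint. If the tensors are identical, the LoRA delta was never
# folded in. q_proj/v_proj are the usual LoRA targets for LLaMA-style models
# (check your adapter_config.json); the name below is only an example.
from safetensors.torch import load_file

base = load_file("./base-model/model.safetensors")      # placeholder
merged = load_file("./merged-model/model.safetensors")  # placeholder

name = "model.layers.0.self_attn.q_proj.weight"         # hypothetical example
diff = (merged[name].float() - base[name].float()).abs().max().item()
print(f"max |merged - base| for {name}: {diff:.6g}")    # ~0 means no merge happened
```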
Maybe try #1531, and also see that peft PR. Also, 4-bit QLoRA targets bnb.nn.linear4bit. Try changing your Lora.py the way the PR does for the embedding, or otherwise make it work. By the way, if it does work, I suppose you'll need to load in 4-bit to merge it.
That peft PR did not fix the issue here, unfortunately.
This will not work; peft will error out in merge_and_unload (see this line).
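For reference, the usual workaround at this point was to reload the base model unquantized in fp16 and merge the adapter there, since merge_and_unload() refuses bnb.nn.Linear4bit layers. A rough sketch with placeholder paths (not the exact setup from this issue):

```python
# Sketch: merge a (Q)LoRA adapter into an fp16, non-quantized copy of the base
# model so that merge_and_unload() succeeds, then save a plain HF checkpoint
# that the llama.cpp conversion scripts can consume. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "huggyllama/llama-7b"   # placeholder base model
adapter_dir = "./qlora-adapter"   # placeholder adapter checkpoint
out_dir = "./merged-model"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)
model = model.merge_and_unload()  # folds scaling * (lora_B @ lora_A) into the base weights

model.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```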
😅 Then you'll have to check inside that, like the very early alpaca script did. But I'm afraid that may still not work for QLoRA's bnb.nn.linear4bit. And I'd guess ggml's LoRA function isn't working for your QLoRA either.
Strangely enough, that does seem to work, which led me to my next test... both the merged GGML and the LoRA DO work with ./main, but not with llama-cpp-python. So either something is wrong with my Python test, my llama-cpp-python install, the shared object, or something else. I should've looked into this earlier. Thanks for all the help and suggestions, everyone!
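For anyone hitting the same discrepancy, a minimal llama-cpp-python check that mirrors the two ./main routes (the merged-and-converted GGML on its own, and the base GGML plus the converted LoRA). Model paths are placeholders, and the lora_path parameter name is assumed from llama-cpp-python's Llama constructor around the version mentioned above:

```python
# Minimal llama-cpp-python sanity check; all paths are placeholders.
from llama_cpp import Llama

prompt = "Explain what a LoRA adapter is in one sentence."

# Route 1: the merged-and-converted GGML on its own.
llm = Llama(model_path="./merged-model/ggml-model-f16.bin", n_ctx=2048)
print(llm(prompt, max_tokens=128, temperature=0.0)["choices"][0]["text"])

# Route 2: base GGML plus the converted LoRA, roughly what
# `./main -m base.bin --lora lora.bin` does.
llm2 = Llama(
    model_path="./base-model/ggml-model-f16.bin",
    lora_path="./lora/ggml-adapter-model.bin",
    n_ctx=2048,
)
print(llm2(prompt, max_tokens=128, temperature=0.0)["choices"][0]["text"])
```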
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
After merging a LoRA with an HF model, I can convert it to GGML with convert-pth-to-ggml and observe that the converted model behaves similarly to the original merged model.
Current Behavior
For some reason, the converted model behaves like the base model, as if the merge never happened.
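A minimal way to make that comparison concrete (a sketch with placeholder paths, not the exact test used here; greedy decoding on both sides so the outputs are comparable):

```python
# Sketch: generate from the merged HF model with transformers, then from the
# converted GGML with llama-cpp-python, and compare the outputs by eye.
# Paths are placeholders; greedy decoding keeps both runs deterministic.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_cpp import Llama

prompt = "Explain the difference between a LoRA adapter and a merged model."

tok = AutoTokenizer.from_pretrained("./merged-model")
hf_model = AutoModelForCausalLM.from_pretrained("./merged-model")
ids = tok(prompt, return_tensors="pt").input_ids
hf_text = tok.decode(
    hf_model.generate(ids, max_new_tokens=64, do_sample=False)[0],
    skip_special_tokens=True,
)

ggml = Llama(model_path="./merged-model/ggml-model-f16.bin", n_ctx=2048)
ggml_text = ggml(prompt, max_tokens=64, temperature=0.0)["choices"][0]["text"]

print("HF merged:", hf_text)
print("GGML     :", ggml_text)
```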
Environment and Context
Using latest llama-cpp-python for inference, latest llama.cpp for conversion.
llama.cpp: master-3b126f6
llama-cpp-python: 1.55.0
The rest of the environment uses the dev commits of the Hugging Face libraries; see the QLoRA blog post.
Physical (or virtual) hardware you are using, e.g. for Linux:
AMD Ryzen Threadripper 3970X 32-Core Processor
WSL 2
Linux 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Failure Information (for bugs)
See above - the converted model is missing some state.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Failure Logs
Unfortunately, this failure does not produce any failure logs; it only shows up in model behavior.