lora : improve compat with mergekit-extract-lora
#11131
Merged
Motivation
A while ago, I released GGUF-my-LoRA, which aims to provide a better playground for users to create more LoRA adapters.
However, I soon realized that most users (who have GPU power) still prefer to fine-tune the model instead of making a LoRA adapter. For example, mradermacher has a huge collection of fine-tuned models. Some reasons why SFT is preferred are:
That got me thinking: can we use `mergekit-extract-lora` to convert a fine-tuned model into a LoRA adapter, then use it in llama.cpp? An adapter weighs just a fraction of the whole model. Even with a small quality degradation, that's still a bargain!
Idea
`mergekit-extract-lora` produces a LoRA adapter by doing a matrix decomposition. In the end, it leaves us with an adapter that includes both `norm` vectors and `token_embd`, which we currently don't support.

Implementation
I made changes to `convert_lora_to_gguf.py` to keep these tensors in the output GGUF.

On the llama.cpp side, I added support for `token_embd`.

NOTE: `norm` is present in the GGUF, but is not used for now. Adding support for it should be trivial, but it would require modifying all the `build_*` functions, which takes a lot of time, so I decided not to do it now. Even without that, most adapters I tested still work fine.

Demo
To make an adapter, install mergekit and run `mergekit-extract-lora`, for example:

(Note: you can skip this step and download one of the pre-converted adapters that I made here: https://huggingface.co/collections/ngxson/extracted-lora-mergekit-677d5c3eea0b6a7661201846)
Then, convert it to GGUF:
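A minimal sketch of the conversion step, run from the llama.cpp repo root (the adapter directory, base model ID, and output filename are placeholder assumptions):

```shell
# --base points at the original (non-fine-tuned) model the adapter was extracted against.
python convert_lora_to_gguf.py ./extracted-lora \
  --base mistralai/Mistral-7B-v0.1 \
  --outfile lora-adapter.gguf
```

With this PR, the `norm` vectors and `token_embd` tensors produced by `mergekit-extract-lora` are kept in the output GGUF instead of being rejected.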
Now use it:
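For example, loading the adapter on top of the base model with `llama-cli` (file names and prompt are placeholders):

```shell
# Load the base GGUF model plus the extracted LoRA adapter
llama-cli -m base-model.gguf --lora lora-adapter.gguf -p "Hello"
```

If the adapter's effect is too strong or too weak, `--lora-scaled lora-adapter.gguf SCALE` can be used instead of `--lora` to adjust its strength.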