Failed to apply lora adapter after fine-tuning a llama-2-13B-chat with ./finetune #4499
Comments
Can you test if the lora works with #3333?
Hi slaren, I tried it and it moved forward; now I am receiving this:
ggml_new_object: not enough space in the context's memory pool (needed 983635488, available 849346560)
In the meantime I am trying to convert the model to fp16, quantize it to q4_0, and retrain it until the first checkpoint (it usually takes roughly 1h). Thank you for your support! Below is the full output right after the start of loading the lora (where my previous attempts stopped):
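For context, my understanding of that message: ggml carves every tensor and graph object out of a single fixed-size memory pool created with ggml_init, and ggml_new_object aborts once allocations no longer fit in that pool. A minimal sketch of that API (not llama.cpp's actual code; the pool size below just reuses the "available" number from the error, everything else is illustrative):

```c
#include "ggml.h"

int main(void) {
    // The whole context lives in one pre-sized pool; ggml_new_object fails with
    // "not enough space in the context's memory pool" when objects no longer fit.
    struct ggml_init_params params = {
        /*.mem_size   =*/ 849346560,  // illustrative: the "available" size from the error
        /*.mem_buffer =*/ NULL,       // NULL lets ggml allocate the pool itself
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Every tensor created on this context is taken from that same pool.
    struct ggml_tensor * t = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    (void) t;

    ggml_free(ctx);
    return 0;
}
```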
Thanks for testing! That issue should be fixed now; the loras created by …
It worked! Thanks slaren, I hope it's merged into the main branch as soon as possible. I appreciated your fast response and solution! Luca
Hi everybody,
I am trying to fine-tune a llama-2-13B-chat model and I think I did everything correctly, but I still cannot apply my lora.
What I did was:
python convert_llama_weights_to_hf.py --input_dir ../models/llama-2-13b-chat --output_dir ../models/llama-2-13b-chat/llama-2-13b-chat-hf --model_size 13B
python convert.py ../models/llama-2-13b-chat/llama-2-13b-chat-hf --outtype f32 --outfile ../models/llama-2-13b-chat/llama-2-13b-chat-hf-f32.bin
./quantize ../models/llama-2-13b-chat/llama-2-13b-chat-hf-f32.bin ../models/llama-2-13b-chat/llama-2-13b-chat-hf-quantized_q5_k_m.bin q5_k_m
At this point I test the models with ./main and they work perfectly.
(I didn't put any </s> at the end of the samples because for some reason the loss became nan after fewer than 10 iterations; see the illustration of the data layout after the commands below)
./finetune --model-base ../models/llama-2-13b-chat/llama-2-13b-chat-hf-quantized_q5_k_m.bin --train-data ../datasets/FineTune/train_llamacpp.txt --threads 26 --sample-start "<s>" --ctx 512 -ngl 32
./main -i -m ../models/llama-2-13b-chat/llama-2-13b-chat-hf-quantized_q5_k_m.bin --lora-base ../models/llama-2-13b-chat/llama-2-13b-chat-hf-f32.bin --lora ../models/llama-2-13b-chat/ggml-lora-LATEST-f32.gguf --color -p "What is entanglement in physics?"
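For reference, here is roughly how train_llamacpp.txt is laid out for --sample-start "<s>" (the text below is only an illustration of the format, not my actual data):

```
<s>First training sample, starting at the <s> marker and running as plain text until the next marker.
<s>Second training sample, again with no closing </s> at the end.
```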
When I run that last ./main command with the lora, I always get the following as the last lines of the log:
.....
llama_apply_lora_from_file_internal: unsupported tensor dimension 1
llama_init_from_gpt_params: error: failed to apply lora adapter
ggml_metal_free: deallocating
main: error: unable to load model
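My guess at what triggers that message, written as a rough sketch (a paraphrase, not the actual llama.cpp source): the lora loader seems to expect every adapter tensor to be a 2-D loraA/loraB matrix, so a 1-D tensor in the finetune output (a norm weight, for example) would trip the check:

```c
#include <stdio.h>

// Rough, hypothetical paraphrase of the kind of dimension check that could print
// "unsupported tensor dimension 1" while applying a lora adapter.
static int apply_lora_tensor_sketch(int n_dims) {
    if (n_dims != 2) {
        fprintf(stderr, "llama_apply_lora_from_file_internal: unsupported tensor dimension %d\n", n_dims);
        return 1;  // the caller then reports "failed to apply lora adapter"
    }
    // ... otherwise: compute loraB * loraA, scale it, and add it into the base weight ...
    return 0;
}

int main(void) {
    return apply_lora_tensor_sketch(1);  // a 1-D tensor reproduces the failure path
}
```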
I am running everything on an M1 Max with 64 GB of RAM and a 32-core GPU.
What could the problem be? I have already tried different things without success, which is why I'm writing here...
Thank you for any help
Luca