example config `llama-2/lora.yml` fails when `load_in_8bit` is set to `False` #456
Comments
if you want to do a full fine tune, you should set
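The reply above is cut off, but based on the follow-up comment (the LoRA adapter was removed), a full fine-tune variant of the example config would look roughly like the sketch below. Key names follow `examples/llama-2/lora.yml`; treat the exact values as assumptions rather than a confirmed recipe.

```yaml
# Sketch of a full fine-tune setup, inferred from the discussion -- not the exact suggestion above.
# Start from examples/llama-2/lora.yml and change only these parts:

load_in_8bit: false    # no 8-bit quantized loading
load_in_4bit: false    # no 4-bit quantized loading either

adapter:               # leave empty (and drop the lora_* keys) so no LoRA adapter is attached
```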
Thank you for your answer, @winglian! And yes, sorry, I didn't express myself well. I wanted to train with LoRA attached to a full, non-quantized model to compare against some runs I did with HF directly.

I just followed your suggestion (removed the LoRA adapter) and it seems I am able to do full fine-tuning on a single 80GB A100?! I am using the […] And on top of that we are training here on packed examples of length up to 4096, which is way beyond what I expected 🙂 Definitely need to study the code of the library, amazing. Thank you very much for your answer 🙏

Mhmm, BTW full fine-tuning seems to have stopped (0% volatile GPU utilization) but the training didn't crash, it just seems to have frozen. Oh well, maybe the examples don't always get packed to the full 4096 and it just hit a particularly tricky one 🤔

Anyhow, I also rechecked the original issue I raised this bug report for -- it seems I didn't mess anything up, the problem exists when training with LoRA but without quantization. I can close it if you feel this is something that doesn't need to be supported, let me know please. Extremely grateful for your help!
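For context, the packing behaviour described above ("packed examples of length up to 4096") corresponds to two settings in the example config. The snippet below is a sketch of the relevant lines as I believe they appear in `examples/llama-2/lora.yml`; treat the exact values as assumptions and verify against your copy of the file.

```yaml
# Assumed settings from examples/llama-2/lora.yml (sketch only)
sequence_len: 4096     # maximum packed sequence length
sample_packing: true   # pack multiple short examples into each 4096-token sequence
```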
@radekosmulski you're probably on the very edge of what you can do with 80GB. A full finetune usually requires roughly 12x the model size in VRAM.
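To see why 80GB is borderline for a 7B model, one way to unpack that rule of thumb (this breakdown is an assumption on my part, not something stated in the thread) is roughly 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 8 for Adam optimizer state:

$$
(2 + 2 + 8)\ \text{bytes/param} \times 7 \times 10^{9}\ \text{params} \approx 84\ \text{GB}
$$

That is already above 80GB before counting activations, which is why a single 80GB A100 sits right at the edge.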
@mhenrichsen thank you very much for your comment, that is very useful to know! 🙂🙏
@radekosmulski is this resolved? Can we close it?
@mhenrichsen it is resolved in the sense that I learned something new and very useful 🙂 so I am extremely grateful for this 🙂 but fine-tuning on unquantized weights with LoRA gives the error above (somehow the model is not getting moved to the GPU), so assuming one should be able to use LoRA without loading the model in 4 or 8 bit, this is still broken.
Happens too with BTLM-3B-8K (Cerebras). Is it not possible to attach a LoRA to an fp16 model?
As a workaround, if you have only one GPU, you can run the script directly with `python` instead of `accelerate`, and it should work.
Please check that this issue hasn't been reported before.
Expected Behavior
I am inside the docker `winglian/axolotl:main-py3.10-cu118-2.0.1` container. GPUs are visible with `torch.cuda.device_count()`.

I start with the `examples/llama-2/lora.yml` config file. I am able to run it. I want to do full fine-tuning and so I change `load_in_8bit` to `false`. I am able to train the model.

Current behaviour

Currently, the training fails with the following error:
Steps to reproduce

1. Start the `winglian/axolotl:main-py3.10-cu118-2.0.1` docker container.
2. Run training with the `examples/llama-2/lora.yml` config file, with `load_in_8bit` set to `false` (see the sketch below).
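For reference, a minimal sketch of the change that triggers the failure, assuming the key names as they appear in `examples/llama-2/lora.yml` (only the relevant lines are shown; the rest of the file is left untouched):

```yaml
# examples/llama-2/lora.yml -- relevant lines only (sketch)
load_in_8bit: false   # changed from true: load the base model unquantized
load_in_4bit: false
adapter: lora         # LoRA is still attached, now to the full-precision model
```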
Possible solution
No response
Which Operating Systems are you using?
Python Version
the one in the official docker container
axolotl branch-commit
main/50682a3c068f723de154950b03c3f86bf673e688
Acknowledgements