Multi GPU DDP QLora Doesn't Work; Lora Does #921
I think ultimately the core of the issue is that regular Lora can be loaded on the CPU, and Accelerate then puts the model where it needs to go. QLora has to be loaded directly onto a GPU, which is where things break down.
Maybe a way to pass a QLora config through would be ideal?
Hi @mallorbc!
I had a similar idea. :) I ultimately did this and it worked:

```python
import os

# Pin each DDP process's model to its own GPU via LOCAL_RANK.
local_rank = os.getenv("LOCAL_RANK")
device_string = "cuda:" + str(local_rank)
kwargs["device_map"] = device_string  # kwargs is later passed to from_pretrained
```

Thanks so much! BTW, is there a reason we should use float16 over bfloat16 for QLora?
hi @mallorbc
@younesbelkada I am now getting this:
Gonna take a closer look at what you shared.
This error happens with both your solution and mine. I see that it affects a certain target Lora layer; I removed it and the error moved to another layer, so I have removed a few so far. Trying it again with those removed now. The QLora paper suggests that tuning all linear layers is important, so this is not ideal.
The issues I am having deal with gradient checkpointing. With QLora, flash attention, and rope scaling, I can get 8k tokens on 1 A100. I can't even get 2k when I disable gradient checkpointing (which I guess is enabled by default?). And I guess for the 7B model it is not enabled by default?
Ok so normal DDP does not support gradient checkpointing. Thankfully DeepSpeed does, and thankfully all stages but ZeRO 3 work with QLora (or at least it seems so; I still need to train a model, but forward and backward work). Thus the answer is to use gradient checkpointing with DeepSpeed and Lora/QLora.
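Roughly the shape of this, as a sketch (the DeepSpeed config path and the argument values below are placeholders, not my exact setup):

```python
from transformers import TrainingArguments

# Sketch: gradient checkpointing together with DeepSpeed (ZeRO stage 1/2) instead of
# plain DDP. "ds_config_zero2.json" is a placeholder path to a standard ZeRO-2 config.
training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
    bf16=True,
    deepspeed="ds_config_zero2.json",
)
```

The run is then started with the DeepSpeed or accelerate launcher so each GPU gets its own process.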
Hi @mallorbc, regarding:

RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) ...

Can you try to install TRL, transformers and PEFT from source and try to pass gradient_checkpointing_kwargs={'use_reentrant': False}?
I tried that; it didn't work. Still getting the error.
Which version of TRL / PEFT / transformers do you have?
As below:
I changed to {'use_reentrant': False}, and it works.
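For anyone else landing here, this is where that setting goes. A minimal sketch, assuming a transformers version recent enough to accept gradient_checkpointing_kwargs:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./output",
    gradient_checkpointing=True,
    # Non-reentrant activation checkpointing avoids the
    # "Expected to mark a variable ready only once" error under DDP.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```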
I am using TRL and PEFT to finetune models. You can see the code I am using here
When finetuning Llama 7B, I use Lora, because with A100s I can fit the model in this format and it's a bit faster than QLora.
When finetuning 70B, I must use QLora.
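Roughly the difference between the two paths, as a sketch (the model names and LoRA hyperparameters below are placeholders, not my exact settings):

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Placeholder LoRA settings, used for both paths.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Lora path (7B): base weights are not quantized, so the model can start on CPU
# and be dispatched by accelerate.
lora_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# QLora path (70B): base weights are quantized to 4-bit and must be placed
# directly on a GPU.
qlora_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model name
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```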
When finetuning with Lora, the code will automatically utilize all available GPUs with DDP. I do not need to use accelerate, though I can, and when I do, it works. I imagine there is some logic going on behind the scenes that is using accelerate.
When using QLora, training does not work, and the behavior changes based on whether or not I am using accelerate explicitly.
When not explicitly using accelerate, the model is loaded onto both GPUs, but one GPU sits idle and is never used. Because of this, training never progresses.
We can see in this photo that the model is loaded on both GPUs, but training is only occurring on GPU 0.
When I use accelerate explicitly, I get the following error: