I can successfully finetune the model in int8 on multiple GPUs with model parallelism. But when I set load_in_8bit=False and finetune the model with fp16 or fp32, it raises a RuntimeError:
File "/home/usr/project/alpaca-lora/finetune.py", line 288, in <module>
fire.Fire(train)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/usr/project/alpaca-lora/finetune.py", line 255, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 1636, in train
return inner_training_loop(
File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 1903, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 2649, in training_step
loss = self.compute_loss(model, inputs)
File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 2681, in compute_loss
outputs = model(**inputs)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/peft/peft_model.py", line 530, in forward
return self.base_model(
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 765, in forward
outputs = self.model(
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 614, in forward
layer_outputs = decoder_layer(
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 309, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 209, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/peft/tuners/lora.py", line 350, in forward
result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
I'm hoping for some help, thanks!
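For context, the fp16 path described above comes down to a loading call along these lines (a minimal illustrative sketch, not the actual finetune.py code; the checkpoint path is a placeholder):

```python
import torch
from transformers import LlamaForCausalLM

base_model = "path/to/llama-7b"  # placeholder, not taken from the issue

model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=False,         # int8 off: the configuration that hits the RuntimeError
    torch_dtype=torch.float16,  # fp16 weights instead of 8-bit quantization
    device_map="auto",          # let accelerate shard the layers across the two GPUs
)
```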
Can you fit the fp16 model in your VRAM? It seems you don't have enough VRAM and some layers are being placed on the CPU.
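Building on the loading sketch above, a quick way to confirm whether anything was offloaded is to inspect the device map that accelerate records on the loaded model:

```python
# hf_device_map is attached by accelerate whenever device_map is used at load time.
# Any module mapped to "cpu" or "disk" will produce exactly this kind of
# "tensors on different devices" error once training starts.
for module_name, device in model.hf_device_map.items():
    print(f"{module_name}: {device}")

offloaded = [name for name, dev in model.hf_device_map.items() if dev in ("cpu", "disk")]
print("offloaded to CPU/disk:", offloaded or "none")
```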
Sorry, I forgot to say that I set load_in_8bit=False for the 7B model.
I tested the 7B fp16 model on 2x24GB GPUs, so I think the memory is enough.
More detail while running (nvidia-smi output):
I met the same error. Have you solved it yet?
Typically, ensuring that the params of both lm_head and embed_tokens are on the same device should resolve the issue. This approach has worked for me in similar situations. I hope it helps you.
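One way to do that is to build the device map yourself and pin lm_head to the same GPU as model.embed_tokens before loading the weights. A minimal sketch using accelerate's infer_auto_device_map (checkpoint path and memory limits are placeholders):

```python
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, LlamaForCausalLM

base_model = "path/to/llama-7b"  # placeholder checkpoint path

# Plan the placement on "meta" tensors first, so no real memory is allocated.
config = AutoConfig.from_pretrained(base_model)
with init_empty_weights():
    empty_model = LlamaForCausalLM(config)

device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "22GiB", 1: "22GiB"},            # leave headroom on each 24GB card
    no_split_module_classes=["LlamaDecoderLayer"],  # never split a decoder layer across devices
)

# Pin the output head to the same device as the input embeddings so the final
# projection never mixes GPU and CPU tensors. The key names follow
# modeling_llama.py; print(device_map) first if yours are grouped differently.
device_map["lm_head"] = device_map.get("model.embed_tokens", 0)

model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map=device_map,
)
```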
Originally posted by @AAAZSF in #131 (comment)