
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #287

AAAZSF opened this issue Apr 7, 2023 · 3 comments

AAAZSF commented Apr 7, 2023

I successfully fine-tuned the model in int8 on multiple GPUs with model parallelism. But when I set load_in_8bit=False and fine-tune the model in fp16 or fp32, it raises a RuntimeError:

  File "/home/usr/project/alpaca-lora/finetune.py", line 288, in <module>
    fire.Fire(train)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/usr/project/alpaca-lora/finetune.py", line 255, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 1636, in train
    return inner_training_loop(
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 1903, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 2649, in training_step
    loss = self.compute_loss(model, inputs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 2681, in compute_loss
    outputs = model(**inputs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/peft/peft_model.py", line 530, in forward
    return self.base_model(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/peft/tuners/lora.py", line 350, in forward
    result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

Any help would be appreciated, thanks!
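For context, the failing configuration described above corresponds roughly to the load below (a minimal sketch assuming alpaca-lora's usual device_map="auto" path; the base-model path and LoRA hyperparameters are placeholders, not taken from this report):

```python
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

# Rough sketch of the failing setup: fp16 weights sharded across both GPUs
# by accelerate via device_map="auto", with int8 quantization disabled.
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder base-model path
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example LoRA hyperparameters; the LoRA adapters are attached after the
# base model has already been dispatched across the GPUs.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```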

Can you fit the fp16 model in your VRAM? It seems you don't have enough VRAM and some layers were put on the CPU.

Sorry, I forgot to say that I set load_in_8bit=False for the 7B model.
I tested the 7B fp16 model on 2×24 GB GPUs, so I think memory is enough.
More details during the run:
nvidia-smi output (screenshot omitted)

>>> model.hf_device_map
{'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 0, 'model.layers.12': 0, 'model.layers.13': 0, 'model.layers.14': 0, 'model.layers.15': 0, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.layers.28': 1, 'model.layers.29': 1, 'model.layers.30': 1, 'model.layers.31': 1, 'model.norm': 1, 'lm_head': 1}

Originally posted by @AAAZSF in #131 (comment)
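Note that the hf_device_map above only places modules on GPUs 0 and 1, yet the error reports a CPU tensor for mat2 inside the LoRA linear (the weight passed to F.linear), which suggests the tensors left on CPU are the LoRA adapter weights rather than base-model layers. A small diagnostic sketch (not part of the original report) to confirm which parameters ended up on CPU:

```python
# After get_peft_model(), list every parameter that is still on CPU; with
# device_map="auto" the base weights live on cuda:0 / cuda:1, so any CPU
# entry here (typically lora_A / lora_B weights) explains the mismatch.
for name, param in model.named_parameters():
    if param.device.type == "cpu":
        print(name, tuple(param.shape), param.dtype)
```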


AAAZSF commented Apr 7, 2023

So, I want to know how to train an fp16 or fp32 model on multiple GPUs.


han508 commented May 3, 2023

I'm hitting the same error. Have you solved it yet?


pyogher commented May 22, 2023

> I'm hitting the same error. Have you solved it yet?

Typically, ensuring that the parameters of lm_head and embed_tokens are on the same device resolves the issue. This approach has worked for me in similar situations; I hope it helps.
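A minimal sketch of that suggestion, using the module names from the hf_device_map above (the layer split is only an example and the base-model path is a placeholder): pass an explicit device_map that keeps model.embed_tokens and lm_head on the same GPU instead of relying on device_map="auto".

```python
import torch
from transformers import LlamaForCausalLM

# Pin embed_tokens, the final norm and lm_head to GPU 0, and split the 32
# decoder layers across the two 24 GB cards (example split, not tuned).
device_map = {"model.embed_tokens": 0, "model.norm": 0, "lm_head": 0}
device_map.update({f"model.layers.{i}": 0 for i in range(0, 14)})
device_map.update({f"model.layers.{i}": 1 for i in range(14, 32)})

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder base-model path
    torch_dtype=torch.float16,
    device_map=device_map,
)
```

If the diagnostic above shows that the LoRA adapter weights are the tensors left on CPU, moving those adapter parameters onto the devices of the modules they wrap after get_peft_model is also worth trying.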
