
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #287

AAAZSF opened this issue Apr 7, 2023 · 3 comments

AAAZSF commented Apr 7, 2023

I successfully fine-tuned the model in int8 on multiple GPUs with model parallelism. But when I set load_in_8bit=False and fine-tune the model in fp16 or fp32, it raises a RuntimeError:

  File "/home/usr/project/alpaca-lora/finetune.py", line 288, in <module>
    fire.Fire(train)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/usr/project/alpaca-lora/finetune.py", line 255, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 1636, in train
    return inner_training_loop(
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 1903, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 2649, in training_step
    loss = self.compute_loss(model, inputs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/trainer.py", line 2681, in compute_loss
    outputs = model(**inputs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/peft/peft_model.py", line 530, in forward
    return self.base_model(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/media/data/6/usr/tmp/transformers/src/transformers/models/llama/modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/peft/tuners/lora.py", line 350, in forward
    result += self.lora_B(self.lora_A(self.lora_dropout(x))) * self.scaling
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/.conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

Any help would be appreciated, thanks!
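For context, the failing configuration described above corresponds roughly to the load below (a minimal sketch assuming alpaca-lora's usual device_map="auto" path; the base-model path and LoRA hyperparameters are placeholders, not taken from this report):

```python
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

# Rough sketch of the failing setup: fp16 weights sharded across both GPUs
# by accelerate via device_map="auto", with int8 quantization disabled.
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder base-model path
    load_in_8bit=False,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Example LoRA hyperparameters; the LoRA adapters are attached after the
# base model has already been dispatched across the GPUs.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```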

Can you fit the fp16 model in your VRAM? It seems you don't have enough VRAM and some layers were put on the CPU.

Sorry, I forgot to say that I set load_in_8bit=False for the 7B model.
I tested the 7B fp16 model on 2×24 GB GPUs, so I think memory is enough.
More details during the run:
nvidia-smi output (screenshot omitted)

>>> model.hf_device_map
{'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0, 'model.layers.4': 0, 'model.layers.5': 0, 'model.layers.6': 0, 'model.layers.7': 0, 'model.layers.8': 0, 'model.layers.9': 0, 'model.layers.10': 0, 'model.layers.11': 0, 'model.layers.12': 0, 'model.layers.13': 0, 'model.layers.14': 0, 'model.layers.15': 0, 'model.layers.16': 1, 'model.layers.17': 1, 'model.layers.18': 1, 'model.layers.19': 1, 'model.layers.20': 1, 'model.layers.21': 1, 'model.layers.22': 1, 'model.layers.23': 1, 'model.layers.24': 1, 'model.layers.25': 1, 'model.layers.26': 1, 'model.layers.27': 1, 'model.layers.28': 1, 'model.layers.29': 1, 'model.layers.30': 1, 'model.layers.31': 1, 'model.norm': 1, 'lm_head': 1}

Originally posted by @AAAZSF in #131 (comment)
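Note that the hf_device_map above only places modules on GPUs 0 and 1, yet the error reports a CPU tensor for mat2 inside the LoRA linear (the weight passed to F.linear), which suggests the tensors left on CPU are the LoRA adapter weights rather than base-model layers. A small diagnostic sketch (not part of the original report) to confirm which parameters ended up on CPU:

```python
# After get_peft_model(), list every parameter that is still on CPU; with
# device_map="auto" the base weights live on cuda:0 / cuda:1, so any CPU
# entry here (typically lora_A / lora_B weights) explains the mismatch.
for name, param in model.named_parameters():
    if param.device.type == "cpu":
        print(name, tuple(param.shape), param.dtype)
```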


AAAZSF commented Apr 7, 2023

So, I want to know how to train an fp16 or fp32 model on multiple GPUs.


han508 commented May 3, 2023

I'm hitting the same error. Have you solved it yet?


pyogher commented May 22, 2023

> I'm hitting the same error. Have you solved it yet?

Typically, ensuring that the parameters of lm_head and embed_tokens are on the same device resolves the issue. This approach has worked for me in similar situations; I hope it helps.
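A minimal sketch of that suggestion, using the module names from the hf_device_map above (the layer split is only an example and the base-model path is a placeholder): pass an explicit device_map that keeps model.embed_tokens and lm_head on the same GPU instead of relying on device_map="auto".

```python
import torch
from transformers import LlamaForCausalLM

# Pin embed_tokens, the final norm and lm_head to GPU 0, and split the 32
# decoder layers across the two 24 GB cards (example split, not tuned).
device_map = {"model.embed_tokens": 0, "model.norm": 0, "lm_head": 0}
device_map.update({f"model.layers.{i}": 0 for i in range(0, 14)})
device_map.update({f"model.layers.{i}": 1 for i in range(14, 32)})

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # placeholder base-model path
    torch_dtype=torch.float16,
    device_map=device_map,
)
```

If the diagnostic above shows that the LoRA adapter weights are the tensors left on CPU, moving those adapter parameters onto the devices of the modules they wrap after get_peft_model is also worth trying.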
