RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! error for chatcompletion.py with llama 3.2 instruct model #771
Comments
By the way, the script worked fine with Llama 3 8B Instruct, so I assume the model matters.
@Emersonksc Thanks for your report on this bug. However, I cannot reproduce it; can you double-check your llama-recipes version? Here is the log, please take a look:
(log attached as a screenshot, not reproduced here)
I found that when I ran the command on its own it worked fine, but after adding export CUDA_VISIBLE_DEVICES=1 it reported the error.
Maybe you missed export CUDA_VISIBLE_DEVICES=1.
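For anyone hitting the same thing: this class of error usually means the tokenized inputs stayed on the CPU while the model weights were placed on the GPU. Below is a minimal sketch of the usual workaround, not the recipe's actual code; the model path and prompt are illustrative, and it assumes a standard transformers model/tokenizer pair.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint path; substitute your local Llama 3.2 directory.
model_name = "meta-llama/Llama-3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # places the weights on the visible GPU (cuda:0)
)

batch = tokenizer("Hello, how are you?", return_tensors="pt")
# Move every input tensor to the same device as the model weights;
# leaving them on the CPU is what triggers "two devices, cuda:0 and cpu!".
batch = {k: v.to(model.device) for k, v in batch.items()}

with torch.no_grad():
    outputs = model.generate(**batch, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```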
System Info
ubuntu 22.04
torch 2.5.0
cuda 12.4
running on a single GPU with CUDA_VISIBLE_DEVICES=1
Information
🐛 Describe the bug
python recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py --model_name "/home/emerson/AI/LLM/models/llama/Llama-3.2-3B-Instruct" --prompt_file "recipes/quickstart/inference/local_inference/chat_completion/girlfriend_chat_completion.json" --max_new_tokens 20 --enable_saleforce_content_safety False
Error logs
error:
File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 141, in
fire.Fire(main)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 107, in main
outputs = model.generate(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
result = self._sample(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in forward
outputs = self.model(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 921, in forward
position_embeddings = self.rotary_emb(hidden_states, position_ids)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 158, in forward
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
Expected behavior
Run chat_completion.py successfully with Llama 3.2 Instruct models.
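As a side note on why the error mentions cuda:0 even though GPU 1 was selected: with CUDA_VISIBLE_DEVICES=1 exported, only physical GPU 1 is visible to the process, and it is re-indexed as cuda:0 inside it. A small sketch for verifying this and for checking where the tensors actually land (assuming the model/batch variables from the sketch in the comments above; the printed values are what you would expect on this setup, not captured output):

```python
import os
import torch

# Only physical GPU 1 is visible; inside the process it becomes cuda:0.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # expected: "1"
print(torch.cuda.device_count())               # expected: 1
print(torch.cuda.current_device())             # expected: 0

# Before calling generate(), compare the model's device with the inputs':
# print(next(model.parameters()).device, batch["input_ids"].device)
# If these differ (e.g. cuda:0 vs cpu), move the batch to model.device first.
```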