RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! error for chatcompletion.py with llama 3.2 instruct model #771
Comments
By the way, the script worked fine with Llama 3 8B Instruct, so I assume the model matters.
@Emersonksc Thanks for your report on this bug. However, I cannot reproduce it; can you double-check your llama-recipes version? Here is the log, please take a look:
(log attached as a screenshot, not reproduced here)
I found that when I ran the command on its own it worked fine, but after adding export CUDA_VISIBLE_DEVICES=1 it reported the error.
Maybe you missed export CUDA_VISIBLE_DEVICES=1.
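For anyone hitting the same thing: this class of error usually means the tokenized inputs stayed on the CPU while the model weights were placed on the GPU. Below is a minimal sketch of the usual workaround, not the recipe's actual code; the model path and prompt are illustrative, and it assumes a standard transformers model/tokenizer pair.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint path; substitute your local Llama 3.2 directory.
model_name = "meta-llama/Llama-3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # places the weights on the visible GPU (cuda:0)
)

batch = tokenizer("Hello, how are you?", return_tensors="pt")
# Move every input tensor to the same device as the model weights;
# leaving them on the CPU is what triggers "two devices, cuda:0 and cpu!".
batch = {k: v.to(model.device) for k, v in batch.items()}

with torch.no_grad():
    outputs = model.generate(**batch, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```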
System Info
ubuntu 22.04
torch 2.5.0
cuda 12.4
running on a single GPU with CUDA_VISIBLE_DEVICES=1
Information
🐛 Describe the bug
python recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py --model_name "/home/emerson/AI/LLM/models/llama/Llama-3.2-3B-Instruct" --prompt_file "recipes/quickstart/inference/local_inference/chat_completion/girlfriend_chat_completion.json" --max_new_tokens 20 --enable_saleforce_content_safety False
Error logs
error:
File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 141, in
fire.Fire(main)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/emerson/AI/LLM/recipe/llama-recipes/recipes/quickstart/inference/local_inference/chat_completion/chat_completion.py", line 107, in main
outputs = model.generate(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
result = self._sample(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1190, in forward
outputs = self.model(
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 921, in forward
position_embeddings = self.rotary_emb(hidden_states, position_ids)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/emerson/miniconda3/envs/llama-recipes/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 158, in forward
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
Expected behavior
Run chat_completion.py successfully with Llama 3.2 Instruct models.
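As a side note on why the error mentions cuda:0 even though GPU 1 was selected: with CUDA_VISIBLE_DEVICES=1 exported, only physical GPU 1 is visible to the process, and it is re-indexed as cuda:0 inside it. A small sketch for verifying this and for checking where the tensors actually land (assuming the model/batch variables from the sketch in the comments above; the printed values are what you would expect on this setup, not captured output):

```python
import os
import torch

# Only physical GPU 1 is visible; inside the process it becomes cuda:0.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # expected: "1"
print(torch.cuda.device_count())               # expected: 1
print(torch.cuda.current_device())             # expected: 0

# Before calling generate(), compare the model's device with the inputs':
# print(next(model.parameters()).device, batch["input_ids"].device)
# If these differ (e.g. cuda:0 vs cpu), move the batch to model.device first.
```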