[Bug]: Triton error when initializing LLM(...) when enable_lora=True and cuda device != cuda:0 #12967
Comments
the same!
Thanks! Unfortunately, I just tried this new snippet from the issue and I hit the same error:
I noticed that, when I use …
Did you use the triton backend?
You can try the following code; it's a workaround for me locally:

import torch
from vllm import LLM
print(torch.cuda.current_device()) # output: 0
torch.cuda.set_device("cuda:1") # also tried .set_device(1)
print(torch.cuda.current_device()) # output: 1
llm = LLM(
"deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
device="cuda:0", # or device="cuda"
gpu_memory_utilization=0.5,
dtype="auto",
enable_prefix_caching=True,
max_model_len=4096,
enable_lora=True,
max_lora_rank=16,
)
print(torch.cuda.current_device()) # output: 0
This doesn't raise an error but does put the vLLM model on the wrong GPU (the original cuda:0 instead of cuda:1) for me. I found another possible solution here; it seems doable for my own training script, but incorporating it into …
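Another workaround worth trying (a sketch only, and an assumption on my part rather than the solution linked above) is to restrict the process to the target GPU with `CUDA_VISIBLE_DEVICES` before CUDA is initialized, so that vLLM's `cuda:0` resolves to the physical GPU you actually want:

```python
import os

# Assumed workaround: expose only physical GPU 1 to this process, so the
# process-local cuda:0 (which vLLM uses by default) maps onto that GPU.
# This must run before torch/vLLM initialize CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from vllm import LLM

llm = LLM(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    gpu_memory_utilization=0.5,
    max_model_len=4096,
    enable_lora=True,
    max_lora_rank=16,
)
```

The obvious caveat is that this hides the other GPUs from the whole process, which may not be workable inside a trainer that also needs them.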
I have found a temporary solution from [this commit](https://huggingface.co/microsoft/Phi-3-small-128k-instruct/commit/ed7de9a074b0760e6cf050fe1d103b90834933c8) and [this discussion](https://huggingface.co/microsoft/Phi-3-small-8k-instruct/discussions/23). It only requires adding …
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
When using the `LLM` class for offline inference with LoRA in a multi-GPU setting, Triton produces an error about being unable to access some pointers. I'm using `vllm` for generation in `trl`'s `GRPOTrainer` class, and I get this error during their `__init__` method. I've managed to isolate the issue to the following blocks in the Python interpreter. It seems like there are some device issues. Here are two examples, one of which works and one of which produces the error I described:

Works OK:
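(The original snippet isn't preserved in this copy; the following is a rough reconstruction from the issue title and the workaround above, with the model name and parameters assumed rather than taken from the reporter's code.)

```python
import torch
from vllm import LLM

print(torch.cuda.current_device())  # 0 -- default device is cuda:0

# With the current device left at cuda:0, LoRA initialization succeeds.
llm = LLM(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",  # assumed model
    gpu_memory_utilization=0.5,
    max_model_len=4096,
    enable_lora=True,
    max_lora_rank=16,
)
```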
Does not work:
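(Again a reconstruction based on the title: the current device is switched away from cuda:0 before constructing the LLM with enable_lora=True.)

```python
import torch
from vllm import LLM

# Move the current device off cuda:0 before constructing the engine.
torch.cuda.set_device("cuda:1")

# With enable_lora=True this raises the Triton pointer-access error
# described below.
llm = LLM(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",  # assumed model
    gpu_memory_utilization=0.5,
    max_model_len=4096,
    enable_lora=True,
    max_lora_rank=16,
)
```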
Console output immediately prior to error
Full stack trace:
Since I'm using `vllm` as part of a training loop in `trl`, I'd rather not poke around their device assignment logic if possible.