Running vLLM on Ray cluster, logging stuck at loading #5052
Comments
@simon-mo can I get some help?
This setup is a bit hard to debug because, while the multi-node setting should work out of the box, the hanging could be due to a connectivity issue between the two machines.
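One way to rule out a low-level connectivity problem is a plain NCCL all-reduce between the two machines, independent of vLLM. Below is a minimal sketch, not from this thread (assumptions: PyTorch with NCCL is installed on both nodes, the head node's IP and the chosen port are reachable, and the filename `nccl_check.py` is just an example):

```python
# nccl_check.py -- hedged sanity check for cross-node NCCL connectivity.
# Launch once per node with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0|1> \
#            --master_addr=<head-ip> --master_port=29500 nccl_check.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# If this all_reduce hangs, the problem is NCCL/networking, not vLLM.
data = torch.ones(1, device="cuda")
dist.all_reduce(data)
print(f"rank {dist.get_rank()}: all_reduce ok, result = {data.item()}")
dist.destroy_process_group()
```

If this hangs the same way vLLM does, the issue is in the network or NCCL configuration rather than in vLLM itself.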
Thanks for your response @simon-mo. What could make the setup unsuitable for the application?
We have added documentation for this situation in #5430. Please take a look.
Chiming in to say that I'm seeing the same issue. I have an instance of vLLM running inside a Ray application whose logs stop during model loading. However, the application itself continues to run fine. I cannot see logs after that point, but it produces file output to S3, and other instrumentation (Datadog, in my case) continues to run and report results. This appears to be purely a logging issue, and it manifests during the model loading process. The example output from my Ray app looks very similar to @maherr13's.
You can see logs reported from other task/actor processes earlier in the flow, and then nothing once vLLM starts loading the model. I know this is an issue with vLLM specifically, because replacing it with a dummy implementation restores the logs. My best guess is that it has something to do with vLLM's custom logging configuration.
Folks, I have confirmed that vLLM's custom logging configuration is the culprit. Setting `VLLM_CONFIGURE_LOGGING=0` restores logging in my Ray application.
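For reference, a minimal sketch of applying that workaround (assumption: the variable must be set before `vllm` is imported, since vLLM configures logging at import time):

```python
import os

# Disable vLLM's custom logging configuration so Ray's log
# capture keeps working; set this before importing vllm.
os.environ["VLLM_CONFIGURE_LOGGING"] = "0"

from vllm import LLM  # imported only after the env var is set
```

Setting the variable in the shell before launching the Ray workers should have the same effect.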
Your current environment
I have two machines (2 × RTX 4090). I wanted to run a model (e.g. gpt-neox-20b) with vLLM on a Ray cluster, so I followed the documentation and created the cluster:
on the head node:
ray start --head
on the worker node:
ray start --address=<head-ip>:<port>
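Once both commands have run, a quick way to confirm the cluster actually formed is to check the registered resources (a sketch, not from the original report; `ray status` on either node gives the same information):

```python
# Verify both nodes and both GPUs are registered before launching vLLM.
import ray

ray.init(address="auto")
print(ray.cluster_resources())  # expect something like {'GPU': 2.0, ...}
```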
I managed to set up the cluster, but when I run a simple script for inference, the model is stuck at loading.
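The script itself isn't captured above; below is a minimal sketch of the kind of inference script the documentation describes (assumptions: the model name, prompt, and sampling parameters are placeholders, and `tensor_parallel_size=2` shards the model across the two GPUs via Ray):

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size=2 makes vLLM shard the model across the
# two GPUs in the Ray cluster (one per machine in this setup).
llm = LLM(model="EleutherAI/gpt-neox-20b", tensor_parallel_size=2)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```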
[screenshot: nvidia-smi for the head and the worker node while loading]
[screenshot: logs when running the script]
versions:
cuda: 12.4
ray: 2.22
🐛 Describe the bug
I tried other solutions mentioned in the issues, like
export NCCL_P2P_DISABLE=1
disable_custom_all_reduce=True
enforce_eager=True
and none of them solved it (see the sketch below for where the last two options go).
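A sketch of where those last two options would be passed (assumption: same placeholder model as above; both are standard `LLM` constructor arguments):

```python
from vllm import LLM

# Workarounds tried: disable the custom all-reduce kernel and
# skip CUDA graph capture; neither resolved the hang here.
llm = LLM(
    model="EleutherAI/gpt-neox-20b",
    tensor_parallel_size=2,
    disable_custom_all_reduce=True,
    enforce_eager=True,
)
```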