[Serve] DeepSeek-R1 model load stuck on H20 #50975
Comments
Add-on: the log.

serve run serve-r1:build_app model="/data/model/DeepSeek-R1" pipeline-parallel-size=2 tensor-parallel-size=8 accelerator="GPU"

Front log while stuck:
pid=154590
pid=155233

It seems to be stuck on socket connecting. Why does vLLM on its own have no such problem?
pg_resources.append({"GPU": 1, accelerator: 1}) should be pg_resources.append({"CPU": 1, accelerator: 1}), but with 2 nodes and pp = 2 the problem remains: it still gets stuck.
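For context, a minimal sketch of the bundle-list change described in the comment above, assuming the pg_resources construction pattern from the Ray Serve + vLLM example. tp_size, pp_size, and accelerator mirror the CLI arguments from the reproduction command; the loop bound (one bundle per vLLM worker) is an assumption.

```python
# Hedged sketch of the pg_resources fix, not the reporter's actual code.
# With accelerator == "GPU", the original {"GPU": 1, accelerator: 1} has two
# identical keys and collapses to a single {"GPU": 1}, so the worker bundles
# reserve no CPU.
tp_size, pp_size = 8, 2      # tensor-parallel-size / pipeline-parallel-size
accelerator = "GPU"

pg_resources = [{"CPU": 1}]  # bundle for the Serve replica itself
for _ in range(tp_size * pp_size):  # one bundle per vLLM worker (assumed)
    pg_resources.append({"CPU": 1, accelerator: 1})  # proposed fix
```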
@kouroshHakha I've updated the info above about this problem. Could it be a socket error?

Sometimes it gets stuck after loading all the models.
What happened + What you expected to happen
Env
vLLM 0.7.2, model DeepSeek-R1 671B, CUDA
[screenshot: nvidia-smi]
[screenshot: startup]
[screenshot: the problem]
Serve log info: the model loads using Triton MLA and then hangs; that's the problem. There is no such problem with vLLM 0.6.5, but only 0.7.2 supports DeepSeek-R1, so I moved to 0.7.2.
Versions / Dependencies
ray --version
2025-02-27 19:19:42,359 - INFO - Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2025-02-27 19:19:42,359 - INFO - Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2025-02-27 19:19:42,359 - INFO - NumExpr defaulting to 8 threads.
ray, version 2.40.0
vllm 0.7.2
nvcc:
NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
Reproduction script
Code: serve-r1.py
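The serve-r1.py source does not appear in the issue as captured, so here is a minimal, hedged sketch of what a build_app matching the reproduction command could look like, loosely following the Ray Serve + vLLM example pattern. Everything beyond the CLI arguments shown above (class names, the executor backend, the placement strategy) is an assumption, not the reporter's actual code.

```python
# serve-r1.py -- hypothetical reconstruction; the actual script was not
# captured in this issue. Names beyond the CLI args are assumptions.
from typing import Dict

from ray import serve
from vllm import AsyncEngineArgs, AsyncLLMEngine


@serve.deployment
class VLLMDeployment:
    def __init__(self, engine_args: AsyncEngineArgs):
        # The hangs reported above occur during engine construction / model load.
        self.engine = AsyncLLMEngine.from_engine_args(engine_args)

    # Request handling omitted for brevity.


def build_app(cli_args: Dict[str, str]) -> serve.Application:
    accelerator = cli_args.pop("accelerator", "GPU")
    tp_size = int(cli_args.pop("tensor-parallel-size", 1))
    pp_size = int(cli_args.pop("pipeline-parallel-size", 1))

    engine_args = AsyncEngineArgs(
        model=cli_args.pop("model"),
        tensor_parallel_size=tp_size,
        pipeline_parallel_size=pp_size,
        distributed_executor_backend="ray",  # assumption: Ray backend for pp > 1
    )

    # One bundle for the replica plus one per vLLM worker; see the
    # pg_resources discussion in the comments above.
    pg_resources = [{"CPU": 1}]
    for _ in range(tp_size * pp_size):
        pg_resources.append({"CPU": 1, accelerator: 1})

    return VLLMDeployment.options(
        placement_group_bundles=pg_resources,
        # "PACK" (not STRICT_PACK) so bundles may span both nodes when pp = 2.
        placement_group_strategy="PACK",
    ).bind(engine_args)
```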
Issue Severity
High: It blocks me from completing my task.