
[Bug]: vLLM was installed and used without issues, but recently, during more frequent usage, it suddenly throws an error on a particular request and stops working entirely. Even nvidia-smi cannot return any output. The log is as follows: #9321

Open
alexchenyu opened this issue Oct 13, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@alexchenyu

Your current environment

The output of `python collect_env.py`
When the failure occurs, running `python collect_env.py` itself fails:
Collecting environment information...
fail

Model Input Dumps

761, 286, 330, 3628, 5857, 794, 330, 4963, 6125, 8318, 6046, 11, 6125, 8318, 21792, 11, 49554, 6046, 11, 49554, 21792, 220, 4393, 61198, 69, 285, 372, 55397, 5401, 3775, 2062, 21170, 7832, 3443, 16, 28337, 220, 323, 1522, 2062, 21170, 7832, 3443, 17, 28337, 6360, 262, 457, 22414, 128009, 128006, 78191, 128007, 271], lora_request: None, prompt_adapter_request: None.
WARNING 10-12 22:20:45 preprocess.py:86] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
INFO 10-12 22:20:45 engine.py:288] Added request chat-aa6539ac536f45d1b61db2a663c09dae.
INFO 10-12 22:20:49 metrics.py:351] Avg prompt throughput: 2834.0 tokens/s, Avg generation throughput: 8.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 17.9%, CPU KV cache usage: 0.0%.
INFO 10-12 22:20:54 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 21.9 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 18.0%, CPU KV cache usage: 0.0%.
INFO 10-12 22:20:59 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 25.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 18.2%, CPU KV cache usage: 0.0%.
INFO 10-12 22:21:04 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 26.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 18.3%, CPU KV cache usage: 0.0%.
INFO 10-12 22:21:09 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 27.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 18.5%, CPU KV cache usage: 0.0%.
INFO 10-12 22:21:14 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 27.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 18.7%, CPU KV cache usage: 0.0%.
INFO 10-12 22:21:19 metrics.py:351] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 27.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 18.9%, CPU KV cache usage: 0.0%.
ERROR 10-12 22:21:31 client.py:244] TimeoutError('No heartbeat received from MQLLMEngine')
ERROR 10-12 22:21:31 client.py:244] NoneType: None
CRITICAL 10-12 22:21:34 launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO: 172.31.57.89:54752 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [3028]
(VllmWorkerProcess pid=3106) WARNING 10-12 22:22:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:22:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:22:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:23:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:23:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:23:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:24:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:24:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:24:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:25:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:25:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:25:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:26:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:26:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:26:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:27:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:27:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:27:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:28:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:28:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:28:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:29:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:29:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:29:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:30:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:30:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:30:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3106) WARNING 10-12 22:31:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3108) WARNING 10-12 22:31:22 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=3107) WARNING 10-12 22:31:22 shm_broadcast.py:396] No available block found in 60 second.
[rank3]:[E1012 22:31:22.267261276 ProcessGroupNCCL.cpp:607] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=203441, OpType=GATHER, NumelIn=32064, NumelOut=0, Timeout(ms)=600000) ran for 600020 milliseconds before timing out.
[rank3]:[E1012 22:31:22.268214341 ProcessGroupNCCL.cpp:1664] [PG 3 Rank 3] Exception (either an error or timeout) detected by watchdog at work: 203441, last enqueued NCCL work: 203441, last completed NCCL work: 203440.
[rank3]:[E1012 22:31:22.268259040 ProcessGroupNCCL.cpp:1709] [PG 3 Rank 3] Timeout at NCCL work: 203441, last enqueued NCCL work: 203441, last completed NCCL work: 203440.
[rank3]:[E1012 22:31:22.268279156 ProcessGroupNCCL.cpp:621] [Rank 3] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank3]:[E1012 22:31:22.268292992 ProcessGroupNCCL.cpp:627] [Rank 3] To avoid data inconsistency, we are taking the entire process down.
[rank3]:[E1012 22:31:22.273984384 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 3] Process group watchdog thread terminated with exception: [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=203441, OpType=GATHER, NumelIn=32064, NumelOut=0, Timeout(ms)=600000) ran for 600020 milliseconds before timing out.
Exception raised from checkTimeout at /opt/conda/conda-bld/pytorch_1720538459595/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:609 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc7dd776f86 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1d2 (0x7fc787deea42 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x233 (0x7fc787df5483 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fc787df786c in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xdbbf4 (0x7fc7d82dbbf4 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #5: + 0x9ca94 (0x7fc7e169ca94 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: + 0x129c3c (0x7fc7e1729c3c in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[E1012 22:31:22.300408712 ProcessGroupNCCL.cpp:607] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=203441, OpType=GATHER, NumelIn=32064, NumelOut=0, Timeout(ms)=600000) ran for 600054 milliseconds before timing out.
[rank1]:[E1012 22:31:22.300582170 ProcessGroupNCCL.cpp:1664] [PG 3 Rank 1] Exception (either an error or timeout) detected by watchdog at work: 203441, last enqueued NCCL work: 203441, last completed NCCL work: 203440.
[rank1]:[E1012 22:31:22.300606450 ProcessGroupNCCL.cpp:1709] [PG 3 Rank 1] Timeout at NCCL work: 203441, last enqueued NCCL work: 203441, last completed NCCL work: 203440.
[rank1]:[E1012 22:31:22.300627835 ProcessGroupNCCL.cpp:621] [Rank 1] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank1]:[E1012 22:31:22.300642330 ProcessGroupNCCL.cpp:627] [Rank 1] To avoid data inconsistency, we are taking the entire process down.
[rank1]:[E1012 22:31:23.305921310 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 1] Process group watchdog thread terminated with exception: [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=203441, OpType=GATHER, NumelIn=32064, NumelOut=0, Timeout(ms)=600000) ran for 600054 milliseconds before timing out.
Exception raised from checkTimeout at /opt/conda/conda-bld/pytorch_1720538459595/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:609 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc7dd776f86 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1d2 (0x7fc787deea42 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x233 (0x7fc787df5483 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fc787df786c in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xdbbf4 (0x7fc7d82dbbf4 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #5: + 0x9ca94 (0x7fc7e169ca94 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: + 0x129c3c (0x7fc7e1729c3c in /lib/x86_64-linux-gnu/libc.so.6)

[rank2]:[E1012 22:31:23.331871748 ProcessGroupNCCL.cpp:607] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=203441, OpType=GATHER, NumelIn=32064, NumelOut=0, Timeout(ms)=600000) ran for 600085 milliseconds before timing out.
[rank2]:[E1012 22:31:23.332127235 ProcessGroupNCCL.cpp:1664] [PG 3 Rank 2] Exception (either an error or timeout) detected by watchdog at work: 203441, last enqueued NCCL work: 203441, last completed NCCL work: 203440.
[rank2]:[E1012 22:31:23.332166455 ProcessGroupNCCL.cpp:1709] [PG 3 Rank 2] Timeout at NCCL work: 203441, last enqueued NCCL work: 203441, last completed NCCL work: 203440.
[rank2]:[E1012 22:31:23.332195576 ProcessGroupNCCL.cpp:621] [Rank 2] Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on corrupted/incomplete data.
[rank2]:[E1012 22:31:23.332219360 ProcessGroupNCCL.cpp:627] [Rank 2] To avoid data inconsistency, we are taking the entire process down.
[rank2]:[E1012 22:31:23.338983581 ProcessGroupNCCL.cpp:1515] [PG 3 Rank 2] Process group watchdog thread terminated with exception: [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=203441, OpType=GATHER, NumelIn=32064, NumelOut=0, Timeout(ms)=600000) ran for 600085 milliseconds before timing out.
Exception raised from checkTimeout at /opt/conda/conda-bld/pytorch_1720538459595/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:609 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7fc7dd776f86 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10d::ProcessGroupNCCL::WorkNCCL::checkTimeout(std::optional<std::chrono::duration<long, std::ratio<1l, 1000l> > >) + 0x1d2 (0x7fc787deea42 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: c10d::ProcessGroupNCCL::watchdogHandler() + 0x233 (0x7fc787df5483 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fc787df786c in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: + 0xdbbf4 (0x7fc7d82dbbf4 in /home/alex/miniconda3/envs/vllm-env/lib/python3.10/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #5: + 0x9ca94 (0x7fc7e169ca94 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: + 0x129c3c (0x7fc7e1729c3c in /lib/x86_64-linux-gnu/libc.so.6)

🐛 Describe the bug

vLLM was installed and had been working without issues, but recently, under heavier usage, it suddenly throws an error on a particular request and stops working entirely. Even nvidia-smi no longer returns any output.
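Until this is root-caused, the dead engine can at least be detected from outside the process. Below is a minimal probe sketch, assuming the OpenAI-compatible server runs on localhost:8000 and exposes its GET /health endpoint; the URL and intervals are illustrative, not part of the original report:

```python
# Minimal external liveness probe for the vLLM OpenAI-compatible server.
# Assumes the server listens on localhost:8000 and exposes GET /health;
# adjust the URL and timeouts to the actual deployment.
import time
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # illustrative endpoint

def is_alive(timeout: float = 5.0) -> bool:
    """Return True if the server answers the health probe in time."""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, timeouts, connection refused
        return False

if __name__ == "__main__":
    while True:
        if not is_alive():
            print("vLLM server not responding; MQLLMEngine may be dead")
            break
        time.sleep(30)  # probe interval, illustrative
```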

@hpx502766238

hpx502766238 commented Oct 15, 2024

Same issue, on version 0.6.3.post1. I used the Python OpenAI client, and the error
TimeoutError('No heartbeat received from MQLLMEngine')
occurred with only 2 requests. At the same time, an error was shown in the OpenAI client:
openai.APIError: could not broadcast input array from shape (513,) into shape (512,)
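For context, this is roughly how two concurrent requests can be issued through the client; a minimal sketch, assuming the server is reachable at http://localhost:8000/v1 and the model name matches whatever vLLM is actually serving (both values below are placeholders):

```python
# Rough repro sketch: two concurrent chat requests via the OpenAI Python client.
# base_url, api_key, and model name are placeholders; use the values the
# vLLM server was actually started with.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        max_tokens=512,
    )
    return resp.choices[0].message.content

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(ask, f"request {i}") for i in range(2)]
    for f in futures:
        # Raises openai.APIError once the engine dies and the server returns 500.
        print(f.result())
```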

@MehrdadKhnzd

I'm receiving the same error. After roughly 1,000 requests over ~20 minutes, the server crashed with the same message, followed by another one.

INFO:     "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
CRITICAL  launcher.py:99] MQLLMEngine is already dead, terminating server process
INFO:     "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR     engine.py:135] RuntimeError('Error in model execution (input dumped to /tmp/err_execute_model_input_20250104-024906.pkl): stack expects each tensor to be equal size, but got [24] at entry 0 and [64] at entry 1')
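In case it helps with triage, the dumped model input referenced in that error can be inspected offline. A minimal sketch, assuming this runs in the same environment (same vLLM/torch versions) so the pickled objects resolve; the path is the one from the log above:

```python
# Inspect the model-input dump that vLLM writes when model execution fails.
# Must run in an environment where the pickled classes (vLLM/torch) are importable.
import pickle
import pprint

DUMP_PATH = "/tmp/err_execute_model_input_20250104-024906.pkl"  # path from the error above

with open(DUMP_PATH, "rb") as f:
    dumped = pickle.load(f)

# Print whatever structure was dumped; attribute names depend on the vLLM version.
pprint.pprint(dumped)
```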
