
Service hangs completely after a large batch of requests #1291

Closed

yoke233 opened this issue Apr 12, 2024 · 5 comments
@yoke233
yoke233 commented Apr 12, 2024

Describe the bug

Two models are running: qwen1.5-32b-awq and qwen1.5-72b-gptq-int4.
Sending requests to the 72b model from 12 threads in parallel hangs the server completely; after that it no longer accepts any requests.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.11.8
  2. The version of xinference you use: installed from source on 2024-04-12 17:40:02
  3. Versions of crucial packages.
  4. Full stack of the error (see below):
2024-04-12 17:34:24,502 xinference.api.restful_api 105195 ERROR    [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
Traceback (most recent call last):
  File "/data/inference/xinference/api/restful_api.py", line 1413, in create_chat_completion
    data = await model.chat(prompt, system_prompt, chat_history, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 79, in wrapped_func
    ret = await fn(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 375, in chat
    response = await self._call_wrapper(
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 103, in _async_wrapper
    return await fn(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 325, in _call_wrapper
    ret = await fn(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/model/llm/vllm/core.py", line 481, in async_chat
    c = await self.async_generate(full_prompt, generate_config)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/model/llm/vllm/core.py", line 393, in async_generate
    async for request_output in results_generator:
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 644, in generate
    raise e
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 628, in generate
    stream = await self.add_request(
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 516, in add_request
    self.start_background_loop()
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 391, in start_background_loop
    raise AsyncEngineDeadError(
    ^^^^^^^^^^^^^^^^^
vllm.engine.async_llm_engine.AsyncEngineDeadError: [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
2024-04-12 17:34:24,504 xinference.api.restful_api 105195 ERROR    [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
2024-04-12 17:34:24,505 xinference.api.restful_api 105195 ERROR    [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
(The same AsyncEngineDeadError traceback as above was logged for each of these errors; the verbatim duplicates are omitted here.)

  5. Minimized code to reproduce the error: not provided by the reporter; a sketch follows below.
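A minimal sketch of the described load pattern, not the reporter's code; the endpoint URL, the model UID qwen1.5-chat, and the max_tokens value below are assumptions to adjust to the actual deployment:

```python
# Minimal reproduction sketch (not the reporter's code).
# Assumptions: xinference exposes its OpenAI-compatible API at ENDPOINT and the
# 72B model was launched under MODEL_UID -- adjust both to the actual deployment.
import threading
import requests

ENDPOINT = "http://127.0.0.1:9997/v1/chat/completions"  # hypothetical endpoint
MODEL_UID = "qwen1.5-chat"                              # hypothetical model UID

def fire_request(i: int) -> None:
    payload = {
        "model": MODEL_UID,
        "messages": [{"role": "user", "content": f"Request {i}: write a long essay."}],
        "max_tokens": 1024,
    }
    try:
        resp = requests.post(ENDPOINT, json=payload, timeout=300)
        print(i, resp.status_code)
    except Exception as exc:  # the reported hang shows up here as timeouts
        print(i, "failed:", exc)

# 12 threads in parallel, matching the reporter's setup.
threads = [threading.Thread(target=fire_request, args=(i,)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```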

Expected behavior

The server should keep serving requests under concurrent load; at worst, individual requests should fail, rather than the whole engine dying and every subsequent request hanging.


@XprobeBot XprobeBot added this to the v0.10.2 milestone Apr 12, 2024
@codingl2k1
Contributor

What's the first error?

@XprobeBot XprobeBot modified the milestones: v0.10.2, v0.10.3, v0.11.0 Apr 19, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1, v0.11.2 May 11, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.2, v0.11.3 May 24, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.3, v0.11.4, v0.12.0, v0.12.1 May 31, 2024
@LIKEGAKKI
Contributor

LIKEGAKKI commented Jun 11, 2024

I ran into this as well; see this issue. In my case it was caused by CUDA running out of GPU memory.
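One way to confirm this failure mode is to watch GPU memory while the load test runs; a sketch assuming an NVIDIA GPU and the pynvml bindings, neither of which the commenter specified:

```python
# Sketch: poll GPU memory during the load test to catch an OOM-driven engine death.
# Assumes an NVIDIA GPU and the pynvml bindings (pip install nvidia-ml-py).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust for multi-GPU setups
for _ in range(60):  # sample once per second for a minute
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"used {info.used / 2**30:.1f} GiB of {info.total / 2**30:.1f} GiB")
    time.sleep(1)
pynvml.nvmlShutdown()
```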

@XprobeBot XprobeBot modified the milestones: v0.12.1, v0.12.2 Jun 14, 2024
@XprobeBot XprobeBot modified the milestones: v0.12.2, v0.12.4, v0.13.0, v0.13.1 Jun 28, 2024
@XprobeBot XprobeBot modified the milestones: v0.13.1, v0.13.2 Jul 12, 2024
@chenchunhui97

Try setting gpu-memory-utilization a bit lower; some similar errors I hit were caused by GPU memory OOM.
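For reference, a sketch of launching with a lower GPU memory fraction through the xinference Python client. Whether extra keyword arguments such as gpu_memory_utilization are forwarded to the vLLM engine depends on the xinference version, so verify against the docs for your release; the endpoint and model parameters below are assumptions:

```python
# Sketch: launch the 72B model with a lower vLLM memory fraction.
# Assumption: extra kwargs to launch_model are passed through to the engine.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # hypothetical endpoint
model_uid = client.launch_model(
    model_name="qwen1.5-chat",
    model_engine="vllm",
    model_size_in_billions=72,
    model_format="gptq",
    quantization="Int4",
    gpu_memory_utilization=0.85,  # leave headroom below vLLM's 0.90 default
)
print("launched:", model_uid)
```

Lowering the fraction leaves headroom for memory spikes under concurrent requests, which matches the OOM explanation suggested above.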

@XprobeBot XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024

github-actions bot commented Aug 6, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Aug 6, 2024

This issue was closed because it has been inactive for 5 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Aug 12, 2024