
Service hangs completely after a large batch of requests #1291

Closed

yoke233 opened this issue Apr 12, 2024 · 5 comments
@yoke233
yoke233 commented Apr 12, 2024

Describe the bug

Two models are running: qwen1.5-32b-awq and qwen1.5-72b-gptq-int4.
Sending requests to the 72b model from 12 threads in parallel hangs the server completely; after that it no longer accepts any requests.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.11.8
  2. The version of xinference you use: installed from source on 2024-04-12 17:40:02
  3. Versions of crucial packages.
  4. Full stack of the error (see below):
2024-04-12 17:34:24,502 xinference.api.restful_api 105195 ERROR    [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
Traceback (most recent call last):
  File "/data/inference/xinference/api/restful_api.py", line 1413, in create_chat_completion
    data = await model.chat(prompt, system_prompt, chat_history, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
    ^^^^^^^^^^^^^^^^^
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 79, in wrapped_func
    ret = await fn(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 375, in chat
    response = await self._call_wrapper(
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 103, in _async_wrapper
    return await fn(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/core/model.py", line 325, in _call_wrapper
    ret = await fn(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/model/llm/vllm/core.py", line 481, in async_chat
    c = await self.async_generate(full_prompt, generate_config)
    ^^^^^^^^^^^^^^^^^
  File "/data/inference/xinference/model/llm/vllm/core.py", line 393, in async_generate
    async for request_output in results_generator:
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 644, in generate
    raise e
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 628, in generate
    stream = await self.add_request(
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 516, in add_request
    self.start_background_loop()
    ^^^^^^^^^^^^^^^^^
  File "/data/miniconda3/envs/xin/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 391, in start_background_loop
    raise AsyncEngineDeadError(
    ^^^^^^^^^^^^^^^^^
vllm.engine.async_llm_engine.AsyncEngineDeadError: [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
2024-04-12 17:34:24,504 xinference.api.restful_api 105195 ERROR    [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
2024-04-12 17:34:24,505 xinference.api.restful_api 105195 ERROR    [address=0.0.0.0:36143, pid=177461] Background loop has errored already.
(The same AsyncEngineDeadError traceback as above was logged for each of these errors; the verbatim duplicates are omitted here.)

  5. Minimized code to reproduce the error: not provided by the reporter; a sketch follows below.
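A minimal sketch of the described load pattern, not the reporter's code; the endpoint URL, the model UID qwen1.5-chat, and the max_tokens value below are assumptions to adjust to the actual deployment:

```python
# Minimal reproduction sketch (not the reporter's code).
# Assumptions: xinference exposes its OpenAI-compatible API at ENDPOINT and the
# 72B model was launched under MODEL_UID -- adjust both to the actual deployment.
import threading
import requests

ENDPOINT = "http://127.0.0.1:9997/v1/chat/completions"  # hypothetical endpoint
MODEL_UID = "qwen1.5-chat"                              # hypothetical model UID

def fire_request(i: int) -> None:
    payload = {
        "model": MODEL_UID,
        "messages": [{"role": "user", "content": f"Request {i}: write a long essay."}],
        "max_tokens": 1024,
    }
    try:
        resp = requests.post(ENDPOINT, json=payload, timeout=300)
        print(i, resp.status_code)
    except Exception as exc:  # the reported hang shows up here as timeouts
        print(i, "failed:", exc)

# 12 threads in parallel, matching the reporter's setup.
threads = [threading.Thread(target=fire_request, args=(i,)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```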

Expected behavior

The server should keep serving requests under concurrent load; at worst, individual requests should fail, rather than the whole engine dying and every subsequent request hanging.


@XprobeBot XprobeBot added this to the v0.10.2 milestone Apr 12, 2024
@codingl2k1
Contributor

What's the first error?

@XprobeBot XprobeBot modified the milestones: v0.10.2, v0.10.3, v0.11.0 Apr 19, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1, v0.11.2 May 11, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.2, v0.11.3 May 24, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.3, v0.11.4, v0.12.0, v0.12.1 May 31, 2024
@LIKEGAKKI
Contributor

LIKEGAKKI commented Jun 11, 2024

I ran into this as well; see this issue. In my case it was caused by CUDA running out of GPU memory.
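One way to confirm this failure mode is to watch GPU memory while the load test runs; a sketch assuming an NVIDIA GPU and the pynvml bindings, neither of which the commenter specified:

```python
# Sketch: poll GPU memory during the load test to catch an OOM-driven engine death.
# Assumes an NVIDIA GPU and the pynvml bindings (pip install nvidia-ml-py).
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust for multi-GPU setups
for _ in range(60):  # sample once per second for a minute
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"used {info.used / 2**30:.1f} GiB of {info.total / 2**30:.1f} GiB")
    time.sleep(1)
pynvml.nvmlShutdown()
```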

@XprobeBot XprobeBot modified the milestones: v0.12.1, v0.12.2 Jun 14, 2024
@XprobeBot XprobeBot modified the milestones: v0.12.2, v0.12.4, v0.13.0, v0.13.1 Jun 28, 2024
@XprobeBot XprobeBot modified the milestones: v0.13.1, v0.13.2 Jul 12, 2024
@chenchunhui97

Try setting gpu-memory-utilization a bit lower; some similar errors I hit were caused by GPU memory OOM.
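For reference, a sketch of launching with a lower GPU memory fraction through the xinference Python client. Whether extra keyword arguments such as gpu_memory_utilization are forwarded to the vLLM engine depends on the xinference version, so verify against the docs for your release; the endpoint and model parameters below are assumptions:

```python
# Sketch: launch the 72B model with a lower vLLM memory fraction.
# Assumption: extra kwargs to launch_model are passed through to the engine.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # hypothetical endpoint
model_uid = client.launch_model(
    model_name="qwen1.5-chat",
    model_engine="vllm",
    model_size_in_billions=72,
    model_format="gptq",
    quantization="Int4",
    gpu_memory_utilization=0.85,  # leave headroom below vLLM's 0.90 default
)
print("launched:", model_uid)
```

Lowering the fraction leaves headroom for memory spikes under concurrent requests, which matches the OOM explanation suggested above.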

@XprobeBot XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024

github-actions bot commented Aug 6, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Aug 6, 2024

This issue was closed because it has been inactive for 5 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Aug 12, 2024