KeyError: 'reasoning_content' in qwen2.5-vl-instruct #2927

ignore1999 · 2025-02-24T08:10:41Z

bug

2025-02-24 00:02:45,915 xinference.core.supervisor 53 DEBUG    [request bb535f1e-f285-11ef-9baf-0242ac11000c] Enter terminate_model, args: <xinference.core.supervisor.SupervisorActor object at 0x7fc8de22fe70>,qwen2.5-vl-instruct, kwargs: suppress_exception=True
2025-02-24 00:02:45,915 xinference.core.supervisor 53 DEBUG    [request bb535f1e-f285-11ef-9baf-0242ac11000c] Leave terminate_model, elapsed time: 0 s
2025-02-24 00:02:45,917 xinference.api.restful_api 1 ERROR    [address=0.0.0.0:35237, pid=288] 'reasoning_content'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1002, in launch_model
    model_uid = await (await self._get_supervisor_ref()).launch_builtin_model(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 667, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1190, in launch_builtin_model
    await _launch_model()
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1125, in _launch_model
    subpool_address = await _launch_one_model(
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/supervisor.py", line 1083, in _launch_one_model
    subpool_address = await worker_ref.launch_builtin_model(
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 667, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/utils.py", line 93, in wrapped
    ret = await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/worker.py", line 926, in launch_builtin_model
    await model_ref.load()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 231, in send
    return self._process_result_message(result)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 667, in send
    result = await self._run_coro(message.message_id, coro)
  File "/usr/local/lib/python3.10/dist-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/usr/local/lib/python3.10/dist-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/usr/local/lib/python3.10/dist-packages/xinference/core/model.py", line 464, in load
    self._model.load()
  File "/usr/local/lib/python3.10/dist-packages/xinference/model/llm/vllm/core.py", line 275, in load
    reasoning_content = self._model_config.pop("reasoning_content")
KeyError: [address=0.0.0.0:35237, pid=288] 'reasoning_content'
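
The last frame of the traceback shows the model config being popped for a key that is not present. A minimal sketch of that failure mode follows, assuming the config behaves like a plain dict. The `model_config` contents, the helper names, and the `False` default are hypothetical illustrations, not the actual Xinference code or the fix merged in #2944.

```python
# Hypothetical stand-in for the model config; the real dict is built by Xinference.
model_config = {"gpu_memory_utilization": 0.9}


def pop_strict(config):
    # Same shape as the failing line in the traceback: pop() without a default
    # raises KeyError when the key is missing.
    return config.pop("reasoning_content")


def pop_lenient(config):
    # Passing a default (False is an assumption here) avoids the KeyError.
    return config.pop("reasoning_content", False)


try:
    pop_strict(dict(model_config))
    strict_raised = False
except KeyError as exc:
    strict_raised = True
    print(f"KeyError: {exc}")

reasoning_content = pop_lenient(dict(model_config))
print(reasoning_content)
```

Whether a missing key should default to `False` or be rejected earlier with a clearer message is a design choice for the maintainers; the sketch only shows why the launch fails for models whose config lacks the key.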

System Info

Docker image (Feb 24, 2025):
registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:latest

Running Xinference with Docker?

docker
pip install
installation from source

Version info

Latest Docker image as of February 24, 2025

The command used to start Xinference

xinference launch --model-name qwen2.5-vl-instruct --model-type LLM --model-engine vLLM --model-format pytorch --size-in-billions 3 --quantization none --n-gpu auto --replica 1 --n-worker 1

Reproduction

docker pull registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:latest

docker run \
  -v ./.xinference:/root/.xinference \
  -v ./.cache/huggingface:/root/.cache/huggingface \
  -v ./.cache/modelscope:/root/.cache/modelscope \
  -e XINFERENCE_MODEL_SRC=modelscope \
  -p 9998:9997 \
  --gpus all \
  --name xinference_0224 \
  registry.cn-hangzhou.aliyuncs.com/xprobe_xinference/xinference:latest \
  xinference-local -H 0.0.0.0 \
  --log-level debug

xinference launch --model-name qwen2.5-vl-instruct --model-type LLM --model-engine vLLM --model-format pytorch --size-in-billions 3 --quantization none --n-gpu auto --replica 1 --n-worker 1

Expected behavior

The bug is fixed and the model launches without raising the KeyError.


harryzwh · 2025-02-24T10:26:12Z

Same here. The following command line works in v1.3.0.post1 but fails to launch after updating to v1.3.0.post2:
xinference launch -t LLM -n qwen2-vl-instruct -s 72 -f awq -q Int4 -en vllm
v1.3.0.post2 does fix the bug in #2914, but it introduces the same "reasoning_content" error for other models.
@amumu96

amumu96 · 2025-02-25T07:48:02Z

Fixed in #2944.

XprobeBot added the gpu label Feb 24, 2025

XprobeBot added this to the v1.x milestone Feb 24, 2025

amumu96 mentioned this issue Feb 25, 2025

BUG: fix qwen2.5-vl-7b cannot chat bug #2944

Merged

qinxuye closed this as completed in #2944 Feb 25, 2025
