Conversation

@yewentao256
Member

@yewentao256 yewentao256 commented Oct 21, 2025

Purpose

Fixes #27254

Test

Origin:

vllm serve RedHatAI/DeepSeek-V2.5-1210-FP8 --enable-expert-parallel -tp 4 --enforce_eager
......

(EngineCore_DP0 pid=89543) Traceback (most recent call last):
(EngineCore_DP0 pid=89543)   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=89543)     self.run()
(EngineCore_DP0 pid=89543)   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=89543)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=89543)     raise e
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=89543)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=89543)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=89543)     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=89543)     self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=89543)     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=89543)     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=89543)     raise RuntimeError(
(EngineCore_DP0 pid=89543) RuntimeError: Worker failed with error '', please check the stack trace above for the root cause
(APIServer pid=88560) Traceback (most recent call last):
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=88560)     sys.exit(main())
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=88560)     args.dispatch_function(args)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=88560)     uvloop.run(run_server(args))
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=88560)     return loop.run_until_complete(wrapper())
(APIServer pid=88560)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=88560)     return await main
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=88560)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=88560)     async with build_async_engine_client(
(APIServer pid=88560)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=88560)     return await anext(self.gen)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=88560)     async with build_async_engine_client_from_engine_args(
(APIServer pid=88560)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=88560)     return await anext(self.gen)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=88560)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=88560)     return fn(*args, **kwargs)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=88560)     return cls(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=88560)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=88560)     return AsyncMPClient(*client_args)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=88560)     super().__init__(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=88560)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=88560)   File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=88560)     next(self.gen)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=88560)     wait_for_engine_startup(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=88560)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=88560) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Now:

(APIServer pid=655331) INFO:     Started server process [655331]
(APIServer pid=655331) INFO:     Waiting for application startup.
(APIServer pid=655331) INFO:     Application startup complete.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@mergify mergify bot added the deepseek Related to DeepSeek models label Oct 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue with the DeepSeek-V2.5-1210-FP8 model in vllm, specifically in how quantization parameters are handled in the cutlass_moe_fp8 function. The change adds a check that the dimensions of w1_scale are compatible with w1_q when per-out-channel quantization is enabled, and a corresponding check when it is disabled.
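The kind of guard described above can be sketched as follows. This is an illustrative, hypothetical helper, not vLLM's actual code: the function name, the assumed weight layout (num_experts, output_channels, hidden_size), and the expected scale shapes are assumptions for the sketch, operating on plain shape tuples rather than tensors.

```python
def check_w1_scale_shape(w1_q_shape, w1_scale_shape, per_out_ch_quant):
    """Illustrative shape guard (hypothetical; not vLLM's exact code).

    Assumes w1_q has shape (num_experts, output_channels, hidden_size).
    With per-out-channel quantization there is one scale per output
    channel per expert; otherwise a single scale per expert.
    """
    num_experts, out_ch, _ = w1_q_shape
    if per_out_ch_quant:
        expected = (num_experts, out_ch, 1)
    else:
        expected = (num_experts, 1, 1)
    if w1_scale_shape != expected:
        # Failing early with a clear message beats an opaque worker
        # crash like the one in the traceback above.
        raise ValueError(
            f"w1_scale shape {w1_scale_shape} is incompatible with "
            f"w1_q shape {w1_q_shape}; expected {expected}"
        )
```

A mismatched scale shape then surfaces as an explicit ValueError at setup time instead of a bare "Worker failed with error ''" during engine initialization.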

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 21, 2025
@mgoin mgoin merged commit 1c16084 into main Oct 22, 2025
57 of 59 checks passed
@mgoin mgoin deleted the wentao-fix-DeepSeek-V2.5-FP8 branch October 22, 2025 15:00
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Labels

deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Can't deploy model: RedHatAI/DeepSeek-V2.5-1210-FP8

3 participants