Conversation

@yewentao256
Member

@yewentao256 yewentao256 commented Oct 21, 2025

Purpose

Fixes #27254

Test

Origin:

vllm serve RedHatAI/DeepSeek-V2.5-1210-FP8 --enable-expert-parallel -tp 4 --enforce_eager
......

(EngineCore_DP0 pid=89543) Traceback (most recent call last):
(EngineCore_DP0 pid=89543)   File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=89543)     self.run()
(EngineCore_DP0 pid=89543)   File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=89543)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=89543)     raise e
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=89543)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=89543)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=89543)     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=89543)     self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=89543)     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=89543)     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=89543)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=89543)     raise RuntimeError(
(EngineCore_DP0 pid=89543) RuntimeError: Worker failed with error '', please check the stack trace above for the root cause
(APIServer pid=88560) Traceback (most recent call last):
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=88560)     sys.exit(main())
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=88560)     args.dispatch_function(args)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=88560)     uvloop.run(run_server(args))
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=88560)     return loop.run_until_complete(wrapper())
(APIServer pid=88560)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=88560)     return await main
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=88560)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=88560)     async with build_async_engine_client(
(APIServer pid=88560)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=88560)     return await anext(self.gen)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=88560)     async with build_async_engine_client_from_engine_args(
(APIServer pid=88560)   File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=88560)     return await anext(self.gen)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=88560)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=88560)     return fn(*args, **kwargs)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=88560)     return cls(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=88560)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=88560)     return AsyncMPClient(*client_args)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=88560)     super().__init__(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=88560)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=88560)   File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=88560)     next(self.gen)
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=88560)     wait_for_engine_startup(
(APIServer pid=88560)   File "/proj-tango-pvc/users/zhipeng.wang/workspace/vllm/.venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=88560)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=88560) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Now:

(APIServer pid=655331) INFO:     Started server process [655331]
(APIServer pid=655331) INFO:     Waiting for application startup.
(APIServer pid=655331) INFO:     Application startup complete.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@mergify mergify bot added the deepseek Related to DeepSeek models label Oct 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue with the DeepSeek-V2.5-1210-FP8 model in vllm, specifically in how quantization parameters are handled in the cutlass_moe_fp8 function. The change adds a check that the dimensions of w1_scale are compatible with w1_q when per-out-channel quantization is enabled, and a corresponding check when it is disabled.
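The kind of guard described above can be sketched as follows. This is an illustrative, hypothetical helper, not vLLM's actual code: the function name, the assumed weight layout (num_experts, output_channels, hidden_size), and the expected scale shapes are assumptions for the sketch, operating on plain shape tuples rather than tensors.

```python
def check_w1_scale_shape(w1_q_shape, w1_scale_shape, per_out_ch_quant):
    """Illustrative shape guard (hypothetical; not vLLM's exact code).

    Assumes w1_q has shape (num_experts, output_channels, hidden_size).
    With per-out-channel quantization there is one scale per output
    channel per expert; otherwise a single scale per expert.
    """
    num_experts, out_ch, _ = w1_q_shape
    if per_out_ch_quant:
        expected = (num_experts, out_ch, 1)
    else:
        expected = (num_experts, 1, 1)
    if w1_scale_shape != expected:
        # Failing early with a clear message beats an opaque worker
        # crash like the one in the traceback above.
        raise ValueError(
            f"w1_scale shape {w1_scale_shape} is incompatible with "
            f"w1_q shape {w1_q_shape}; expected {expected}"
        )
```

A mismatched scale shape then surfaces as an explicit ValueError at setup time instead of a bare "Worker failed with error ''" during engine initialization.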

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 21, 2025
@mgoin mgoin merged commit 1c16084 into main Oct 22, 2025
57 of 59 checks passed
@mgoin mgoin deleted the wentao-fix-DeepSeek-V2.5-FP8 branch October 22, 2025 15:00
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>

Labels

deepseek Related to DeepSeek models ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Can't deploy model: RedHatAI/DeepSeek-V2.5-1210-FP8

3 participants