
Conversation

@yewentao256 (Member) commented Oct 28, 2025

Purpose

VLLM_ALL2ALL_BACKEND=deepep_high_throughput vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 --trust-remote-code --tensor-parallel-size 1 --data-parallel-size 4 --no-enable-prefix-caching --enable-expert-parallel --enable-dbo --enforce-eager

will trigger:

(EngineCore_DP0 pid=2344883)   File "/home/wentao/ep_kernels_workspace/DeepEP/deep_ep/buffer.py", line 393, in dispatch
(EngineCore_DP0 pid=2344883)     return forward_call(*args, **kwargs)
    self.runtime.intranode_dispatch(x, x_scales, topk_idx, topk_weights,
(EngineCore_DP0 pid=2344883)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2344883) RuntimeError: DeepEP error: CPU recv timeout
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1184, in forward
(EngineCore_DP0 pid=2344883)     return self._finalize(
(EngineCore_DP0 pid=2344883)            ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/model_executor/layers/fused_moe/modular_kernel.py", line 1070, in _finalize
(EngineCore_DP0 pid=2344883)     finalize_ret = self.prepare_finalize.finalize_async(
(EngineCore_DP0 pid=2344883)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py", line 385, in finalize_async
(EngineCore_DP0 pid=2344883)     receiver = self._finalize(
(EngineCore_DP0 pid=2344883)                ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/model_executor/layers/fused_moe/deepep_ht_prepare_finalize.py", line 338, in _finalize
(EngineCore_DP0 pid=2344883)     dbo_yield_and_switch_from_compute_to_comm()
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/v1/worker/ubatching.py", line 160, in wrapper
(EngineCore_DP0 pid=2344883)     func(ctx, *args, **kwargs)
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/v1/worker/ubatching.py", line 134, in yield_and_switch_from_compute_to_comm
(EngineCore_DP0 pid=2344883)     self._wait_compute_done()
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/v1/worker/ubatching.py", line 84, in _wait_compute_done
(EngineCore_DP0 pid=2344883)     self.comm_stream.wait_event(self.gpu_compute_done_event)
(EngineCore_DP0 pid=2344883)   File "/home/wentao/.venv/lib/python3.12/site-packages/torch/cuda/streams.py", line 57, in wait_event
(EngineCore_DP0 pid=2344883)     event.wait(self)
(EngineCore_DP0 pid=2344883)   File "/home/wentao/.venv/lib/python3.12/site-packages/torch/cuda/streams.py", line 203, in wait
(EngineCore_DP0 pid=2344883)     super().wait(stream)
(EngineCore_DP0 pid=2344883) torch.AcceleratorError: CUDA error: an illegal memory access was encountered

...

(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_model_runner.py", line 3464, in _dummy_run
(EngineCore_DP0 pid=2344883)     outputs = self.model(
(EngineCore_DP0 pid=2344883)               ^^^^^^^^^^^
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_ubatch_wrapper.py", line 466, in __call__
(EngineCore_DP0 pid=2344883)     return self._run_ubatches(ubatch_metadata, self.model)
(EngineCore_DP0 pid=2344883)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2344883)   File "/home/wentao/vllm-source/vllm/v1/worker/gpu_ubatch_wrapper.py", line 283, in _run_ubatches
(EngineCore_DP0 pid=2344883)     result = torch.cat(sorted_results, dim=0)
(EngineCore_DP0 pid=2344883)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2344883) RuntimeError: torch.cat(): expected a non-empty list of Tensors

This happens because we did not synchronize with DeepEP's internal communication stream; this PR fixes that.
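For context, the DBO switch in the traceback waits on `gpu_compute_done_event` from the comm stream; if work launched on a stream that is not covered by such an event runs ahead, a later kernel can read buffers that are still being written, which surfaces as the illegal memory access above. A minimal sketch of the event-based cross-stream ordering involved (the stream and variable names here are illustrative, not the exact identifiers used by vLLM or DeepEP):

```python
import torch

# Illustrative streams: names are assumptions, not vLLM/DeepEP identifiers.
compute_stream = torch.cuda.current_stream()
comm_stream = torch.cuda.Stream()

x = torch.randn(1024, 1024, device="cuda")

with torch.cuda.stream(compute_stream):
    y = x @ x  # "compute" work producing a buffer

# Record an event after the producing kernel on the compute stream ...
done = torch.cuda.Event()
done.record(compute_stream)

# ... and make the other stream wait on it before consuming the buffer.
# Without this wait, the kernel below could race with the matmul above.
comm_stream.wait_event(done)
with torch.cuda.stream(comm_stream):
    z = y.sum()  # consumer-side work that reads the buffer
```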

Test

VLLM_ALL2ALL_BACKEND=deepep_high_throughput vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 --trust-remote-code --tensor-parallel-size 1 --data-parallel-size 4 --no-enable-prefix-caching --enable-expert-parallel --enable-dbo --enforce-eager

(APIServer pid=2481637) INFO 10-28 09:06:51 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=2481637) INFO 10-28 09:06:51 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=2481637) INFO 10-28 09:06:51 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=2481637) INFO 10-28 09:06:51 [launcher.py:46] Route: /start_profile, Methods: POST
(APIServer pid=2481637) INFO 10-28 09:06:51 [launcher.py:46] Route: /stop_profile, Methods: POST
(APIServer pid=2481637) INFO 10-28 09:06:51 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=2481637) INFO:     Started server process [2481637]
(APIServer pid=2481637) INFO:     Waiting for application startup.
(APIServer pid=2481637) INFO:     Application startup complete.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
mergify bot added the v1 label Oct 28, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request fixes a critical illegal memory access error when using DBO with DeepEP High Throughput kernels. The fix involves adding proper synchronization between vLLM's compute stream and DeepEP's internal streams by capturing and passing a CUDA event. The changes look correct and address the reported issue. However, I've identified a potential issue in the newly introduced utility function dbo_get_previous_event which could be a source of bugs in the future due to its implicit behavior. I've suggested a change to make it more explicit and robust.
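As a rough illustration of the reviewer's concern, a helper that implicitly returns "the previous event" might look like the sketch below. This is an assumption about the general shape of `dbo_get_previous_event`, not its actual implementation in the PR:

```python
import torch
from typing import Optional

# Hypothetical module-level state; the real helper may be structured differently.
_previous_event: Optional[torch.cuda.Event] = None

def record_previous_event(stream: torch.cuda.Stream) -> None:
    """Record an event marking the work submitted to `stream` so far."""
    global _previous_event
    _previous_event = torch.cuda.Event()
    _previous_event.record(stream)

def get_previous_event() -> Optional[torch.cuda.Event]:
    """Return the last recorded event, or None if nothing was recorded.

    The implicit None case is what the review flags: a caller that forgets to
    check it silently skips the cross-stream wait instead of failing loudly.
    """
    return _previous_event
```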

yewentao256 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 28, 2025
tlrmchlsmth merged commit b5d90f7 into main Oct 29, 2025
58 of 59 checks passed
tlrmchlsmth deleted the wentao-fix-dbo-IMA-issue branch October 29, 2025 20:28
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 30, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Nov 12, 2025
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
