[0.9.1] DBO support EP parallel and optimize dual stream overlap #1589
Conversation
Signed-off-by: shikang-hangzhou <459956190@qq.com>
```python
attn_cls = CustomDeepseekDBOMLAAttention
else:
    attn_cls = DeepseekV2Attention
attn_cls = CustomDeepseekV2MLAAttention
```
why remove the branch when use_mla is False here?
Dual stream overlap is an optimized mode. DeepSeek MHA is not part of our application scenario and shows no improvement, so I think the MHA mode is unnecessary here.
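For readers following the thread, a minimal self-contained sketch of the selection being discussed; the placeholder class bodies exist only to make the snippet runnable, and the real classes live in vllm-ascend:

```python
# Placeholder stand-ins for the vllm-ascend classes named in the diff above.
class CustomDeepseekDBOMLAAttention: ...
class DeepseekV2Attention: ...


def pick_attention_cls(use_mla: bool):
    # The PR keeps only the MLA branch: dual stream overlap is only applied
    # with MLA attention, so the MHA fallback below is the branch that was
    # removed from the DBO model.
    if use_mla:
        return CustomDeepseekDBOMLAAttention
    return DeepseekV2Attention
```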
```python
hidden_states[i], router_logits[i], is_prefill, real_top_k,
enable_force_load_balance)

if global_num_experts == 256:
```
Please add a comment that we use 256 here because the op npu_moe_gating_top_k only supports this.
> Please add a comment that we use 256 here because the op npu_moe_gating_top_k only supports this.

Thanks for your review, we have added the comments.
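A sketch of the kind of comment being requested, assuming the check sits in the MoE gating path as the diff excerpt shows (the helper name is illustrative, not the PR's code):

```python
def can_use_fused_gating(global_num_experts: int) -> bool:
    # The fused NPU op npu_moe_gating_top_k only supports exactly 256
    # experts, so the fused gating path is taken only for that expert count;
    # other sizes fall back to the generic top-k implementation.
    return global_num_experts == 256
```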
```python
if self.dp_size > 1:
    if (self.tp_size > 1
            and fused_moe_state != FusedMoEState.AllGather):
        dist.all_gather(list(chunk_hidden_states[i]),
```
I recommend using tensor_model_parallel_all_gather directly.
Here we align with the deepseekv2 code.
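For context, the reviewer's suggestion would roughly swap the raw torch.distributed call for vLLM's tensor-parallel helper. A hedged sketch, assuming tensor_model_parallel_all_gather keeps its usual (input_, dim) signature and that the TP group is already initialized:

```python
import torch
from vllm.distributed import tensor_model_parallel_all_gather


def gather_chunk(hidden_states: torch.Tensor) -> torch.Tensor:
    # Instead of building an output list and calling dist.all_gather on a
    # group handle (as in the diff above), the helper gathers along a chosen
    # dim across the tensor-parallel group and returns a single tensor.
    return tensor_model_parallel_all_gather(hidden_states, dim=0)
```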
Signed-off-by: shikang-hangzhou <459956190@qq.com>
```python
    MSEventKey.MOE_ALL_TO_ALL_FINISH],
)
context.before_comm_event.record()
with torch.npu.stream(ms_metadata.communicate_stream):
```
This kind of stream control method doesn't seem like it can be captured in torchair, so this is just an eager-mode dual-batch implementation, right?
Yes, so dual stream overlap only takes effect in the prefill process.
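To make the eager-only point concrete: the overlap relies on imperative stream switches and event records at the Python level, which graph capture does not trace. A minimal sketch of the same record / side-stream / wait structure, written with torch.cuda APIs (torch.npu mirrors them via torch_npu); the tensor, stream, and function names here are illustrative:

```python
import torch


def overlapped_side_stream_work(x: torch.Tensor) -> torch.Tensor:
    # Side stream used for communication-like work, standing in for
    # ms_metadata.communicate_stream in the diff above.
    comm_stream = torch.cuda.Stream()
    before_comm = torch.cuda.Event()

    y = x * 2                      # compute on the default stream
    before_comm.record()           # like context.before_comm_event.record()

    with torch.cuda.stream(comm_stream):
        # Don't start until y is ready on the default stream.
        comm_stream.wait_event(before_comm)
        z = y + 1                  # stand-in for the all-to-all on the side stream

    # Consumers must wait for the side stream before reading z, which is what
    # the MOE_ALL_TO_ALL_FINISH event wait does in the model code.
    torch.cuda.current_stream().wait_stream(comm_stream)
    return z
```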
```python
for i in range(num_micro_batchs):
    ms_metadata.try_wait_event(layer_index, i,
                               MSEventKey.MOE_ALL_TO_ALL_FINISH)
```
What's the difference between event.wait and try_wait_event?
It's the same here; I will modify it.
> What's the difference between event.wait and try_wait_event?

Sorry, there is a difference between event.wait and try_wait_event: the latter can specify how many micro-batches need to wait.
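To make the distinction concrete, a toy sketch (not the PR's metadata object): event.wait blocks the current stream on one fixed event, while a try_wait_event-style lookup lets the caller pick layer, micro-batch, and event key, so a loop can wait on exactly as many micro-batches as needed:

```python
import torch


class MicroBatchEventTable:
    """Toy stand-in for the PR's multistream metadata (illustrative only)."""

    def __init__(self, num_layers: int, num_micro_batches: int, keys):
        # One event per (layer, micro-batch, key) combination.
        self.events = {
            (layer, mb, key): torch.cuda.Event()
            for layer in range(num_layers)
            for mb in range(num_micro_batches)
            for key in keys
        }

    def try_wait_event(self, layer_index: int, micro_batch_index: int, key) -> None:
        # The caller chooses which micro-batch's event the current stream
        # waits on, e.g. looping `for i in range(num_micro_batchs)` as in the
        # diff above, instead of calling .wait() on a single hard-coded event.
        self.events[(layer_index, micro_batch_index, key)].wait()
```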
```python
ep_group.world_size, -1).sum(-1)
scatter_sizes.append(scatter_size)
gather_sizes = torch.empty_like(scatter_sizes[i])
dist.all_to_all_single(gather_sizes,
```
I wonder if my understanding is correct: you are trying to overlap the all_to_all with gating_topk, right? Since the second stream launch needs to wait for the end of gating.
And you overlap the combine phase of the all_to_all with the computation of the shared expert.
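For readers unfamiliar with the pattern in the diff: before the big token all-to-all, each rank tells every peer how many tokens it will send, using a small all_to_all_single on the per-rank counts. A self-contained sketch, assuming torch.distributed is already initialized (names are illustrative):

```python
import torch
import torch.distributed as dist


def exchange_split_sizes(tokens_per_rank: torch.Tensor) -> torch.Tensor:
    """tokens_per_rank[r] = number of tokens this rank will send to rank r."""
    # Each rank learns how many tokens it will *receive* from every peer,
    # which is needed to size the buffers of the token all-to-all that
    # follows (the dist.all_to_all_single call in the diff above).
    recv_counts = torch.empty_like(tokens_per_rank)
    dist.all_to_all_single(recv_counts, tokens_per_rank)
    return recv_counts
```

Per the discussion above, it is this dispatch all-to-all that the second stream overlaps with gating, and the later combine all-to-all that overlaps with the shared-expert computation.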
Signed-off-by: shikang-hangzhou <459956190@qq.com>
Signed-off-by: shikang-hangzhou <459956190@qq.com>
…overlap (vllm-project#1589)

1. DBO model support EP parallel
2. Optimize dual stream overlap

max tokens: 32784, input_len: 1024, bs: 32, dp2tp8ep16
Before enabling DBO: TTFT 4017 ms; after enabling DBO: TTFT 3017 ms

Signed-off-by: shikang-hangzhou <459956190@qq.com>
…m-project#1420 vllm-project#1328 from v0.9.1-dev to main Signed-off-by: 22dimensions <waitingwind@foxmail.com>
What this PR does / why we need it?
max tokens: 32784, input_len: 1024, bs: 32, dp2tp8ep16

Before enabling DBO: TTFT 4017 ms

After enabling DBO: TTFT 3017 ms
Does this PR introduce any user-facing change?
None
How was this patch tested?