Conversation

@wuxun-zhang (Contributor) commented Sep 28, 2025

After vllm-project/vllm#24982 merged, sequence-parallel MoE is turned on when `enable_expert_parallel=True`, `tp_size > 1`, and `dp_size > 1`. Since there is no alternative `VLLM_ALL2ALL_BACKEND` choice for Gaudi, we cannot easily bypass it, so this PR adds support for the feature.

```python
class ParallelConfig:

    @property
    def use_sequence_parallel_moe(self) -> bool:
        return (envs.VLLM_ALL2ALL_BACKEND
                in ("allgather_reducescatter", "naive",
                    "deepep_high_throughput", "deepep_low_latency")
                and self.enable_expert_parallel
                and self.tensor_parallel_size > 1
                and self.data_parallel_size > 1)
```
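For context, here is a minimal PyTorch sketch of what sequence-parallel MoE does around the expert layers, assuming the allgather/reducescatter pattern named in the backend list above; `moe_layer` and `tp_group` are illustrative placeholders, not the actual vLLM implementation:

```python
# Illustrative sketch only, not the vLLM/vllm-gaudi implementation.
import torch
import torch.distributed as dist

def sequence_parallel_moe(hidden_states: torch.Tensor,
                          moe_layer,
                          tp_group) -> torch.Tensor:
    """Shard tokens across the TP group around the MoE.

    hidden_states: (num_tokens, hidden_size), with num_tokens
    divisible by the TP world size.
    """
    tp_size = dist.get_world_size(group=tp_group)
    num_tokens, hidden_size = hidden_states.shape

    # Reduce-scatter: each TP rank keeps the summed activations for its
    # token shard, so the MoE only sees num_tokens / tp_size tokens.
    local = torch.empty(num_tokens // tp_size, hidden_size,
                        dtype=hidden_states.dtype,
                        device=hidden_states.device)
    dist.reduce_scatter_tensor(local, hidden_states, group=tp_group)

    # Run the (expert-parallel) MoE on the local token shard only.
    local = moe_layer(local)

    # All-gather the shards so every rank sees the full sequence again.
    out = torch.empty_like(hidden_states)
    dist.all_gather_into_tensor(out, local, group=tp_group)
    return out
```

This is also why the property above only fires with `tp_size > 1`: with a single TP rank there is no group to shard the tokens across.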

Update:
No hard requirement on vllm-project/vllm#25828

Signed-off-by: Wuxun Zhang <wuxun.zhang@intel.com>
@github-actions commented

✅ CI Passed

All checks passed successfully against the following vllm commit:
c242c98031b87d00999e07dbb4aa9b2a70798c6c

@xuechendi (Collaborator) commented

@wuxun-zhang, we are trying to make the (bs, seq_len, hidden_state) shape not a hard requirement for HPU.
Please check this new flag: https://github.com/vllm-project/vllm-gaudi/blob/main/vllm_gaudi/extension/features.py#L89

Please try adding GraniteMOE so that input_ids is flattened to 1D (a sketch follows this comment).

Meanwhile, I also discussed with @kzawora-intel: since more and more models assert on 2D input, we might change to 1D as the default once performance is validated.
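For illustration, a minimal sketch of the 1D flattening being discussed; the shapes here are hypothetical, and the real switch is the features.py flag linked above:

```python
import torch

# Batched 2D input of shape (bs, seq_len), the layout most HPU paths assume.
input_ids = torch.tensor([[101, 7592, 102],
                          [101, 2088, 102]])  # shape (2, 3)

# Flattened 1D stream of shape (bs * seq_len,), the layout that
# GraniteMOE-style models assert on.
flat_ids = input_ids.flatten()               # shape (6,)
assert flat_ids.shape == (input_ids.numel(),)
```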

@wuxun-zhang (Contributor, Author) commented

> Please try adding GraniteMOE so that input_ids is flattened to 1D.

Thanks, it works. Just updated.

@xuechendi enabled auto-merge (squash) September 30, 2025 01:07
@xuechendi merged commit 922a18f into vllm-project:main Sep 30, 2025
34 checks passed
@github-actions commented

✅ CI Passed

All checks passed successfully against the following vllm commit:
c242c98031b87d00999e07dbb4aa9b2a70798c6c

iboiko-habana pushed a commit to iboiko-habana/vllm-gaudi that referenced this pull request Oct 2, 2025