[Feature] Implement EP-compatible fused_moe #121
Merged · +366 −129
Conversation
Force-pushed from 67429eb to b8d514d (Compare)
wuhuikx approved these changes on Mar 8, 2025
ganyi1996ppo reviewed on Mar 9, 2025
ganyi1996ppo approved these changes on Mar 10, 2025
Co-authored-by: Yaphets24 <d_mym0618@163.com>
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Force-pushed from c6117aa to 5768e51 (Compare)
Co-authored-by: Yaphets24 <d_mym0618@163.com>
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
wangxiyuan approved these changes on Mar 11, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Mar 18, 2025
### What this PR does / why we need it?
Enable Expert-Parallel for Ascend devices.
### Does this PR introduce _any_ user-facing change?
Enables EP. Add `enable_expert_parallel=True` to your offline inference script, like this:
```python
llm = LLM(
model="/path/to/model",
trust_remote_code=True,
tensor_parallel_size=4,
max_model_len=4096,
enforce_eager=True,
distributed_executor_backend="mp",
enable_expert_parallel=True,
)
```
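For context, here is a minimal end-to-end sketch of running a prompt through an `LLM` configured this way. The prompt, sampling settings, and model path are placeholders, not part of this PR:

```python
from vllm import LLM, SamplingParams

# Placeholder prompt and sampling settings (illustrative only, not from the PR).
prompts = ["Explain expert parallelism in one sentence."]
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# Same configuration as the snippet above: EP is enabled alongside TP=4.
llm = LLM(
    model="/path/to/model",
    trust_remote_code=True,
    tensor_parallel_size=4,
    max_model_len=4096,
    enforce_eager=True,
    distributed_executor_backend="mp",
    enable_expert_parallel=True,
)

# Generate and print the completion for each prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```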
### How was this patch tested?
Please use the `main` branch of vLLM.
---------
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: Yizhou Liu <liuyizhou5@h-partners.com>
Signed-off-by: angazenn <zengyanjia@huawei.com>
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request on Apr 27, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025
### What this PR does / why we need it?
Enable Expert-Parallel for Ascend devices.
### Does this PR introduce _any_ user-facing change?
Enables EP. Add `enable_expert_parallel=True` to your offline inference script, as shown in the snippet above.
### How was this patch tested?
Please use the `main` branch of vLLM.
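For readers unfamiliar with expert parallelism, the sketch below illustrates the core idea an EP-compatible fused MoE has to respect: experts are partitioned across EP ranks, and each rank computes only the (token, expert) pairs routed to its local experts before results are exchanged across ranks. The names (`local_expert_range`, `tokens_for_rank`) and the even split across ranks are illustrative assumptions, not taken from this PR's implementation:

```python
# Illustrative only: how expert parallelism typically partitions experts across
# ranks. Names and shapes are hypothetical, not taken from vllm-ascend.

def local_expert_range(num_experts: int, ep_size: int, ep_rank: int) -> range:
    """Global expert IDs owned by `ep_rank` when `num_experts` experts are
    split evenly across `ep_size` ranks."""
    assert num_experts % ep_size == 0, "experts must divide evenly across ranks"
    per_rank = num_experts // ep_size
    return range(ep_rank * per_rank, (ep_rank + 1) * per_rank)

def tokens_for_rank(topk_expert_ids, ep_rank, num_experts, ep_size):
    """Select the (token, expert) pairs this rank computes locally.

    `topk_expert_ids` is a per-token list of routed expert IDs,
    e.g. the output of a top-k router.
    """
    owned = set(local_expert_range(num_experts, ep_size, ep_rank))
    pairs = []
    for token_idx, expert_ids in enumerate(topk_expert_ids):
        for expert_id in expert_ids:
            if expert_id in owned:
                pairs.append((token_idx, expert_id))
    return pairs

if __name__ == "__main__":
    # 8 experts split across 4 ranks -> 2 experts per rank.
    routing = [[0, 5], [2, 3], [6, 1]]  # top-2 routing for 3 tokens
    for rank in range(4):
        print(rank, tokens_for_rank(routing, rank, num_experts=8, ep_size=4))
```

With 8 experts over 4 ranks, rank 0 owns experts 0 and 1, so only routed pairs hitting those experts are computed locally; the remaining tokens are handled by the ranks owning their experts.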