Conversation

@yiz-liu (Collaborator) commented Feb 20, 2025

### What this PR does / why we need it?

Enable expert parallelism (EP) for Ascend devices. EP shards the experts of a Mixture-of-Experts (MoE) model across devices instead of replicating them, which reduces per-device memory for models with many experts.

### Does this PR introduce _any_ user-facing change?

Yes, it enables EP. To turn it on, add `enable_expert_parallel=True` in your offline inference script, like this:

```python
llm = LLM(
    model="/path/to/model",
    trust_remote_code=True,
    tensor_parallel_size=4,
    max_model_len=4096,
    enforce_eager=True,
    distributed_executor_backend="mp",
    enable_expert_parallel=True,
)
```
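
For reference, a minimal end-to-end sketch of an offline-inference run with EP enabled might look like the following. The model path and prompt are placeholders and the sampling settings are arbitrary; only `enable_expert_parallel=True` is specific to this PR:

```python
# Minimal offline-inference sketch with expert parallelism enabled.
# The model path and prompt are placeholders; sampling settings are arbitrary.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/model",          # an MoE checkpoint supported by vllm-ascend
    trust_remote_code=True,
    tensor_parallel_size=4,
    max_model_len=4096,
    enforce_eager=True,
    distributed_executor_backend="mp",
    enable_expert_parallel=True,     # shard experts across the 4 devices
)

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["What is expert parallelism?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

Roughly speaking, `enable_expert_parallel=True` repartitions only the MoE expert weights; the rest of the model keeps the usual tensor-parallel layout.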

### How was this patch tested?

Please use the `main` branch of vLLM.

@yiz-liu changed the title [Feature] Implement native fused MoE layer → [WIP][Feature] Implement native fused MoE layer Feb 20, 2025
@wangxiyuan closed this Feb 21, 2025
@wangxiyuan reopened this Feb 22, 2025
@yiz-liu changed the title [WIP][Feature] Implement native fused MoE layer → [WIP][Feature] Implement EP-compatible fused_moe Mar 7, 2025
@yiz-liu force-pushed the main branch 3 times, most recently from 67429eb to b8d514d (March 8, 2025 07:41)
Yizhou Liu added 7 commits March 11, 2025 10:10
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
Co-authored-by: Yaphets24 <d_mym0618@163.com>
@yiz-liu yiz-liu force-pushed the main branch 2 times, most recently from c6117aa to 5768e51 Compare March 11, 2025 08:36
Co-authored-by: Yaphets24 <d_mym0618@163.com>
Signed-off-by: Yizhou Liu <liuyizhou5@h-partners.com>
@yiz-liu changed the title [WIP][Feature] Implement EP-compatible fused_moe → [Feature] Implement EP-compatible fused_moe Mar 11, 2025
@wangxiyuan merged commit 0db6670 into vllm-project:main Mar 11, 2025
12 checks passed
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Mar 18, 2025
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
@Yikun mentioned this pull request Jun 28, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025