
Conversation

@bnellnm (Contributor) commented Oct 2, 2025

Purpose

  • Use SharedFusedMoE in all models that use shared experts. This will enable the shared experts/communication overlap optimization for all the changed models.
  • Move SharedFusedMoE class to fused_moe directory.
  • Update SharedFusedMoE to behave like FusedMoE when shared_experts is None
  • Disable shared expert overlap when EP is disabled or when we are not using flashinfer + DP, since there is nothing to be gained in those cases and disabling it keeps the shared experts visible to torch.compile (a minimal sketch of this gating follows the list).
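
A minimal sketch of that gating, using hypothetical flag names (ep_enabled, use_flashinfer, dp_enabled) rather than the actual vLLM config fields:

```python
def use_shared_expert_overlap(ep_enabled: bool,
                              use_flashinfer: bool,
                              dp_enabled: bool) -> bool:
    # Overlap only pays off when expert parallelism is enabled and the
    # flashinfer + data-parallel path is in use; in every other case the
    # overlap is skipped so the shared experts stay visible to torch.compile.
    return ep_enabled and use_flashinfer and dp_enabled
```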

For most models the change consists of renaming FusedMoE -> SharedFusedMoE and passing the shared experts module as a parameter. A few models required extra tweaks: aria, ernie45_vl_moe, and the qwen models.
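
As an illustration, here is a minimal, self-contained sketch of that pattern. The class bodies, constructor arguments, and the (shared, routed) tuple returned from forward are assumptions chosen for readability, not the exact vLLM signatures:

```python
import torch
from torch import nn


# Illustrative stand-ins for the real vLLM layers; signatures are assumed.
class FusedMoE(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        # Stands in for the routed-expert computation.
        self.routed = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.routed(x)


class SharedFusedMoE(FusedMoE):
    """FusedMoE that also owns the shared experts so the two can overlap."""

    def __init__(self, hidden_size: int, shared_experts: nn.Module | None = None):
        super().__init__(hidden_size)
        self.shared_experts = shared_experts

    def forward(self, x: torch.Tensor):
        routed_out = super().forward(x)
        if self.shared_experts is None:
            # With no shared experts this degenerates to plain FusedMoE.
            return routed_out
        # In vLLM the shared-expert work can be overlapped with the
        # routed-expert communication; this sketch just runs it in sequence.
        return self.shared_experts(x), routed_out


# A model's MoE block then hands its shared experts to the layer and sums the
# two outputs, instead of calling them separately around a plain FusedMoE.
hidden_size = 16
shared = nn.Linear(hidden_size, hidden_size)
moe = SharedFusedMoE(hidden_size, shared_experts=shared)
shared_out, routed_out = moe(torch.randn(2, hidden_size))
out = shared_out + routed_out
```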

Test Plan

Test all modified models.
Note: all model types appear to be covered by tests/models/registry.py

Test Result

TBD

@mergify mergify bot added the deepseek, qwen, and llama labels Oct 2, 2025
@bnellnm bnellnm marked this pull request as ready for review October 3, 2025 03:15
@bnellnm bnellnm changed the title from "[Model] Use SharedFusedMoE in all models with shared experts" to "[Model] Apply shared overlap optimization to all models with shared experts" Oct 7, 2025
@bnellnm bnellnm changed the title from "[Model] Apply shared overlap optimization to all models with shared experts" to "[Model] Apply shared experts overlap optimization to all models with shared experts" Oct 7, 2025
@mgoin mgoin added the ready label Oct 8, 2025
@mgoin (Member) left a comment

LGTM, nice refactor, thank you

@mgoin mgoin merged commit 47e66c2 into vllm-project:main Oct 9, 2025
62 checks passed
yang926 pushed a commit to yang926/vllm_1008 that referenced this pull request Oct 9, 2025
…shared experts (vllm-project#26145)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: yang926 <yang926@naver.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…shared experts (vllm-project#26145)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…shared experts (vllm-project#26145)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…shared experts (vllm-project#26145)

Signed-off-by: Bill Nell <bnell@redhat.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…shared experts (vllm-project#26145)

Signed-off-by: Bill Nell <bnell@redhat.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Oct 24, 2025
### What this PR does / why we need it?
This is the step 1 of refactoring code to adapt with vllm main, and this
pr aligned with
vllm-project/vllm@17c540a

1. refactor deepseek to the latest code arch as of
vllm-project/vllm@17c540a
 
2. bunches of fixes due to vllm changes
- Fix `AscendScheduler` `__post_init__`, caused by
vllm-project/vllm#25075
- Fix `AscendScheduler` init got an unexpected arg `block_size`, caused
by vllm-project/vllm#26296
- Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by
vllm-project/vllm#23485
- Fix `MLAAttention` import, caused by
vllm-project/vllm#25103
- Fix `SharedFusedMoE` import, caused by
vllm-project/vllm#26145 (see the import sketch after this list)
- Fix `LazyLoader` import, caused by
vllm-project/vllm#27022
- Fix `vllm.utils.swap_dict_values` import, caused by
vllm-project/vllm#26990
- Fix `Backend` enum import, caused by
vllm-project/vllm#25893
- Fix `CompilationLevel` renaming to `CompilationMode` issue introduced
by vllm-project/vllm#26355
- Fix fused_moe ops, caused by
vllm-project/vllm#24097
- Fix bert model because of `inputs_embeds`, caused by
vllm-project/vllm#25922
- Fix MRope because of `get_input_positions_tensor` to
`get_mrope_input_positions`, caused by
vllm-project/vllm#24172
- Fix `splitting_ops` changes introduced by
vllm-project/vllm#25845
- Fix multi-modality changes introduced by
vllm-project/vllm#16229
- Fix lora bias dropping issue introduced by
vllm-project/vllm#25807
- Fix structured output break introduced by
vllm-project/vllm#26737
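
For downstream code, the `SharedFusedMoE` fix noted above typically reduces to updating the import path. A minimal sketch follows, assuming the class is re-exported from the fused_moe package after the move and that the pre-move path was a sibling shared_fused_moe module; both paths are assumptions, not verified against either codebase:

```python
try:
    # Location after vllm-project/vllm#26145 moved the class into fused_moe.
    from vllm.model_executor.layers.fused_moe import SharedFusedMoE
except ImportError:
    # Fallback for older vLLM releases; this pre-move path is an assumption.
    from vllm.model_executor.layers.shared_fused_moe import SharedFusedMoE
```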

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
CI passed with existing test.


- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Icey <1790571317@qq.com>
Co-authored-by: Icey <1790571317@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…shared experts (vllm-project#26145)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…shared experts (vllm-project#26145)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>