[Kernels] Modular kernel refactor #24812


Merged

mgoin merged 12 commits into vllm-project:main from neuralmagic:mk-refactor

Oct 8, 2025

Contributor

bnellnm commented Sep 13, 2025 (edited by github-actions bot)

Purpose

Simplify modular_kernel.py, mainly the forward method of FusedMoEModularKernel.

- Fewer levels of indirection in forward.
- Simplify the workspace_shapes method by passing in M and the current/chunked M sizes instead of tensors.
- Move the choice of workspace dtype into a separate method that defaults to the activation dtype. Only a few experts override it.
- Factor the FusedMoEModularKernel prepare/finalize code out into separate methods, since the async/DBO features have made that code more complicated.
- Reduce the number of special cases, especially around chunking.
- Make the chunking/reduce/do-naive flags in layer.py more robust.
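The workspace-related items above can be illustrated with a small, self-contained sketch. This is not vLLM's code: the classes `ExpertsSketch`, `DenseExpertsSketch`, `Fp32AccumExpertsSketch` and the helper `chunk_sizes` are hypothetical, dtypes are plain strings so the example runs without PyTorch, and the shapes are made up for illustration. It shows a `workspace_shapes` hook that receives integer sizes (the total M and the current chunk's M) instead of tensors, and a separate `workspace_dtype` hook that defaults to the activation dtype and is overridden by only one expert.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch; names echo the PR description but are NOT vLLM's API.
Shape = tuple[int, ...]


class ExpertsSketch(ABC):
    """Hypothetical expert interface illustrating integer-based workspace sizing."""

    @abstractmethod
    def workspace_shapes(
        self, M: int, chunk_M: int, N: int, K: int, topk: int
    ) -> tuple[Shape, Shape]:
        """Return workspace shapes computed from plain integer sizes, not tensors."""

    def workspace_dtype(self, act_dtype: str) -> str:
        # Default: workspaces use the activation dtype.
        return act_dtype


class DenseExpertsSketch(ExpertsSketch):
    def workspace_shapes(self, M, chunk_M, N, K, topk):
        # M (full batch) is available to experts that need it; this one sizes
        # its workspaces by the current chunk only.
        return (chunk_M, topk, N), (chunk_M, topk, K)


class Fp32AccumExpertsSketch(DenseExpertsSketch):
    def workspace_dtype(self, act_dtype):
        # One of the "few experts" that override the default dtype.
        return "float32"


def chunk_sizes(M: int, chunk_size: int) -> list[int]:
    """Split M tokens into chunks of at most chunk_size; the last may be smaller."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [min(chunk_size, M - start) for start in range(0, M, chunk_size)]


if __name__ == "__main__":
    experts = DenseExpertsSketch()
    for chunk_M in chunk_sizes(10, 4):
        print(experts.workspace_shapes(10, chunk_M, N=8, K=6, topk=2))
```

For example, `chunk_sizes(10, 4)` yields `[4, 4, 2]`, so each workspace is sized for at most 4 tokens at a time regardless of the total batch. Keeping the dtype choice in its own method means most experts inherit the default and only the exceptions need to override anything.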

Test Plan

Run tests/kernels/moe.

Test Result

All tests pass.

cc @varun-sundar-rabindranath , @SageMoore , @LucasWilkinson

mergify bot added the gpt-oss label

yeqcharlotte added this to gpt-oss Issues & Enhancements

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements

mergify bot commented Sep 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label

bnellnm force-pushed the mk-refactor branch from 392d2f8 to 2155672

September 18, 2025 15:57

mergify bot removed the needs-rebase label

mergify bot commented Sep 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label

bnellnm force-pushed the mk-refactor branch from 2155672 to 6d7e988

October 1, 2025 19:35

mergify bot removed the needs-rebase label

bnellnm marked this pull request as ready for review

October 2, 2025 00:07

bnellnm requested review from WoosukKwon, mgoin, tlrmchlsmth and yewentao256 as code owners

October 2, 2025 00:07

varun-sundar-rabindranath reviewed

vllm/model_executor/layers/fused_moe/modular_kernel.py (resolved)

varun-sundar-rabindranath reviewed

vllm/model_executor/layers/fused_moe/modular_kernel.py (outdated, resolved)

varun-sundar-rabindranath reviewed

vllm/model_executor/layers/fused_moe/modular_kernel.py (outdated, resolved)

varun-sundar-rabindranath reviewed

vllm/model_executor/layers/fused_moe/modular_kernel.py (resolved)

bnellnm requested a review from varun-sundar-rabindranath

October 3, 2025 18:51

bnellnm force-pushed the mk-refactor branch from 9f13e27 to 122591f

October 3, 2025 21:15

bnellnm added 5 commits

October 6, 2025 15:25


          [Kernel] Simplify modular kernel forward method

d7814d1

Signed-off-by: Bill Nell <bnell@redhat.com>


          more refactoring + filter out invalid tests ahead of time

9f47abb

Signed-off-by: Bill Nell <bnell@redhat.com>


          format

0e217d2

Signed-off-by: Bill Nell <bnell@redhat.com>


          better reason strings

2fdf274

Signed-off-by: Bill Nell <bnell@redhat.com>


          fix merge conflict

fc183b5

Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm force-pushed the mk-refactor branch from 1f2bde0 to fc183b5

October 6, 2025 15:38

bnellnm added 2 commits

October 6, 2025 17:02


          fix formatting nonsense

eaffd1d

Signed-off-by: Bill Nell <bnell@redhat.com>


          fix formatting nonsense and merge issues caused by formatting nonsense

ba3f4c3

Signed-off-by: Bill Nell <bnell@redhat.com>


          fix botched formatting

5f94c59

Signed-off-by: Bill Nell <bnell@redhat.com>

varun-sundar-rabindranath reviewed

tests/kernels/moe/modular_kernel_tools/common.py (resolved)

varun-sundar-rabindranath reviewed

vllm/model_executor/layers/fused_moe/layer.py (outdated, resolved)

bnellnm added 2 commits

October 7, 2025 15:15


          add using_modular_kernel property

67a53d6

Signed-off-by: Bill Nell <bnell@redhat.com>


          clean up workspace_shapes

e95c437

Signed-off-by: Bill Nell <bnell@redhat.com>

varun-sundar-rabindranath reviewed

tests/kernels/moe/test_modular_kernel_combinations.py (resolved)

varun-sundar-rabindranath reviewed

vllm/model_executor/layers/fused_moe/layer.py (outdated, resolved)


          move comment

d47b348

Signed-off-by: Bill Nell <bnell@redhat.com>

varun-sundar-rabindranath approved these changes

Contributor

varun-sundar-rabindranath left a comment

LGTM! Thanks for the much needed refactor 🙌

ProExpertProg added the ready label


          turn assert into skip

1129be7

Signed-off-by: Bill Nell <bnell@redhat.com>

mgoin approved these changes

Member

mgoin left a comment

Very nice!

github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements

mgoin merged commit da36461 into vllm-project:main

53 checks passed

github-project-automation bot moved this from Ready to Done in gpt-oss Issues & Enhancements

mgoin deleted the mk-refactor branch

October 8, 2025 21:51

mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

b48c504

Signed-off-by: Bill Nell <bnell@redhat.com>

mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

19c3b12

Signed-off-by: Bill Nell <bnell@redhat.com>

zhiyuan1i pushed a commit to zhiyuan1i/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

e2e0619

Signed-off-by: Bill Nell <bnell@redhat.com>

845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request


          Merge branch 'loader' of https://github.com/dsxsteven/vllm_splitPR into loader

190900b

* 'loader' of https://github.com/dsxsteven/vllm_splitPR: (778 commits)
  [torchao] Add support for ModuleFqnToConfig using regex (vllm-project#26001)
  Add: Support for multiple hidden layers in Eagle3 (vllm-project#26164)
  Enable `RMSNorm` substitution for Transformers backend (vllm-project#26353)
  [Model] Gemma3: Fix GGUF loading and quantization (vllm-project#26189)
  Bump Flashinfer to v0.4.0 (vllm-project#26326)
  Update Dockerfile and install runai-model-streamer[gcs] package (vllm-project#26464)
  [Core] Relax the LoRA  max rank (vllm-project#26461)
  [CI/Build] Fix model nightly tests (vllm-project#26466)
  [Hybrid]: Decouple Kernel Block Size from KV Page Size (vllm-project#24486)
  [Core][KVConnector] Propagate all tokens on resumed preemptions (vllm-project#24926)
  [MM][Doc] Add documentation for configurable mm profiling (vllm-project#26200)
  [Hardware][AMD] Enable FlexAttention backend on ROCm (vllm-project#26439)
  [Bugfix] Incorrect another MM data format in vllm bench throughput (vllm-project#26462)
  [Bugfix] Catch and log invalid token ids in detokenizer #2 (vllm-project#26445)
  [Minor] Change warning->warning_once in preprocess (vllm-project#26455)
  [Bugfix] Set the minimum python version for gpt-oss (vllm-project#26392)
  [Misc] Redact ray runtime env before logging (vllm-project#26302)
  Separate MLAAttention class from Attention (vllm-project#25103)
  [Attention] Register FLASHMLA_SPARSE (vllm-project#26441)
  [Kernels] Modular kernel refactor (vllm-project#24812)
  ...

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

7b904f7

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

9002f45

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

aafa4a2

Signed-off-by: Bill Nell <bnell@redhat.com>

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

9c84e86

Signed-off-by: Bill Nell <bnell@redhat.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request


          [Kernels] Modular kernel refactor (vllm-project#24812)

35fb647

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>


Reviewers

mgoin approved these changes

tlrmchlsmth: awaiting requested review (code owner)

WoosukKwon: awaiting requested review (code owner)

yewentao256: awaiting requested review (code owner)

varun-sundar-rabindranath approved these changes

Labels