
Conversation

bnellnm commented Sep 13, 2025

Purpose

Simplify modular_kernel.py, mainly the forward method of FusedMoEModularKernel.

  • Fewer levels of indirection in forward.
  • Simplify the workspace_shape method by passing in the total M and the current/chunked M sizes as plain integers instead of tensors (see the sketch after this list).
  • Separate workspace dtype selection into its own method that defaults to the activation dtype; only a few expert implementations override it.
  • Factor the FusedMoEModularKernel prepare/finalize code out into separate methods, since the async/DBO features have made those paths more complicated.
  • Reduce the number of special cases, especially around chunking.
  • More robust handling of the chunking/reduce/do-naive flags in layer.py.
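
To make the shape/dtype and prepare/finalize bullets concrete, here is a minimal sketch of the intended structure. All names, signatures, and shape formulas below are hypothetical stand-ins, not the actual vLLM interfaces: workspace_shapes is driven by plain integer M sizes, the workspace dtype lives in its own overridable method that defaults to the activation dtype, and forward() delegates to factored-out prepare/finalize helpers.

```python
import torch


class ExpertsSketch:
    """Hypothetical stand-in for a fused-experts implementation."""

    def workspace_dtype(self, act_dtype: torch.dtype) -> torch.dtype:
        # Default to the activation dtype; only a few expert implementations
        # would override this.
        return act_dtype

    def workspace_shapes(
        self,
        M: int,        # total number of tokens
        chunk_M: int,  # tokens in the current chunk (== M when not chunking)
        N: int,
        K: int,
        top_k: int,
    ) -> tuple[tuple[int, ...], tuple[int, ...]]:
        # Shapes are derived from plain integer sizes, so the caller never has
        # to build per-chunk dummy tensors just to size its workspaces.
        return (chunk_M * top_k, max(N, K)), (chunk_M * top_k, N)

    def apply(self, a: torch.Tensor) -> torch.Tensor:
        # Stand-in for the fused expert computation.
        return a


class ModularKernelSketch:
    """Hypothetical stand-in for FusedMoEModularKernel's control flow."""

    def __init__(self, experts: ExpertsSketch):
        self.experts = experts

    def _prepare(self, a: torch.Tensor) -> torch.Tensor:
        # In the real kernel this step dispatches/quantizes activations; the
        # async/DBO handling would live here rather than inline in forward().
        return a

    def _finalize(self, output: torch.Tensor, fused_out: torch.Tensor) -> None:
        # In the real kernel this step combines/unpermutes expert outputs.
        output.copy_(fused_out)

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # Flat control flow: prepare -> experts -> finalize.
        output = torch.empty_like(a)
        self._finalize(output, self.experts.apply(self._prepare(a)))
        return output
```

The point is the shape of the interfaces, not the arithmetic: the real implementations compute per-expert workspace sizes and handle quantization, chunking, and async dispatch inside the helpers.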

Test Plan

Run tests/kernels/moe.

Test Result

All tests pass.

cc @varun-sundar-rabindranath, @SageMoore, @LucasWilkinson

mergify bot added the gpt-oss (Related to GPT-OSS models) label Sep 13, 2025

mergify bot commented Sep 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


mergify bot commented Sep 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Sep 20, 2025
mergify bot removed the needs-rebase label Oct 1, 2025
bnellnm marked this pull request as ready for review October 2, 2025 00:07

varun-sundar-rabindranath left a comment


LGTM! Thanks for the much-needed refactor 🙌

ProExpertProg added the ready (ONLY add when PR is ready to merge/full CI is needed) label Oct 7, 2025

mgoin left a comment


Very nice!

github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Oct 8, 2025
mgoin merged commit da36461 into vllm-project:main Oct 8, 2025
53 checks passed
mgoin deleted the mk-refactor branch October 8, 2025 21:51
mrasquinha-g pushed a commit to mrasquinha-g/vllm that referenced this pull request Oct 9, 2025
zhiyuan1i pushed a commit to zhiyuan1i/vllm that referenced this pull request Oct 9, 2025
845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Labels

gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done

4 participants