make DeepGEMM swapAB available for linear gemm SM90 #2101

xuanzic · 2025-11-17T23:35:49Z

📌 Description

FlashInfer already includes the fp8_gemm_kernel_swapAB kernel for optimizing Mixture of Experts (MoE) GEMM and dense GEMM operations (see reference 1, reference 2, and reference 3).
This kernel improves performance in small-batch scenarios by swapping the order of the two input operands in the matrix multiplication.
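
To make the operand swap concrete, here is a minimal NumPy sketch of the identity it relies on: `(A @ B.T).T == B @ A.T`. The function names are hypothetical and are not the FlashInfer API. One common motivation, stated here as an assumption rather than something this PR documents, is that the Hopper tensor-core tile is much coarser along one output dimension than the other, so placing the small batch dimension on the finer-grained side wastes less work.

```python
import numpy as np


def gemm_nt(a, b):
    """Reference GEMM in the 'NT' layout: returns a @ b.T."""
    return a @ b.T


def gemm_nt_swap_ab(a, b):
    """Same result as gemm_nt(a, b), computed with the operands swapped.

    Relies on (A @ B.T).T == B @ A.T: the swapped product has shape (N, M),
    so the small batch dimension M becomes the second output dimension.
    """
    c_t = gemm_nt(b, a)  # shape (N, M)
    return c_t.T         # transpose back to (M, N)


rng = np.random.default_rng(0)
m, n, k = 4, 512, 256            # small batch (m) against a wide weight matrix (n)
a = rng.standard_normal((m, k))  # activations, shape (M, K)
b = rng.standard_normal((n, k))  # weights, shape (N, K)

c_ref = gemm_nt(a, b)
c_swapped = gemm_nt_swap_ab(a, b)
assert c_ref.shape == (m, n)
assert np.allclose(c_ref, c_swapped)
```

The sketch only demonstrates that the two orderings are numerically equivalent; any speedup comes from how the real kernel maps the swapped problem onto hardware tiles.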

These kernels are currently:

Used for MoE operations (exposed via the fused_moe module)
Available in the codebase for dense GEMM, but not exposed for linear/dense layers

This PR aims to add a Python binding that exposes the kernel for linear operations:

- Create a dedicated binding for fp8_blockscale_gemm in csrc/fp8_blockscale_gemm_sm90_binding.cu
- Add JIT module generation in flashinfer/jit/gemm/
- Expose the API in flashinfer/gemm/gemm_base.py
- Add extensive test cases in flashinfer/tests/gemm/test_fp8_blockscale_gemm.py
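
The tests themselves are not shown on this page. As a rough illustration of what a block-scaled GEMM computes, the sketch below builds a pure-NumPy reference and checks it against dequantize-then-matmul. It assumes one scale per row and 128-wide K block for activations and one scale per 128x128 tile for weights; those granularities, the function names, and the use of integer-valued floats in place of real FP8 values are all assumptions, not details taken from this PR.

```python
import numpy as np

BLOCK = 128  # assumed scaling granularity (not confirmed by this PR)


def dequantize_rows(q, scales, block=BLOCK):
    """q: (M, K); scales: (M, K // block), one scale per row and K block."""
    return q * np.repeat(scales, block, axis=1)


def dequantize_tiles(q, scales, block=BLOCK):
    """q: (N, K); scales: (N // block, K // block), one scale per tile."""
    full = np.repeat(np.repeat(scales, block, axis=0), block, axis=1)
    return q * full


def blockscale_gemm_reference(qa, sa, qb, sb, block=BLOCK):
    """Accumulate one K block at a time, applying scales to each partial sum."""
    m, k = qa.shape
    n = qb.shape[0]
    out = np.zeros((m, n))
    for kb in range(k // block):
        ks = slice(kb * block, (kb + 1) * block)
        partial = qa[:, ks] @ qb[:, ks].T                 # unscaled partial, (M, N)
        row_scale = sa[:, kb][:, None]                    # (M, 1)
        col_scale = np.repeat(sb[:, kb], block)[None, :]  # (1, N)
        out += partial * row_scale * col_scale
    return out


rng = np.random.default_rng(0)
m, n, k = 4, 256, 384
qa = rng.integers(-8, 8, size=(m, k)).astype(np.float64)  # stand-in for FP8 values
qb = rng.integers(-8, 8, size=(n, k)).astype(np.float64)
sa = rng.uniform(0.5, 2.0, size=(m, k // BLOCK))
sb = rng.uniform(0.5, 2.0, size=(n // BLOCK, k // BLOCK))

expected = dequantize_rows(qa, sa) @ dequantize_tiles(qb, sb).T
actual = blockscale_gemm_reference(qa, sa, qb, sb)
assert np.allclose(expected, actual)
```

A real test would swap the reference loop for a call into the new binding and loosen the tolerance to account for FP8 rounding.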

TODO

Benchmark with a real model in vLLM and compare performance against the CUTLASS GEMM

🔍 Related Issues

vLLM 28427
vLLM 28316

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

coderabbitai · 2025-11-17T23:35:55Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


xuanzic and others added 4 commits November 14, 2025 20:33

add swapab linear gemm binding

3871b93

fix binding

1eeba0f

Merge branch 'flashinfer-ai:main' into vchen/dg_swapab_linear

e2cee34

rename function for SM90

364ee70

jhaotingc mentioned this pull request Nov 18, 2025

[Feature][Kernel]: DeepSeek-R1 KV Proj is Too Slow for TP vllm-project/vllm#28427

Open

1 task
