Add CUTLASS fused moe kernels from TensorRT-LLM. #1113

wenscarl · 2025-06-04T00:00:32Z

📌 This PR adds the CUTLASS implementation of fused Mixture of Experts (MoE) from TensorRT-LLM.

🔍 Related Issues

Supported data types are: fp32/bf16/fp16/float8_e4m3fn/float8_e2m1
The kernels also support expert/tensor parallelism.

This PR also exposes quantization methods for nvfp4.
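
For readers unfamiliar with nvfp4, the block-scaled idea behind it can be sketched in plain Python. This is only an illustration under stated assumptions and is not the API or kernel code added by this PR: the names `quantize_block`, `dequantize_block`, and `quantize` are hypothetical, the block size of 16 is assumed, and the per-block scale is kept as a Python float, whereas real NVFP4 stores block scales in FP8 and applies an additional global scale.

```python
# Illustrative sketch only: NOT FlashInfer's API or its CUDA kernels.
# Magnitudes representable by a 4-bit E2M1 float (float8_e2m1 in the list above).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumption: one scale per 16 elements


def quantize_block(block):
    """Map one block to E2M1 values plus a single scale (hypothetical helper)."""
    amax = max(abs(x) for x in block)
    # Scale so the largest magnitude lands on 6.0, the largest E2M1 value.
    scale = amax / 6.0 if amax > 0 else 1.0
    quantized = []
    for x in block:
        # Round the scaled magnitude to the nearest representable value.
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        quantized.append(-mag if x < 0 else mag)
    return quantized, scale


def dequantize_block(quantized, scale):
    """Recover approximate values from E2M1 values and the block scale."""
    return [v * scale for v in quantized]


def quantize(values):
    """Quantize a flat list block by block (hypothetical helper)."""
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]
```

Calling `quantize` on a list of 32 floats would yield two `(values, scale)` pairs, one per block; the quantization functions this PR actually exposes should be taken from tests/test_fp4_quantize.py.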

🧪 Tests

tests/test_trtllm_cutlass_fused.py
tests/test_fp4_quantize.py

Reviewer Notes

yzh119

Unit tests passed on my side. @wenscarl, thanks for the huge effort; let's merge this in first!

Currently all of the C++/CUDA source code is in the csrc directory, where we store the PyTorch C++ interface and pybind code. In future PRs, we should move the kernel definitions and the framework-agnostic interface to the include directory as part of the header-only library, and reuse infrastructure with other components.

zkyue · 2025-06-27T10:14:36Z

Does this only support sm100?

fzyzcjy · 2025-07-24T00:30:10Z

Supported data types are: fp32/bf16/fp16/float8_e4m3fn/float8_e2m1

Hi, it seems nvfp4 appears in the unit tests as well as in the source code, so I would appreciate knowing whether this supports nvfp4 as well.

wenscarl force-pushed the trtllm_cutlass_fused_moe branch from bddbbef to 9957425 Compare June 4, 2025 03:01

wenscarl marked this pull request as ready for review June 4, 2025 03:02

Add CUTLASS fused moe kernels from TensorRT-LLM.

8d551aa

wenscarl force-pushed the trtllm_cutlass_fused_moe branch from 9957425 to 8d551aa Compare June 4, 2025 03:07

wenscarl requested a review from yzh119 June 4, 2025 03:13

yzh119 approved these changes Jun 4, 2025

yzh119 merged commit d7e070f into flashinfer-ai:main Jun 4, 2025
2 checks passed

pavanimajety mentioned this pull request Jun 9, 2025

[RFC]: Blackwell Enablement for vLLM (SM100) vllm-project/vllm#18153

Open

21 tasks

trevor-m mentioned this pull request Jun 18, 2025

FlashInfer NVFP4 MoE with EP & 2-stream shared expert sgl-project/sglang#7327

Merged

6 tasks

wenscarl mentioned this pull request Jun 24, 2025

[Core] FlashInfer CUTLASS fused MoE backend (NVFP4) vllm-project/vllm#20037

Merged

yzh119 mentioned this pull request Jun 24, 2025

Fix missing symbols in trtllm_utils.so #1168

Merged

5 tasks

4 participants
