[TRITON] Add MoE GEMM a4w4 kernel by nsusanto · Pull Request #1358 · ROCm/aiter

nsusanto · 2025-11-06T22:05:28Z

Motivation

This PR adds a new kernel for mxfp4 x mxfp4 MoE GEMM. Both weights and activations must use mx4 scaling; static fp4 scaling is not implemented because no models currently use it. The kernel config is tuned for DeepSeek R1-0528 FP4 shapes.
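As background on the mx4 scaling mentioned above, here is a minimal NumPy sketch of MXFP4 block scaling as defined in the OCP Microscaling format: 32 elements share one power-of-two scale, and each element is an FP4 E2M1 value. This is not the Triton kernel from this PR. The function names (`mxfp4_quantize`, `mxfp4_dequantize`, `a4w4_reference_matmul`) are hypothetical, values are kept as floats rather than bit-packed, and rounding ties are resolved toward the smaller magnitude for simplicity.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
FP4_E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32      # number of elements sharing one scale
E2M1_EMAX = 2   # exponent of the largest E2M1 value (6 = 1.5 * 2**2)


def mxfp4_quantize(x):
    """Quantize a flat array (length divisible by 32) into FP4 values + block exponents.

    Hypothetical helper: returns unpacked float values on the E2M1 grid and
    one int32 power-of-two exponent per block of 32 elements.
    """
    x = np.asarray(x, dtype=np.float64).reshape(-1, BLOCK)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared exponent: floor(log2(amax)) - emax. All-zero blocks use exponent 0.
    safe = np.where(amax > 0, amax, 1.0)
    exp = np.where(amax > 0, np.floor(np.log2(safe)) - E2M1_EMAX, 0.0)
    scaled = x / np.exp2(exp)
    # Snap each magnitude to the nearest grid value; anything above 6 saturates to 6.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_E2M1_VALUES).argmin(axis=-1)
    q = np.sign(scaled) * FP4_E2M1_VALUES[idx]
    return q, exp.astype(np.int32)


def mxfp4_dequantize(q, exp):
    """Invert mxfp4_quantize: multiply each block by 2**exp and flatten."""
    return (q * np.exp2(exp.astype(np.float64))).reshape(-1)


def _fake_quant(x):
    """Quantize then dequantize, with blocks running along the last axis."""
    q, e = mxfp4_quantize(x.reshape(-1))
    return mxfp4_dequantize(q, e).reshape(x.shape)


def a4w4_reference_matmul(a, w):
    """Reference for a (M, K) @ w (K, N) with both operands MXFP4-quantized along K.

    K must be divisible by 32 so that every scale block lies along K.
    """
    a_q = _fake_quant(a)          # blocks along K (last axis of a)
    w_q = _fake_quant(w.T).T      # blocks along K (first axis of w)
    return a_q @ w_q
```

A reference like this can serve as a tolerance baseline when checking a fused kernel, since it applies the same per-block rounding to both operands before the matrix multiply.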

Test Plan

Test cases are implemented in aiter/op_test/triton_tests/test_moe_gemm_a4w4.py

Test Result

python bench_moe_gemm_a4w4.py --shape 7168 4096 --experts 128 4 --op-regex .*swiglu.*

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

azaidy

LGTM!

* Implement a4w4 moe kernel
* tune testcase for a4w4 based on deepseek r1 shapes
* refactor activation quant to use deepseek fp4 quant
* skip a4w4 unit tests on MI300
* Add layer1/layer2 suffix for easier profiling
* Add --num-weight-inits flag to average MoE benchmark results

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch 2 times, most recently from c554aca to a902874 Compare November 12, 2025 20:08

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch 4 times, most recently from 2e7c3ba to f35b2bb Compare December 3, 2025 19:32

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch from 0e9a430 to aa64f61 Compare December 12, 2025 19:04

nsusanto requested a review from a team December 12, 2025 19:04

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch 2 times, most recently from 5f33013 to 8111ed1 Compare December 12, 2025 19:07

root and others added 8 commits December 15, 2025 18:28

[TRITON] copy a8w4 code + format changes for a4w4

924a4d0

[TRITON] Implement a4w4 moe kernel

5aa7337

[TRITON] Refactor a4w4 moe kernel

8645c13

[TRITON] tune testcase for a4w4 based on deepseek r1 shapes

3dedcd1

[TRITON] tune testcase for a4w4 based on deepseek r1 shapes

0eef72a

[TRITON] refactor activation quant to use deepseek fp4 quant

8d33bdb

[TRITON] reformat with black

4e722eb

[TRITON] skip a4w4 unit tests on MI300

77bfdb1

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch from 8111ed1 to 77bfdb1 Compare December 15, 2025 18:33

nsusanto added 2 commits December 15, 2025 18:36

[TRITON] move a4w4 test to moe folder

d0f9486

Merge branch 'main' into nsusanto/moe_gemm_a4w4

232f465

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch 2 times, most recently from 37029fb to 4610e4d Compare December 22, 2025 16:14

[TRITON] Fix formatting issues on a4w4

8353f61

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch from 4610e4d to 8353f61 Compare December 22, 2025 16:18

Reformat using black and ruff

90bdc8c

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch from f45a0e7 to 0dd7dbe Compare January 6, 2026 21:49

Add moe1/moe2 suffix for easier profiling

e11d836

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch from 0dd7dbe to e11d836 Compare January 6, 2026 21:54

Rename moe suffix to use layer1/layer2

748342a

nsusanto added 3 commits January 7, 2026 15:53

Merge branch 'main' into nsusanto/moe_gemm_a4w4

4eb3129

Fix import errors

d3c24b2

Move a4w4 to moe folder

a4a859a

nsusanto force-pushed the nsusanto/moe_gemm_a4w4 branch from 5c11894 to a4a859a Compare January 8, 2026 16:04

nsusanto added 2 commits January 8, 2026 16:15

Remove fp4 activation type

6bd1ab5

Fix quantize unpacking in bench_moe_gemm_a8w4

481dac3

lburzawa self-requested a review January 9, 2026 12:40

nsusanto added 2 commits January 9, 2026 18:23

Add --num-weight-inits flag to average MoE benchmark results

3a794fd

Remove test benchmark script

79372a0

lburzawa approved these changes Jan 12, 2026


azaidy approved these changes Jan 12, 2026


nsusanto merged commit 9eecdec into main Jan 12, 2026
19 checks passed

nsusanto deleted the nsusanto/moe_gemm_a4w4 branch January 12, 2026 15:02

nsusanto merged 21 commits into main from nsusanto/moe_gemm_a4w4

3 participants
