[TRITON] gluon gemm_afp4wfp4 by ahmed-bsod · Pull Request #1809 · ROCm/aiter

ahmed-bsod · 2026-01-11T23:48:15Z

Motivation

Added Gluon kernels for GEMM_AFP4WFP4.

Perf

Tuned for M=N=K=8192. Running the bench script
python bench_gemm_afp4wfp4.py --shape 8192 8192 8192 --gluon
gives roughly 2500 TFLOPS on average.
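A quick sanity check on that figure: a dense GEMM does about 2·M·N·K floating-point operations (one multiply and one add per inner-product term). At the reported throughput, that works out to a runtime of under half a millisecond per 8192³ call. The helpers below are hypothetical back-of-the-envelope calculations and are not part of bench_gemm_afp4wfp4.py. They count only the multiply-adds of the matrix product.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A (m x k) and B (k x n): one multiply + one add per term."""
    return 2 * m * n * k


def runtime_ms(m: int, n: int, k: int, tflops: float) -> float:
    """Runtime in milliseconds implied by a measured throughput given in TFLOPS."""
    return gemm_flops(m, n, k) / (tflops * 1e12) * 1e3


M = N = K = 8192
print(f"{gemm_flops(M, N, K):.3e} FLOPs")       # 1.100e+12 FLOPs
print(f"{runtime_ms(M, N, K, 2500.0):.3f} ms")  # 0.440 ms
```

Reading the relation the other way, a measured time of about 0.44 ms for this shape is what a benchmark would need to report for roughly 2500 TFLOPS.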

Testing

Passes all shapes in test_gemm_afp4wfp4.py.
Also tested _gemm_afp4wfp4_reduce_kernel by setting k_split to 2, 4, and 8 in the config file.
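For context on what the k_split test exercises: split-K divides the K dimension into k_split slices. Each slice produces its own partial M×N result, and a second pass sums those partials. In this PR, that second pass is the job of _gemm_afp4wfp4_reduce_kernel. The NumPy sketch below shows only the arithmetic, on the host, with float32 stand-ins for the FP4 inputs. The function names are hypothetical, and the contiguous near-equal slicing is an assumption, not the kernel's actual tiling.

```python
import numpy as np


def split_k_partials(a: np.ndarray, b: np.ndarray, k_split: int) -> np.ndarray:
    """Stage 1: each of k_split contiguous slices of K yields a partial (M, N) product."""
    k = a.shape[1]
    # Slice boundaries along the reduction dimension, e.g. [0, 128, 256] for k=256, k_split=2.
    bounds = np.linspace(0, k, k_split + 1).astype(int)
    partials = [a[:, lo:hi] @ b[lo:hi, :] for lo, hi in zip(bounds[:-1], bounds[1:])]
    return np.stack(partials)  # shape (k_split, M, N)


def split_k_reduce(partials: np.ndarray) -> np.ndarray:
    """Stage 2: sum the partial results along the split axis."""
    return partials.sum(axis=0)


rng = np.random.default_rng(0)
a = rng.standard_normal((64, 256), dtype=np.float32)
b = rng.standard_normal((256, 32), dtype=np.float32)
reference = a @ b
for k_split in (2, 4, 8):
    out = split_k_reduce(split_k_partials(a, b, k_split))
    print(k_split, np.allclose(out, reference, atol=1e-3))
```

Adding the partials in a different order than a single full-K dot product changes floating-point rounding slightly. That is why a comparison against the non-split result should use a tolerance rather than exact equality.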

Copilot

Pull request overview

This PR adds Gluon kernel implementations for GEMM_AFP4WFP4 (GEMM with FP4 activations and FP4 weights) operations, specifically targeting AMD CDNA4 (gfx950) devices. The implementation includes a new Gluon-based kernel, test infrastructure updates to support both Triton and Gluon implementations, and benchmark support.
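Some background on the data type, under an assumption: if "FP4" here means the OCP Microscaling (MX) E2M1 element format (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit), then each 4-bit code maps to one of {±0, ±0.5, ±1, ±1.5, ±2, ±3, ±4, ±6}. The sketch below decodes codes packed two per byte. The low-nibble-first packing order and all function names are assumptions for illustration. The sketch also leaves out the per-block scale factors that MX formats apply on top of the element values.

```python
import numpy as np


def e2m1_to_float(code: int) -> float:
    """Decode one 4-bit E2M1 code: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    mant = code & 0b1
    if exp == 0:
        # Subnormal: 2^(1 - bias) * (mant / 2) = 0.5 * mant
        return sign * 0.5 * mant
    # Normal: 2^(exp - bias) * (1 + mant / 2)
    return sign * 2.0 ** (exp - 1) * (1.0 + mant / 2.0)


# Lookup table covering all 16 codes.
E2M1_LUT = np.array([e2m1_to_float(c) for c in range(16)], dtype=np.float32)


def unpack_fp4x2(packed: np.ndarray) -> np.ndarray:
    """Unpack uint8 bytes holding two E2M1 codes each (low nibble first: an assumption)."""
    low = packed & 0x0F
    high = packed >> 4
    codes = np.stack([low, high], axis=-1).reshape(*packed.shape[:-1], -1)
    return E2M1_LUT[codes]


packed = np.array([0x21, 0xF7], dtype=np.uint8)
print(unpack_fp4x2(packed).tolist())  # [0.5, 1.0, 6.0, -6.0]
```

A 16-entry lookup table is the simplest way to express the mapping on the host. Because the code space is so small, the table fully specifies the format.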

Changes:

Adds new Gluon kernel implementation for AFP4WFP4 GEMM operations with split-K support
Updates tests to parameterize implementation choice between Triton and Gluon variants
Adds benchmark support for performance comparison between implementations

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
aiter/ops/triton/gluon/gemm_afp4wfp4.py	New Gluon kernel implementation with main GEMM kernel, reduce kernel, and configuration loading
aiter/ops/triton/configs/gemm/gluon/gfx950-GEMM-AFP4WFP4.json	Configuration file with tuning parameters for gfx950 architecture
op_tests/triton_tests/gemm/basic/test_gemm_afp4wfp4.py	Updated tests to support both Triton and Gluon implementations via parameterization
op_tests/op_benchmarks/triton/bench_gemm_afp4wfp4.py	Added Gluon implementation option to benchmark suite


aiter/ops/triton/gluon/gemm_afp4wfp4.py

cagrikymk · 2026-01-13T15:39:37Z

Besides the nitpicking, LGTM!

lburzawa

Good work!


azaidy

LGTM!


ahmed-bsod requested review from a team and Copilot January 11, 2026 23:48

Copilot started reviewing on behalf of ahmed-bsod January 11, 2026 23:49

Copilot AI reviewed Jan 11, 2026


aiter/ops/triton/gluon/gemm_afp4wfp4.py (outdated, resolved)

aiter/ops/triton/gluon/gemm_afp4wfp4.py (resolved)

aiter/ops/triton/gluon/gemm_afp4wfp4.py (resolved)

aiter/ops/triton/gluon/gemm_afp4wfp4.py (outdated, resolved)

ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch from 3770da4 to 51aafa6 January 11, 2026 23:57

ahmed-bsod requested review from cagrikymk and lburzawa January 12, 2026 16:04

cagrikymk previously approved these changes Jan 13, 2026


aiter/ops/triton/gluon/gemm_afp4wfp4.py (outdated, resolved)

lburzawa previously approved these changes Jan 17, 2026


ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch from 51aafa6 to 6538132 January 20, 2026 14:14

ahmed-bsod dismissed stale reviews from cagrikymk and lburzawa via 7a6d131 January 20, 2026 14:29

ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch 2 times, most recently from ac77116 to 798cbc9 January 20, 2026 16:28

ahmed-bsod and others added 5 commits January 21, 2026 09:39

22012e5 afp4wfp4 gemm without split_k. need to add split_k handling next
3ef8ccd added reduce kernel. failing for some shapes need to debug that
0798fcb fixes
7769e4d added gluon kernel to testing and bench scripts
389d342 minor fix: changing the type hint for config from string to dict

also changed waves_per_eu to 0 Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch from 798cbc9 to 389d342 January 21, 2026 14:39

azaidy requested review from cagrikymk and lburzawa January 21, 2026 15:19

azaidy approved these changes Jan 21, 2026


lburzawa approved these changes Jan 21, 2026


cagrikymk approved these changes Jan 21, 2026


ahmed-bsod merged commit ef827c7 into main Jan 21, 2026
24 of 25 checks passed

ahmed-bsod deleted the ahmed-bsod/gluon_gemm_afp4wfp4 branch January 21, 2026 22:52

gyohuangxin pushed a commit that referenced this pull request Jan 22, 2026

[TRITON] gluon gemm_afp4wfp4 (#1809)

77e3a20

* added gluon kernel for gemm_afp4wfp4 * updated the gemm_afp4wfp4 test and bench scripts to support running the kernel

yzhou103 pushed a commit that referenced this pull request Jan 28, 2026

[TRITON] gluon gemm_afp4wfp4 (#1809)

4764c26

* added gluon kernel for gemm_afp4wfp4 * updated the gemm_afp4wfp4 test and bench scripts to support running the kernel

ahmed-bsod merged 5 commits into main from ahmed-bsod/gluon_gemm_afp4wfp4
