[TRITON] gluon gemm_afp4wfp4#1809

Merged
ahmed-bsod merged 5 commits into main from ahmed-bsod/gluon_gemm_afp4wfp4 on Jan 21, 2026

@ahmed-bsod ahmed-bsod commented Jan 11, 2026

Motivation

Adds Gluon kernels for GEMM_AFP4WFP4.

Perf

Tuned for M=N=K=8192.
Running the bench script
python bench_gemm_afp4wfp4.py --shape 8192 8192 8192 --gluon
gives roughly 2500 TFLOPS on average.
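
For readers who want to relate the TFLOPS figure to kernel time, here is a minimal sketch of the usual GEMM throughput arithmetic. It assumes the conventional count of 2*M*N*K floating-point operations per GEMM (one multiply and one add per accumulate); the PR does not state how the bench script counts operations, and the helper names below are hypothetical.

```python
def gemm_tflops(m, n, k, seconds):
    """Throughput in TFLOPS, assuming 2*M*N*K operations per GEMM."""
    flops = 2 * m * n * k  # one multiply + one add per accumulate (assumed convention)
    return flops / seconds / 1e12


def seconds_for_tflops(m, n, k, tflops):
    """Inverse of gemm_tflops: kernel time implied by a given throughput."""
    return 2 * m * n * k / (tflops * 1e12)


M = N = K = 8192
t = seconds_for_tflops(M, N, K, 2500.0)
print(f"{t * 1e6:.1f} us per GEMM at 2500 TFLOPS")  # prints "439.8 us per GEMM at 2500 TFLOPS"
```

Under that assumption, the reported throughput corresponds to a bit under half a millisecond per 8192x8192x8192 GEMM.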

Testing

Passes all the shapes in test_gemm_afp4wfp4.py.
Also tested _gemm_afp4wfp4_reduce_kernel by setting k_split to 2, 4, and 8 in the config file.
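
To illustrate what a split-K GEMM followed by a reduce step computes, here is a small pure-Python sketch. It is not the Gluon kernel from this PR: the function names are hypothetical, it runs on the CPU with nested lists, and it assumes the common scheme in which K is cut into k_split contiguous chunks, each chunk produces a partial output, and a separate reduction sums the partials.

```python
def partial_matmul(a, b, k_start, k_end):
    """Multiply a (M x K) by b (K x N) over the K range [k_start, k_end) only."""
    m, n = len(a), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(k_start, k_end)) for j in range(n)]
        for i in range(m)
    ]


def split_k_matmul(a, b, k_split):
    """Compute a @ b as k_split partial products followed by a reduction."""
    k_total = len(b)
    chunk = -(-k_total // k_split)  # ceiling division: K elements per split
    # Main-kernel stage: one partial output per K chunk.
    partials = [
        partial_matmul(a, b, start, min(start + chunk, k_total))
        for start in range(0, k_total, chunk)
    ]
    # Reduce stage: element-wise sum of all partial outputs.
    m, n = len(a), len(b[0])
    return [[sum(p[i][j] for p in partials) for j in range(n)] for i in range(m)]


# Integer inputs keep the comparison exact regardless of summation order.
A = [[(i + k) % 5 for k in range(16)] for i in range(3)]
B = [[(k * j + 1) % 7 for j in range(4)] for k in range(16)]
reference = partial_matmul(A, B, 0, 16)
for k_split in (2, 4, 8):
    assert split_k_matmul(A, B, k_split) == reference
```

With floating-point data the split and unsplit results can differ slightly because the summation order changes, which is why a test of a reduce kernel typically compares within a tolerance.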

Copilot AI left a comment

Pull request overview

This PR adds Gluon kernel implementations for GEMM_AFP4WFP4 (GEMM with FP4 activations and FP4 weights) operations, specifically targeting AMD CDNA4 (gfx950) devices. The implementation includes a new Gluon-based kernel, test infrastructure updates to support both Triton and Gluon implementations, and benchmark support.

Changes:

  • Adds new Gluon kernel implementation for AFP4WFP4 GEMM operations with split-K support
  • Updates tests to parameterize implementation choice between Triton and Gluon variants
  • Adds benchmark support for performance comparison between implementations

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| aiter/ops/triton/gluon/gemm_afp4wfp4.py | New Gluon kernel implementation with main GEMM kernel, reduce kernel, and configuration loading |
| aiter/ops/triton/configs/gemm/gluon/gfx950-GEMM-AFP4WFP4.json | Configuration file with tuning parameters for gfx950 architecture |
| op_tests/triton_tests/gemm/basic/test_gemm_afp4wfp4.py | Updated tests to support both Triton and Gluon implementations via parameterization |
| op_tests/op_benchmarks/triton/bench_gemm_afp4wfp4.py | Added Gluon implementation option to benchmark suite |
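
As background on the "FP4 activations and FP4 weights" mentioned in the overview, the sketch below decodes packed 4-bit values on the CPU. It rests on assumptions the PR text does not confirm: that the FP4 encoding is E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit, as in the OCP Microscaling formats), that two values are packed per byte with the low nibble first, and that per-block scales are applied separately and omitted here. The function names are hypothetical.

```python
# Magnitudes representable by E2M1, indexed by the low 3 bits of the nibble.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(nibble):
    """Decode one 4-bit E2M1 code (0..15) to a Python float."""
    sign = -1.0 if nibble & 0x8 else 1.0  # top bit of the nibble is the sign
    return sign * E2M1_MAGNITUDES[nibble & 0x7]


def unpack_fp4(packed):
    """Unpack bytes holding two FP4 values each (low nibble first, assumed)."""
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0xF))
        values.append(decode_e2m1(byte >> 4))
    return values


print(unpack_fp4(bytes([0x21, 0xF7])))  # prints [0.5, 1.0, 6.0, -6.0]
```

Packing two elements per byte is why a K dimension of 8192 elements would occupy 4096 bytes per row under this layout.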

@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch from 3770da4 to 51aafa6 Compare January 11, 2026 23:57
cagrikymk previously approved these changes Jan 13, 2026

Besides the nitpicking, LGTM!

lburzawa previously approved these changes Jan 17, 2026
@lburzawa lburzawa left a comment

Good work!

@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch from 51aafa6 to 6538132 Compare January 20, 2026 14:14
@ahmed-bsod ahmed-bsod dismissed stale reviews from cagrikymk and lburzawa via 7a6d131 January 20, 2026 14:29
@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch 2 times, most recently from ac77116 to 798cbc9 Compare January 20, 2026 16:28
@ahmed-bsod ahmed-bsod force-pushed the ahmed-bsod/gluon_gemm_afp4wfp4 branch from 798cbc9 to 389d342 Compare January 21, 2026 14:39
@azaidy azaidy requested review from cagrikymk and lburzawa January 21, 2026 15:19
@azaidy azaidy left a comment

LGTM!

@ahmed-bsod ahmed-bsod merged commit ef827c7 into main Jan 21, 2026
24 of 25 checks passed
@ahmed-bsod ahmed-bsod deleted the ahmed-bsod/gluon_gemm_afp4wfp4 branch January 21, 2026 22:52
gyohuangxin pushed a commit that referenced this pull request Jan 22, 2026
* added gluon kernel for gemm_afp4wfp4

* updated the gemm_afp4wfp4 test and bench scripts to support running the kernel
yzhou103 pushed a commit that referenced this pull request Jan 28, 2026
* added gluon kernel for gemm_afp4wfp4

* updated the gemm_afp4wfp4 test and bench scripts to support running the kernel
4 participants