[TRITON] Support gfx1201 for triton gemm_a8w8_blockscale by big-yellow-duck · Pull Request #1829 · ROCm/aiter

big-yellow-duck · 2026-01-13T16:30:08Z

Motivation

This adds preliminary support for gfx1201 to use the Triton gemm_a8w8_blockscale kernel, which is used by Qwen/Qwen3-0.6B-FP8.

Moving forward, more Triton kernels can be tuned to improve performance on gfx1201.

Technical Details

Added a base tuning script that is adaptable to other operations.
Added a tuning script to tune the Triton kernel parameters for gemm_a8w8_blockscale.
The tuning script benchmarks different kernel parameters, such as num_warps and waves_per_eu, to find the configuration with the lowest execution time for a set of operations.
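
The parameter sweep described above can be sketched roughly as follows. This is a minimal illustration, not the tuning script from this PR: the function names (`candidate_configs`, `pick_fastest`), the candidate values, and the `measure` callback are all hypothetical. A real GPU benchmark would launch the kernel with each config and synchronize the device before reading the timer.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, int]


def candidate_configs(
    num_warps_options: Sequence[int],
    waves_per_eu_options: Sequence[int],
) -> List[Config]:
    """Build the Cartesian product of the two parameter lists (hypothetical helper)."""
    return [
        {"num_warps": warps, "waves_per_eu": waves}
        for warps, waves in itertools.product(num_warps_options, waves_per_eu_options)
    ]


def pick_fastest(
    configs: Sequence[Config],
    measure: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Return the config with the lowest measured time.

    `measure` is an assumed callback that runs the kernel with the given
    config and returns an elapsed time; how it times the kernel is up to
    the caller.
    """
    if not configs:
        raise ValueError("no configs to benchmark")
    best_config, best_time = None, float("inf")
    for config in configs:
        elapsed = measure(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


# Illustrative candidate values only; the values actually swept by the
# PR's tuning script are not stated in this description.
configs = candidate_configs([1, 2, 4, 8], [0, 1, 2, 4])
```

Keeping the timing logic behind a callback means the same search loop could be reused for other operations, which matches the stated goal of a base script that is adaptable beyond gemm_a8w8_blockscale.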

Test Plan

Test the tuned configs using aiter/op_tests/triton_tests/gemm/basic/test_gemm_a8w8_blockscale.py:

pytest op_tests/triton_tests/gemm/basic/test_gemm_a8w8_blockscale.py

Test Result

126 tests passed.
2 tests skipped (N or K did not meet the preshuffle kernel constraints: N must be a multiple of 16 and K must be a multiple of 32).
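
The skip condition can be expressed as a small predicate. This is only a sketch of the constraint as stated in the test result; the function name is hypothetical and is not taken from the test file.

```python
def meets_preshuffle_constraints(n: int, k: int) -> bool:
    """Check the shape constraints stated for the preshuffle kernel.

    N must be a multiple of 16 and K must be a multiple of 32.
    Hypothetical helper; the real test may express this differently.
    """
    return n % 16 == 0 and k % 32 == 0
```

A shape such as N=1024, K=4096 satisfies both conditions, while N=1000 fails the N check (1000 is not divisible by 16).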

Submission Checklist

[ ✅] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


big-yellow-duck · 2026-01-28T06:37:22Z

Using aiter gemm_w8a8 kernels in vLLM shows a performance uplift on gfx1201 when running Qwen3-0.6B-FP8 at higher input and output token counts.


big-yellow-duck and others added 14 commits January 5, 2026 08:07

added tuned gemms for r9700

e0c5114

Merge branch 'ROCm:main' into main

e532f3a

Merge branch 'ROCm:main' into main

1a286e8

Added gemm_a8w8_blockscale support for gfx1201 with tuning script

bdab40d

Co-authored-by: NAME Amir Balwel amoooori04@gmail.com

Merge branch 'main' into support_gfx1201_min

c7664b8

added gfx1201 to types.py

c162331

Co-authoured-by: Amir Balwel amoooori04@gmail.com

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

fd925f4

…aiter into support_gfx1201_min

Merge branch 'ROCm:main' into support_gfx1201_min

2afe833

Merge branch 'ROCm:main' into support_gfx1201_min

a9f329a

Co-authored-by: Amir Balwel <amoooori04@gmail.com>

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

0ef32a3

…aiter into support_gfx1201_min

Add readme file and rename base to utils

897fd62

Co-authored-by: Jeff Aw <jeffaw99@hotmail.com> Signed-off-by: Amir Balwel <amoooori04@gmail.com>

add fp8 dtype

622fd33

Co-authored-by: Jeff Aw <jeffaw99@hotmail.com> Signed-off-by: Amir Balwel <amoooori04@gmail.com>

added gemm_a8w8_blocscale_shuffle

ab93b43

Co-authored-by: Amir Balwel <amoooori04@gmail.com>

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

5ff3029

…aiter into support_gfx1201_min

big-yellow-duck changed the title ~~Support gfx1201 min~~ Support gfx1201 for triton gemm_a8w8_blockscale Jan 16, 2026

big-yellow-duck and others added 10 commits January 16, 2026 10:30

Merge branch 'main' into support_gfx1201_min

7f09f13

update tuned gemm_a8w8_blockscale

aea5797

Merge branch 'support_gfx1201_min' of https://github.com/EmbeddedLLM/…

60ee427

…aiter into support_gfx1201_min Co-authored-by: Amir Balwel amoooori04@gmail.com

Merge branch 'main' into support_gfx1201_min

1d53884

Add readme for tuning Co-authored-by: Jeff Aw <jeffaw99@hotmail.com>

879c2c5

Add readme for tuning Co-authored-by: Jeff Aw <jeffaw99@hotmail.com>

c285687

update tuning readme

05f9ea7

Revert "Add readme for tuning Co-authored-by: Jeff Aw <jeffaw99@hotma…

971dcd8

…il.com>" This reverts commit 879c2c5.

Merge branch 'ROCm:main' into main

b645bcb

rebase and revert the submodule changes

47bba80

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

big-yellow-duck marked this pull request as ready for review January 23, 2026 02:50

big-yellow-duck requested a review from a team January 23, 2026 02:50

azaidy changed the title ~~Support gfx1201 for triton gemm_a8w8_blockscale~~ [TRITON] Support gfx1201 for triton gemm_a8w8_blockscale Jan 23, 2026

azaidy requested review from azaidy and vgokhale January 23, 2026 03:29

tjtanaa and others added 2 commits January 27, 2026 19:10

Merge branch 'main' into support_gfx1201_min

bae3609

Merge branch 'ROCm:main' into main

5f0e3b4

tjtanaa and others added 3 commits January 30, 2026 20:54

Merge branch 'ROCm:main' into main

3f8fe9b

Merge remote-tracking branch 'origin/main' into support_gfx1201_min

1c4510e

Signed-off-by: Amir Balwel <amoooori04@gmail.com>

Merge branch 'main' into support_gfx1201_min

cc1530f

big-yellow-duck wants to merge 29 commits into ROCm:main from EmbeddedLLM:support_gfx1201_min


4 participants
