[Triton] GEMM tunning script #1833

Open
k50112113 wants to merge 27 commits into ROCm:main from k50112113:shaoclee/triton_gemm_tunning

@k50112113
Contributor

This PR adds GEMM tuning scripts that:

  1. sweep the parameter space and select the best-performing config
  2. generate JSON config files
  3. verify the performance results
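
As a rough illustration of steps 1 and 2 (not the scripts in this PR), a parameter sweep can be sketched as below. The parameter names (BLOCK_SIZE_M, BLOCK_SIZE_N, num_warps), the flat JSON layout, and the fake_benchmark cost model are all assumptions made for the example; they are not taken from the PR or from the aiter config schema. A real script would time the Triton kernel where fake_benchmark is called.

```python
import itertools
import json


def sweep_configs(search_space, benchmark):
    """Return (best_config, best_time) over the Cartesian product of search_space.

    search_space: dict mapping a parameter name to a list of candidate values.
    benchmark: callable taking a config dict and returning a runtime (lower is better).
    """
    names = list(search_space)
    best_config, best_time = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        elapsed = benchmark(config)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def fake_benchmark(config):
    # Hypothetical stand-in cost model so the sketch runs without a GPU.
    # A real tuning script would launch and time the kernel here.
    return (
        abs(config["BLOCK_SIZE_M"] - 64)
        + abs(config["BLOCK_SIZE_N"] - 128)
        + config["num_warps"]
    )


# Hypothetical search space; the real parameter names and ranges may differ.
search_space = {
    "BLOCK_SIZE_M": [32, 64, 128],
    "BLOCK_SIZE_N": [64, 128, 256],
    "num_warps": [4, 8],
}

best_config, best_time = sweep_configs(search_space, fake_benchmark)

# Serialize the winning config; the actual JSON layout used by aiter is not shown here.
config_json = json.dumps(best_config, indent=2)
```

Step 3 would then re-run the benchmark with the saved config and compare it against the default config for the same shape.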

@k50112113 k50112113 requested review from a team and azaidy January 13, 2026 22:11
@cagrikymk cagrikymk self-requested a review January 14, 2026 19:39
@fxmarty-amd
Contributor

fxmarty-amd commented Jan 26, 2026

This is very useful and works! I am able to get significant speedups in certain configurations that are not tuned by default.

Do you have a recommendation to verify correctness compared to using the default triton config? e.g. as compared to the default https://github.com/ROCm/aiter/blob/main/aiter/ops/triton/configs/gemm/gfx950-GEMM-AFP4WFP4_PRESHUFFLED.json for mxfp4 preshuffled?

edit: using https://github.com/ROCm/aiter/blob/main/op_tests/triton_tests/gemm/basic/test_gemm_afp4wfp4.py works

@k50112113
Contributor Author

> Do you have a recommendation to verify correctness compared to using the default triton config? e.g. as compared to the default https://github.com/ROCm/aiter/blob/main/aiter/ops/triton/configs/gemm/gfx950-GEMM-AFP4WFP4_PRESHUFFLED.json for mxfp4 preshuffled?

For new shapes that you tune, the JSON file name would be gfx950-GEMM-AFP4WFP4_PRESHUFFLED-N={N}-K={2*K}.json in the case of mxfp4 GEMM, so we don't have to replace the default JSON file (the one without the N and K suffix). Once you have generated those JSON files, you can add them to aiter/ops/triton/configs/gemm/ and open a PR for us to review.
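
To make the naming pattern concrete, here is a minimal sketch that builds both file names. The helper names and the example shape are hypothetical and only illustrate the pattern described above; they are not functions from aiter.

```python
def default_mxfp4_config_name(arch="gfx950"):
    """Name of the default (untuned-shape) config file, without the N and K suffix."""
    return f"{arch}-GEMM-AFP4WFP4_PRESHUFFLED.json"


def tuned_mxfp4_config_name(n, k, arch="gfx950"):
    """Name of a shape-specific config file for mxfp4 GEMM.

    Follows the pattern N={N}-K={2*K}: the K field in the file name is twice
    the k value passed in.
    """
    return f"{arch}-GEMM-AFP4WFP4_PRESHUFFLED-N={n}-K={2 * k}.json"


# Hypothetical shape, chosen only for illustration.
example_name = tuned_mxfp4_config_name(n=8192, k=4096)
# example_name == "gfx950-GEMM-AFP4WFP4_PRESHUFFLED-N=8192-K=8192.json"
```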

> edit: using https://github.com/ROCm/aiter/blob/main/op_tests/triton_tests/gemm/basic/test_gemm_afp4wfp4.py works

Yes, we recommend running the test_gemm_afp4wfp4.py script once you have added or modified your JSON file in aiter/ops/triton/configs/gemm/. If test_gemm_afp4wfp4 does not contain the shape you tuned, please add your shape in test_gemm_afp4wfp4::get_x_vals() and run the test script locally before pushing the PR. If the test fails, please push your changes to a branch and contact us.

4 participants