Conversation
This is very useful and works! I am able to get significant speedups in certain configurations that are not tuned by default. Do you have a recommendation for verifying correctness against the default Triton config, e.g. the default https://github.com/ROCm/aiter/blob/main/aiter/ops/triton/configs/gemm/gfx950-GEMM-AFP4WFP4_PRESHUFFLED.json for mxfp4 preshuffled? Edit: using https://github.com/ROCm/aiter/blob/main/op_tests/triton_tests/gemm/basic/test_gemm_afp4wfp4.py works.
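For anyone who wants a quick sanity check in addition to the test file linked above, here is a minimal sketch of an element-wise tolerance comparison between the output of the default config and the output of a tuned config. It is not part of aiter: the function name `count_mismatches` and the default tolerances are hypothetical, and it uses plain flattened Python lists so that it runs without any GPU library. Low-precision formats such as FP4 will likely need looser tolerances than the ones assumed here.

```python
def count_mismatches(reference, candidate, rtol=1e-2, atol=1e-2):
    """Compare two flattened outputs element by element.

    Hypothetical helper (not an aiter API). An element passes when
    |candidate - reference| <= atol + rtol * |reference|.
    Returns (number of failing elements, largest absolute error).
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs have different numbers of elements")

    failures = 0
    worst_abs_err = 0.0
    for ref, cand in zip(reference, candidate):
        abs_err = abs(cand - ref)
        worst_abs_err = max(worst_abs_err, abs_err)
        if abs_err > atol + rtol * abs(ref):
            failures += 1
    return failures, worst_abs_err


# Illustrative values only: "reference" stands in for the default-config
# output, "candidate" for the tuned-config output.
reference = [1.0, 2.0, 3.0]
candidate = [1.0, 2.01, 3.5]
failures, worst = count_mismatches(reference, candidate)
print(f"{failures} element(s) out of tolerance, worst abs error {worst}")
```

Reporting the worst absolute error alongside the failure count makes it easier to tell a genuinely wrong config from one that is only slightly outside an overly tight tolerance.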
For new shapes that you tune, the JSON file name would be
Yes, we recommend that you use the
This PR adds GEMM tuning scripts that