Added instr.sched options to `tune_gemm.py` #649

ravil-mobile · 2024-10-15T10:57:59Z

No description provided.

xiaohuguo2023 · 2024-10-15T21:52:17Z

Can you add some description in the README file ?

Hi @xiaohuguo2023, I added the following line under GEMM Tuning Script v3.4:

> Extended the parameter tuning space with instruction scheduling variants for the main GEMM loop (k-loop).

zhanglx13 · 2024-10-16T04:00:07Z

Do we know which gemm sizes can potentially benefit from setting `instruction_sched_variant` to `iglp0` or `ckv3`?

ravil-mobile · 2024-10-24T12:56:37Z

> Do we know which gemm sizes can potentially benefit from setting `instruction_sched_variant` to `iglp0` or `ckv3`?

No, at this moment it is difficult to say.

sjw36

Looks good but need to wait for features in TIP. Blocking for now.

ravil-mobile · 2024-10-29T15:48:05Z

> Looks good but need to wait for features in TIP. Blocking for now.

Sorry, I didn't get what you mean.

zhanglx13 · 2024-10-30T04:05:38Z

> Do we know which gemm sizes can potentially benefit from setting `instruction_sched_variant` to `iglp0` or `ckv3`?
>
> No, at this moment it is difficult to say

If this is the case, I'd say we don't want to add more tuning parameters.
If we know it works for "some" kernels for sure, we should use them by some simple heuristics.
If we don't know when they help, we should figure it out first.

zhanglx13

See my comments

ravil-mobile · 2024-10-30T13:29:37Z

> Do we know which gemm sizes can potentially benefit from setting `instruction_sched_variant` to `iglp0` or `ckv3`?
>
> No, at this moment it is difficult to say
>
> If this is the case, I'd say we don't want to add more tuning parameters. If we know it works for "some" kernels for sure, we should use them by some simple heuristics. If we don't know when they help, we should figure it out first.

OK. Let me set `sched_variants = ["\"default\""]`. This means that only the default variant (no instruction scheduling) will be used. Tell me whether you are OK with it. At the very least, we should push the adapted infrastructure to `main_perf`.

zhanglx13 · 2024-10-30T13:50:29Z

`python/perf-kernels/tools/tune_gemm/tune_gemm.py`

```diff
         bestConfig_compact_str = gen_configStr(bestConfig)
         if not run_bench:
-            print(f'best_config: {bestConfig_compact_str}', end=" ", flush=True)
+            print(f'\nbest_config: {bestConfig_compact_str}', end=" ", flush=True)
```

Do you have an example output after adding `'\n'`?

Yes, sure.

```
./tune_gemm.py --gemm_size_file ~/tuning/input.yaml --gpu_ids 3,4,5 --jobs 32 --o ~/tuning/output.yaml
Tuning 1 gemm sizes starts at: 2024-10-29 14:26:32.618604
SIZE: 4864 8192 4160 TN nConfigs: 720 TFLOPS: 516.47; time(us): 641.89
best_config: BM128_BN128_BK64_GM8_SK1_nW4_nS2_EU0_kP2_mfma16_schedDEFAULT >>> Elapsed time: 0:04:11.238153 = 0:00:20.441773 (compile) + 0:03:49.947198 (profile) + 0:00:00.681876 (post processing)
Tuning ends at: 2024-10-29 14:30:44.031012
Total tuning time (h:m:s): 0:04:11.412408
```

zhanglx13 · 2024-10-30T13:51:57Z

`python/perf-kernels/tools/tune_gemm/tune_gemm.py`

```diff
-                                                num_warps, 'num_stages': num_stages, 'waves_per_eu': waves_per_eu,
-                                                'matrix_instr_nonkdim': matrix_instr_nonkdim, 'kpack': kpack
-                                            })
+                                        for sched_variant in sched_variants:
```

Since you are here, can you replace the nested for loops with `itertools.product`?

Yes, sure. Done!
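A minimal sketch of the kind of refactor requested above, assuming hypothetical parameter ranges and a hypothetical function name `get_tuning_space_sketch` (the real lists and config keys in `tune_gemm.py` may differ): `itertools.product` yields the same combinations, in the same order, as the equivalent nested loops, so adding a new dimension such as the scheduling variant only means appending one more iterable.

```python
import itertools

# Hypothetical tuning ranges for illustration; not the values used in tune_gemm.py.
block_mn_range = [64, 128]
block_k_range = [32, 64]
num_warps_range = [4, 8]
num_stage_range = [2]
waves_per_eu_range = [0]
matrix_instr_nonkdim_range = [16]
kpack_range = [2]
sched_variants = ['"default"']  # quoted value, as in the discussion above


def get_tuning_space_sketch():
    configs = []
    # product() advances the rightmost iterable fastest, matching the
    # iteration order of the nested for-loops it replaces.
    for (block_m, block_n, block_k, num_warps, num_stages, waves_per_eu,
         matrix_instr_nonkdim, kpack, sched_variant) in itertools.product(
            block_mn_range, block_mn_range, block_k_range, num_warps_range,
            num_stage_range, waves_per_eu_range, matrix_instr_nonkdim_range,
            kpack_range, sched_variants):
        configs.append({
            'BLOCK_SIZE_M': block_m, 'BLOCK_SIZE_N': block_n, 'BLOCK_SIZE_K': block_k,
            'num_warps': num_warps, 'num_stages': num_stages,
            'waves_per_eu': waves_per_eu,
            'matrix_instr_nonkdim': matrix_instr_nonkdim, 'kpack': kpack,
            'instruction_sched_variant': sched_variant,
        })
    return configs


space = get_tuning_space_sketch()
print(len(space))  # 2 * 2 * 2 * 2 = 16
```

Only the multi-valued ranges (block sizes and warps) grow the space here; single-element lists such as `sched_variants` contribute a factor of one.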

zhanglx13 · 2024-10-30T13:52:32Z

> Ok. Let me make `sched_variants = ["\"default\""]`. This means that only no scheduling variant is going to work. Tell me whether you are ok with it. At least we should push the adapted infrastructure to main_perf

Yes, I'm OK with it.

zhanglx13 · 2024-11-20T21:38:01Z

`python/perf-kernels/tools/tune_gemm/utils/file_generator.py`

```diff
@@ -112,6 +113,7 @@ def matmul_{configStr}(M, N, K, am, ak, bk, bn, cm, cn, biasn):
         EVEN_K = {EVEN_K},
         GRID_MN = grid_mn,
         NUM_XCDS = {num_xcds},
+        instruction_sched_variant = {sched_variant},
```

Missing quotes: this should be `'{sched_variant}'`.
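A minimal sketch of why the quotes matter, using a hypothetical stand-in function `fake_kernel` and trimmed-down templates (the real template in `utils/file_generator.py` is much longer): the generator pastes values into Python source as text, so without quotes a variant name such as `iglp0` ends up as a bare identifier in the generated file.

```python
def fake_kernel(**meta):
    """Hypothetical stand-in for the generated kernel launch; returns its kwargs."""
    return meta


# Trimmed-down, hypothetical templates mirroring the two versions of the diff line.
TEMPLATE_UNQUOTED = "fake_kernel(NUM_XCDS = {num_xcds}, instruction_sched_variant = {sched_variant})"
TEMPLATE_QUOTED = "fake_kernel(NUM_XCDS = {num_xcds}, instruction_sched_variant = '{sched_variant}')"

params = {"num_xcds": 8, "sched_variant": "iglp0"}

# With quotes, the generated source contains a string literal.
ok = eval(TEMPLATE_QUOTED.format(**params))
print(ok)  # {'NUM_XCDS': 8, 'instruction_sched_variant': 'iglp0'}

# Without quotes, iglp0 is parsed as a variable name and evaluation fails.
try:
    eval(TEMPLATE_UNQUOTED.format(**params))
except NameError as err:
    print("NameError:", err)
```

The same failure applies to the generated kernel files: an unquoted variant only works if a variable of that name happens to exist, and a hyphenated one like `local-prefetch` would even be parsed as a subtraction.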

zhanglx13 · 2024-11-20T21:38:21Z

`python/perf-kernels/tools/tune_gemm/utils/file_generator.py`

```diff
@@ -145,7 +147,8 @@ def matmul_{configStr}(a, b, c, bias, M, N, K, am, ak, bk, bn, cm, cn, biasn):
         BIAS = {use_bias},
         EVEN_K = {EVEN_K},
         GRID_MN = grid[0],
-        NUM_XCDS = {num_xcds}
+        NUM_XCDS = {num_xcds},
+        instruction_sched_variant = {sched_variant},
```

Missing quotes: this should be `'{sched_variant}'`.

zhanglx13 · 2024-11-20T21:39:15Z

`python/perf-kernels/tools/tune_gemm/utils/file_generator.py`

```diff
         config)

     ## {M}_{N}_{K} is removed since the same kernel can be used for differen gemm sizes
-    configStr = f"BM{block_m}_BN{block_n}_BK{block_k}_GM{group_m}_SK{split_k}_nW{num_warps}_nS{num_stages}_EU{waves_per_eu}_kP{kpack}_mfma{mfmaInstrSize}"
+    configStr = f"BM{block_m}_BN{block_n}_BK{block_k}_GM{group_m}_SK{split_k}_nW{num_warps}_nS{num_stages}_EU{waves_per_eu}_kP{kpack}_mfma{mfmaInstrSize}_sched{sched_variant[1:-1].upper()}"
```

Now we are using `local-prefetch`, but we cannot have `-` in kernel names. Can you also convert `-` into `_`?
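One possible way to do what is asked, sketched with a hypothetical helper `sched_suffix`: strip the surrounding quote characters (as the `[1:-1]` slice in the diff does), upper-case the name, and replace `-` with `_`, so the kernel name stays a valid Python identifier.

```python
def sched_suffix(sched_variant: str) -> str:
    """Kernel-name suffix for a quoted variant such as '"local-prefetch"'.

    Hypothetical helper: assumes, like the diff above, that the value still
    carries its surrounding quote characters.
    """
    return "_sched" + sched_variant[1:-1].upper().replace("-", "_")


config_str = "BM128_BN128_BK64_GM8_SK1_nW4_nS2_EU0_kP2_mfma16" + sched_suffix('"local-prefetch"')
print(config_str)  # BM128_BN128_BK64_GM8_SK1_nW4_nS2_EU0_kP2_mfma16_schedLOCAL_PREFETCH
print(f"matmul_{config_str}".isidentifier())  # True
```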

zhanglx13

See my comments

sjw36

Just one question. Otherwise LGTM.

sjw36 · 2024-11-26T16:00:32Z

`python/perf-kernels/tools/tune_gemm/test_regression.py`

```diff
@@ -101,6 +101,7 @@ def teardown_class(self):
         },
     ], ids=lambda val: f"Config: {val}")
     def test_matmul_performance_regression(self, config, record_property):
+        config.setdefault('instruction_sched_variant', 'default')
```

Should this be `none`?

Thanks! I missed it somehow.
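For context, `dict.setdefault` only fills in the key when a regression config does not already define it, so configs that name a variant explicitly keep their own value. A minimal sketch with hypothetical configs, using `'none'` as the default per the review question:

```python
# Hypothetical regression configs; only the second one pins a variant.
configs = [
    {'M': 4864, 'N': 8192, 'K': 4160},
    {'M': 4864, 'N': 8192, 'K': 4160, 'instruction_sched_variant': 'iglp0'},
]

for config in configs:
    # Adds the key only when it is missing; existing values are left untouched.
    config.setdefault('instruction_sched_variant', 'none')

print([c['instruction_sched_variant'] for c in configs])  # ['none', 'iglp0']
```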

ravil-mobile force-pushed the ravil/main_perf branch from a7e82e6 to 932aef2 · October 15, 2024 10:59

ravil-mobile requested review from zhanglx13 and xiaohuguo2023 October 15, 2024 11:00

ravil-mobile force-pushed the ravil/main_perf branch 2 times, most recently from 66eb96c to 380970d · October 24, 2024 12:55

ravil-mobile force-pushed the ravil/main_perf branch from 380970d to 95a3c3f · October 24, 2024 13:01

micmelesse force-pushed the main_perf branch from 16b0bbf to 628e09b · October 28, 2024 15:11

ravil-mobile force-pushed the ravil/main_perf branch 2 times, most recently from 76dbf3e to 42bca31 · October 29, 2024 11:33

sjw36 requested changes Oct 29, 2024

zhanglx13 requested changes Oct 30, 2024

zhanglx13 reviewed Oct 30, 2024

ravil-mobile force-pushed the ravil/main_perf branch from 42bca31 to 5230674 · October 30, 2024 14:31

ravil-mobile requested review from zhanglx13 and sjw36 October 30, 2024 15:12

ravil-mobile force-pushed the ravil/main_perf branch 4 times, most recently from df11ba0 to df063d4 · November 4, 2024 18:20

zhanglx13 reviewed Nov 20, 2024

zhanglx13 requested changes Nov 20, 2024

ravil-mobile and others added 4 commits November 25, 2024 18:10

Added instr.sched options to tune_gemm.py

623a704

Changed default num_stages in tune_gemm.py

a010ec9

Add instruction sched variant to regression pipeline config

7edf5c7

Added tl.assume clauses for all gemm-kernel strides

88d06ec

ravil-mobile force-pushed the ravil/main_perf branch 5 times, most recently from 9f67cb2 to 71f945b · November 26, 2024 10:49

ravil-mobile requested a review from zhanglx13 November 26, 2024 10:50

sjw36 approved these changes Nov 26, 2024

Replaced nested for-loops in tune_gemm.py with itertools.product

c0ff468

ravil-mobile force-pushed the ravil/main_perf branch from 71f945b to c0ff468 · November 26, 2024 16:23

ravil-mobile requested a review from sjw36 November 26, 2024 16:24

zhanglx13 approved these changes Nov 26, 2024
