[CK TILE] GEMM and Batched GEMM SplitK support #1724

bartekxk · 2024-12-06T01:12:49Z

No description provided.

aosewski

Good work! However, I have a few things for you to reconsider.

include/ck_tile/ops/gemm/pipeline/gemm_pipeline_ag_bg_cr_comp_v3.hpp

include/ck_tile/ops/gemm/kernel/gemm_kernel.hpp

aosewski · 2024-12-06T10:18:18Z

include/ck_tile/ops/epilogue/cshuffle_epilogue.hpp

+              memory_operation_enum out_memory_data_op = memory_operation_enum::set>
    CK_TILE_DEVICE auto operator()(ODramWindowTmp& o_dram_window_tmp, OAccTile& o_acc_tile)


Could you add a third function parameter, memory_operation_enum o_mem_data_op = out_memory_data_op? Then you wouldn't have to pass all the template parameters; you would pass the memory operation only when you need one other than the default.

I don't think it can be passed that way, because it is an enum value: a function argument is not a constant expression, so it cannot be compared inside if constexpr.

@bartekxk You're right. What about having a store_tile API that takes the memory operation enum as its last parameter? For set it would do a store, while for atomic_add and the others it would do an update.

We could do that. Right now there are separate store_tile and update_tile functions, so this is just a concept.
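The constraint discussed in this thread can be illustrated with a small host-side sketch. It is not the CK Tile API: memory_operation and write_tile are hypothetical stand-ins for memory_operation_enum and for a combined store_tile/update_tile entry point, and the accumulate branch is a plain += rather than a real atomic. It only shows why the operation has to be a template parameter for if constexpr to select the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ck_tile::memory_operation_enum; only the two
// values mentioned in the review are modelled.
enum class memory_operation { set, atomic_add };

// Op is a non-type template parameter, so it is a constant expression and
// may appear in an if constexpr condition. A plain function argument is a
// runtime value and could not be used there.
template <memory_operation Op = memory_operation::set>
void write_tile(std::vector<float>& dst, const std::vector<float>& tile)
{
    assert(dst.size() == tile.size());
    for(std::size_t i = 0; i < tile.size(); ++i)
    {
        if constexpr(Op == memory_operation::set)
            dst[i] = tile[i]; // plays the role of store_tile: overwrite
        else
            dst[i] += tile[i]; // plays the role of update_tile: accumulate
    }
}
```

With this shape, write_tile(c, tile) uses the default set operation, and write_tile<memory_operation::atomic_add>(c, tile) selects accumulation. Because Op is the only explicit template parameter in the sketch, the caller names nothing else.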

include/ck_tile/ops/gemm/kernel/gemm_kernel.hpp

example/ck_tile/03_gemm/gemm_basic.hpp

include/ck_tile/ops/gemm/kernel/gemm_kernel.hpp

aosewski · 2024-12-11T09:20:36Z

include/ck_tile/ops/epilogue/cshuffle_epilogue.hpp

@@ -158,12 +167,26 @@ struct CShuffleEpilogue
        // Store the tile data to the permuted location
        if constexpr(kPadM || kPadN)


@bartekxk By the way, do we really need this check here? The *_raw version of the tile API just does things using assembly, and I'm not sure we really need it here. The plain tile API should work as well, regardless of padding.

Are you sure we need it here? It looks like it could improve performance, as for example in #1752.

carlushuang · 2024-12-23T11:04:33Z

example/ck_tile/03_gemm/gemm_basic.hpp

@@ -54,8 +54,7 @@ using CDataType   = Types::CDataType;
 auto create_args(int argc, char* argv[])
 {
    ck_tile::ArgParser arg_parser;
-    arg_parser.insert("b", "1", "batch size")
-        .insert("m", "3840", "m dimension")
+    arg_parser.insert("m", "3840", "m dimension")


Do we not support batch (b) in this example?

This is simple gemm, not batched.

carlushuang · 2024-12-23T11:06:50Z

example/ck_tile/03_gemm/universal_gemm.cpp

@@ -78,7 +78,9 @@ float gemm_calc(const gemm_basic_args& args, const ck_tile::stream_config& s)
 #endif
        ck_tile::GemmPipelineProblem<ADataType, BDataType, AccDataType, GemmShape, Traits>>;

-    const ck_tile::index_t num_loop    = TilePartitioner::GetLoopNum(args.K);
+    const ck_tile::index_t k_grain     = args.k_batch * K_Tile;


Is it true that if we set split_k=1 from the command-line arguments, the kernel will process only K_Tile per unrolled iteration? And what if we want to disable split-K from the command line: is that done with split_k=0, or is that case not considered?

I think that in the case of split_k=1 this is analogous to just rounding up the K dimension.
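The arithmetic behind this answer can be sketched as follows. The diff above shows only the k_grain line, so the rest is an assumption about how k_grain might be used: K is rounded up to a multiple of k_grain and divided evenly among the k_batch splits. The names index_t, k_per_split and loops_per_split are hypothetical helpers for illustration, not CK Tile functions.

```cpp
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // stand-in for ck_tile::index_t

// Hypothetical helper: round K up to a multiple of
// k_grain = k_batch * k_tile and return the number of K elements
// covered by each of the k_batch splits.
constexpr index_t k_per_split(index_t K, index_t k_batch, index_t k_tile)
{
    assert(k_batch > 0 && k_tile > 0); // k_batch = 0 would divide by zero
    const index_t k_grain = k_batch * k_tile;
    return (K + k_grain - 1) / k_grain * k_tile;
}

// Number of k_tile-sized main-loop iterations executed by each split.
constexpr index_t loops_per_split(index_t K, index_t k_batch, index_t k_tile)
{
    return k_per_split(K, k_batch, k_tile) / k_tile;
}

// k_batch = 1: k_grain equals k_tile, so K is only rounded up to a
// multiple of the tile size.
static_assert(k_per_split(4096, 1, 32) == 4096, "already a tile multiple");
static_assert(k_per_split(4100, 1, 32) == 4128, "rounded up to 129 tiles");
// k_batch = 4: each split covers a quarter of the rounded-up K.
static_assert(k_per_split(4096, 4, 32) == 1024, "4096 / 4");
static_assert(loops_per_split(4096, 4, 32) == 32, "1024 / 32");
```

Under these assumptions, a split count of 1 is the value that means "no splitting"; in this sketch a value of 0 is rejected because it would make k_grain zero.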

carlushuang

LGTM

bartekxk added 3 commits December 5, 2024 18:04

[CK TILE] Add split K support in GEMM

13707b4

Merge branch 'develop' of github.com:ROCm/composable_kernel into bark…

7f523ec

…ocot/tile-splitk

Updates

6f677a8

bartekxk self-assigned this Dec 6, 2024

bartekxk requested review from junliume, illsilin, carlushuang, qianfengz, aosewski, poyenc, geyyer and andriy-ca as code owners December 6, 2024 01:12

aosewski reviewed Dec 6, 2024

Fixes

9ad07c7

aosewski reviewed Dec 11, 2024

bartekxk added 6 commits December 22, 2024 10:58

Merge branch 'develop' of github.com:ROCm/composable_kernel into bark…

b64e852

…ocot/tile-splitk

rebase

86d7bc1

fix

4a7f78d

Fix

d1dc19d

fixes

bce0f24

support for batched gemm

d1d7909

bartekxk changed the title ~~[CK TILE] GEMM SplitK support~~ [CK TILE] GEMM and Batched GEMM SplitK support Dec 22, 2024

carlushuang reviewed Dec 23, 2024

bartekxk requested a review from carlushuang December 23, 2024 11:20

carlushuang approved these changes Dec 23, 2024

bartekxk commented Dec 6, 2024

aosewski left a comment

aosewski Dec 6, 2024

bartekxk Dec 7, 2024

aosewski Dec 11, 2024 (edited)

bartekxk Dec 22, 2024

aosewski Dec 11, 2024 (edited)

bartekxk Dec 22, 2024

carlushuang Dec 23, 2024

bartekxk Dec 23, 2024

carlushuang Dec 23, 2024

bartekxk Dec 23, 2024

carlushuang left a comment
