
Support page kvcache in AMD ROCm #1198

Merged
52 commits merged into Dao-AILab:main on Sep 16, 2024

Conversation

rocking5566 (Contributor) commented Sep 3, 2024

In this PR:

  1. Update the ROCm backend (Composable Kernel, CK) and adjust how CK is called to match its new API.
  2. Improve backward performance via the CK update in (1).
  3. Implement mha_fwd_kvcache() (a usage sketch follows after this list).
  4. Change the compile flags to support ROCm 6.2.
  5. Change bf16 rounding to RTN (round to nearest); a numerical illustration also follows below.

All tests pass on MI200 and MI300 with ROCm 6.2 (some kvcache test cases are skipped, following the original test suite).

[screenshot of the test results]
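
Item 3's mha_fwd_kvcache() is the C++/CK entry point behind the paged KV-cache path. Below is a minimal sketch (not code from this PR) of how that path can be exercised from Python, assuming the upstream flash_attn interface flash_attn_with_kvcache with its block_table argument for paged caches; the tensor shapes and the page size of 256 are illustrative choices only.

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, seqlen_q, nheads, headdim = 2, 1, 8, 128   # single-token decode step
page_block_size, num_blocks = 256, 16             # illustrative paged-cache layout

q = torch.randn(batch, seqlen_q, nheads, headdim, device="cuda", dtype=torch.bfloat16)

# Paged KV cache: a pool of fixed-size blocks, indexed per sequence via block_table.
k_cache = torch.randn(num_blocks, page_block_size, nheads, headdim,
                      device="cuda", dtype=torch.bfloat16)
v_cache = torch.randn_like(k_cache)
# Give each of the two sequences its own 8 blocks (8 * 256 tokens of capacity).
block_table = torch.arange(num_blocks, dtype=torch.int32, device="cuda").reshape(batch, -1)
cache_seqlens = torch.full((batch,), 500, dtype=torch.int32, device="cuda")

# New key/value for the current step; they are appended into the cache in place.
k_new = torch.randn(batch, seqlen_q, nheads, headdim, device="cuda", dtype=torch.bfloat16)
v_new = torch.randn_like(k_new)

out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new,
    cache_seqlens=cache_seqlens, block_table=block_table, causal=True,
)
print(out.shape)  # (batch, seqlen_q, nheads, headdim)
```

On ROCm, PyTorch exposes HIP devices under the "cuda" device type, so the same call should work unchanged on MI200/MI300 once this PR is in place.

For item 5, the sketch below (plain Python, not the CK kernel code) illustrates what switching the fp32-to-bf16 conversion from truncation to RTN means numerically: RTN can change the last kept mantissa bit, which is where the accuracy difference comes from.

```python
import struct

def f32_to_bf16_rtn(x: float) -> int:
    """fp32 -> bf16 bit pattern using round-to-nearest-even (NaN handling omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1              # lowest bit that survives the narrowing
    return (bits + 0x7FFF + lsb) >> 16  # add rounding bias, then drop 16 mantissa bits

def f32_to_bf16_trunc(x: float) -> int:
    """fp32 -> bf16 bit pattern by simply truncating the low 16 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

x = 0.3  # fp32 pattern 0x3E99999A: the dropped bits lie above the halfway point
print(hex(f32_to_bf16_rtn(x)), hex(f32_to_bf16_trunc(x)))  # 0x3e9a vs 0x3e99
```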

joellliu commented Sep 9, 2024

I have tested this PR on MI300 and saw improved accuracy compared to previous versions of ROCm flash attention. My training of the OLMo-1B model with flash attention is finally able to converge!

rocking5566 changed the title from "Support kvcache in ROCm" to "Support kvcache in AMD ROCm" on Sep 9, 2024
rocking5566 changed the title from "Support kvcache in AMD ROCm" to "Support page + kvcache in AMD ROCm" on Sep 9, 2024
rocking5566 changed the title from "Support page + kvcache in AMD ROCm" to "Support page kvcache in AMD ROCm" on Sep 9, 2024
ehartford commented

This change will greatly help us @tridao
Thank you!

tridao merged commit e2182cc into Dao-AILab:main on Sep 16, 2024