
Support page kvcache in AMD ROCm #1198

Merged
52 commits merged into Dao-AILab:main on Sep 16, 2024

Conversation

rocking5566 (Contributor) commented Sep 3, 2024

In this PR:

  1. Update the ROCm backend (Composable Kernel, CK) and adjust how CK is called to match its new API.
  2. Improve backward performance via the CK update in (1).
  3. Implement mha_fwd_kvcache() (a usage sketch follows after this list).
  4. Change the compile flags to support ROCm 6.2.
  5. Change bf16 rounding to RTN (round to nearest); a numerical illustration also follows below.

All tests pass on MI200 and MI300 with ROCm 6.2 (some kvcache test cases are skipped, following the original test suite).

[screenshot of the test results]
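
Item 3's mha_fwd_kvcache() is the C++/CK entry point behind the paged KV-cache path. Below is a minimal sketch (not code from this PR) of how that path can be exercised from Python, assuming the upstream flash_attn interface flash_attn_with_kvcache with its block_table argument for paged caches; the tensor shapes and the page size of 256 are illustrative choices only.

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, seqlen_q, nheads, headdim = 2, 1, 8, 128   # single-token decode step
page_block_size, num_blocks = 256, 16             # illustrative paged-cache layout

q = torch.randn(batch, seqlen_q, nheads, headdim, device="cuda", dtype=torch.bfloat16)

# Paged KV cache: a pool of fixed-size blocks, indexed per sequence via block_table.
k_cache = torch.randn(num_blocks, page_block_size, nheads, headdim,
                      device="cuda", dtype=torch.bfloat16)
v_cache = torch.randn_like(k_cache)
# Give each of the two sequences its own 8 blocks (8 * 256 tokens of capacity).
block_table = torch.arange(num_blocks, dtype=torch.int32, device="cuda").reshape(batch, -1)
cache_seqlens = torch.full((batch,), 500, dtype=torch.int32, device="cuda")

# New key/value for the current step; they are appended into the cache in place.
k_new = torch.randn(batch, seqlen_q, nheads, headdim, device="cuda", dtype=torch.bfloat16)
v_new = torch.randn_like(k_new)

out = flash_attn_with_kvcache(
    q, k_cache, v_cache, k=k_new, v=v_new,
    cache_seqlens=cache_seqlens, block_table=block_table, causal=True,
)
print(out.shape)  # (batch, seqlen_q, nheads, headdim)
```

On ROCm, PyTorch exposes HIP devices under the "cuda" device type, so the same call should work unchanged on MI200/MI300 once this PR is in place.

For item 5, the sketch below (plain Python, not the CK kernel code) illustrates what switching the fp32-to-bf16 conversion from truncation to RTN means numerically: RTN can change the last kept mantissa bit, which is where the accuracy difference comes from.

```python
import struct

def f32_to_bf16_rtn(x: float) -> int:
    """fp32 -> bf16 bit pattern using round-to-nearest-even (NaN handling omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1              # lowest bit that survives the narrowing
    return (bits + 0x7FFF + lsb) >> 16  # add rounding bias, then drop 16 mantissa bits

def f32_to_bf16_trunc(x: float) -> int:
    """fp32 -> bf16 bit pattern by simply truncating the low 16 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

x = 0.3  # fp32 pattern 0x3E99999A: the dropped bits lie above the halfway point
print(hex(f32_to_bf16_rtn(x)), hex(f32_to_bf16_trunc(x)))  # 0x3e9a vs 0x3e99
```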

joellliu commented Sep 9, 2024

I have tested this PR on MI300 and saw improved accuracy compared to previous versions of ROCm flash attention. My training of the OLMo-1B model with flash attention is finally able to converge!

rocking5566 changed the title from "Support kvcache in ROCm" to "Support kvcache in AMD ROCm" on Sep 9, 2024
rocking5566 changed the title from "Support kvcache in AMD ROCm" to "Support page + kvcache in AMD ROCm" on Sep 9, 2024
rocking5566 changed the title from "Support page + kvcache in AMD ROCm" to "Support page kvcache in AMD ROCm" on Sep 9, 2024
ehartford commented

This change will greatly help us @tridao
Thank you!

tridao merged commit e2182cc into Dao-AILab:main on Sep 16, 2024