[CK_TILE] Tune fmha fwd splitkv codgen #1717

Draft
wants to merge 32 commits into
base: develop

Contributor

@poyenc poyenc commented Dec 4, 2024

  1. Add instances to enable vector load on hdim_q/hdim_v
  2. Use larger tile size (kM0) for chunked prefill (group + paged-kvcache)
  3. Update the num_splits heuristic (determine the number of workgroups based on the prefill/decode phase)
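
To illustrate item 3, here is a minimal Python sketch of how a phase-aware num_splits heuristic could work. It is not the code in this PR: the function names, the tile sizes (`tile_m_prefill`, `tile_m_decode`, `tile_n`), the 0.8 and 0.85 thresholds, and the `max_splits` cap are all assumptions chosen for illustration. The idea is to estimate how many workgroups the launch would produce without splitting (which depends on whether seqlen_q indicates decode or prefill), and to split along the KV dimension only when that count would leave most compute units idle.

```python
import math


def estimate_workgroups(batch, nhead, seqlen_q, tile_m):
    """One workgroup per (batch, head, M-tile); hypothetical launch model."""
    return batch * nhead * math.ceil(seqlen_q / tile_m)


def choose_num_splits(batch, nhead, seqlen_q, seqlen_k, num_cus,
                      tile_m_prefill=128, tile_m_decode=16,
                      tile_n=64, max_splits=16):
    """Pick a KV split count; all defaults are illustrative assumptions.

    Inputs are assumed to be positive integers.
    """
    # Assumption: decode is identified by a single query token.
    is_decode = seqlen_q == 1
    tile_m = tile_m_decode if is_decode else tile_m_prefill
    base = estimate_workgroups(batch, nhead, seqlen_q, tile_m)

    # Enough workgroups already: splitting would only add reduction overhead.
    if base >= 0.8 * num_cus:
        return 1

    # Cannot split finer than the number of KV tiles.
    num_n_blocks = math.ceil(seqlen_k / tile_n)
    limit = max(1, min(max_splits, num_cus, num_n_blocks))

    def efficiency(splits):
        # Fraction of compute-unit slots that are busy in the last wave.
        waves = base * splits / num_cus
        return waves / math.ceil(waves)

    best = max(efficiency(s) for s in range(1, limit + 1))
    # Prefer the smallest split count that is close to the best efficiency.
    for s in range(1, limit + 1):
        if efficiency(s) >= 0.85 * best:
            return s
    return 1


if __name__ == "__main__":
    # Prefill-like shape: many M-tiles, so no split is needed.
    print(choose_num_splits(8, 32, 4096, 4096, num_cus=304))
    # Decode-like shape: few workgroups, so the KV dimension is split.
    print(choose_num_splits(1, 8, 1, 8192, num_cus=304))
```

In this sketch a larger prefill tile (item 2) lowers the estimated workgroup count, which is why the tile choice and the split count have to be tuned together.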

Contributor Author

poyenc commented Dec 6, 2024

I've encountered some flash-attention test failures and am investigating the root cause.

@poyenc poyenc force-pushed the feature/add-splitkv-instance branch from be51207 to 488bfab Compare December 8, 2024 00:51
Contributor Author

poyenc commented Dec 8, 2024

I cannot figure out why the FA tests fail after I added those new instances. Closing this PR for now.

@poyenc poyenc closed this Dec 8, 2024
@poyenc poyenc reopened this Dec 16, 2024
@poyenc poyenc force-pushed the feature/add-splitkv-instance branch from 8429854 to ed634ea Compare December 17, 2024 08:56
@poyenc poyenc changed the title [CK_TILE] Add fmha fwd splitkv instances to enable vector load on hdim_q/hdim_v [CK_TILE] Tune fmha fwd splitkv codgen Dec 17, 2024
Contributor Author

poyenc commented Dec 17, 2024

The FA test failures have been fixed.

@poyenc poyenc marked this pull request as draft December 18, 2024 08:36