yzh119 (Collaborator) commented Feb 25, 2025

As observed in #892, the second stage of split-k in flashinfer MLA is very slow when the batch size is small. This is because our scheduler uses only one CTA for the second stage of split-k.

This PR fixes the issue.
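
For readers unfamiliar with split-k, here is a minimal Python sketch of the idea, not the actual FlashInfer scheduler or kernel. The first stage computes a partial attention output and log-sum-exp (LSE) per KV chunk; the second stage merges those partials per (request, head) row. The function names `attend`, `merge_states`, `assign_merge_rows`, and `critical_path` are hypothetical, and the round-robin assignment of rows to CTAs is an assumption made only for illustration.

```python
import math


def attend(q, keys, values):
    """Stage 1 (per KV chunk): softmax attention output plus its log-sum-exp."""
    logits = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    dim = len(values[0])
    out = [sum(e * v[d] for e, v in zip(exps, values)) / s for d in range(dim)]
    return out, m + math.log(s)


def merge_states(partials):
    """Stage 2 (per row): combine (output, lse) pairs from all KV chunks."""
    m = max(lse for _, lse in partials)
    weights = [math.exp(lse - m) for _, lse in partials]
    total = sum(weights)
    dim = len(partials[0][0])
    merged = [
        sum(w * o[d] for w, (o, _) in zip(weights, partials)) / total
        for d in range(dim)
    ]
    return merged, m + math.log(total)


def assign_merge_rows(num_rows, num_ctas):
    """Hypothetical round-robin split of merge rows across CTAs."""
    return [list(range(cta, num_rows, num_ctas)) for cta in range(num_ctas)]


def critical_path(num_rows, num_ctas):
    """Rows the busiest CTA must merge one after another."""
    return max(len(rows) for rows in assign_merge_rows(num_rows, num_ctas))
```

With an illustrative 128 rows to merge, `critical_path(128, 1)` is 128 while `critical_path(128, 64)` is 2, which is the kind of serialization that a single-CTA second stage would introduce. The merge itself is exact: merging the per-chunk results reproduces attention over the full KV sequence up to floating-point error.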

yzh119 merged commit 1e330b7 into main Feb 26, 2025
zhyncs deleted the fix-split-k-performance-bug branch February 27, 2025 16:38
MasterJH5574 added a commit to MasterJH5574/flashinfer that referenced this pull request Mar 13, 2025
This PR applies changes in flashinfer-ai#898 and flashinfer-ai#900 to the MLA TVM binding.
yzh119 pushed a commit that referenced this pull request Mar 13, 2025
This PR applies changes in #898 and #900 to the MLA TVM binding.