@yzh119 (Collaborator) commented Feb 27, 2025

  1. defer barrier sync for p_smem
  2. change unroll number from 1 to 2

We found that synchronizing the two consumers in the qk stage still adds significant overhead. Using only one warpgroup for qk can resolve the issue.

@yzh119 merged commit 0ed1ce8 into main on Feb 27, 2025
@zhyncs deleted the slight-fix branch on February 27, 2025 16:38