@gramalingam
Collaborator

A couple of extensions to the GQA fusion pattern:

  • Support the case where there is no past key/value cache.
  • Support the pattern variation found in Qwen, where Normalization and Transpose occur in the opposite order (the resulting behavior is the same).
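
A minimal sketch of why the two orders can be treated as the same pattern, assuming the normalization acts only on the last (head-size) axis and the Transpose swaps the sequence and head axes without moving the last one. The helper names (`rms_norm`, `norm_last_axis`, `transpose_1_2`) and the shapes are hypothetical and for illustration only; they are not taken from `gqa.py`.

```python
import math
import random


def rms_norm(vec, eps=1e-6):
    """RMS-normalize a single vector (stands in for the per-head normalization)."""
    scale = 1.0 / math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v * scale for v in vec]


def norm_last_axis(x):
    """Apply rms_norm to every innermost vector of a 4-D nested list."""
    return [[[rms_norm(vec) for vec in row] for row in batch] for batch in x]


def transpose_1_2(x):
    """Swap axes 1 and 2, e.g. [B, S, N, H] -> [B, N, S, H]; the last axis is untouched."""
    return [[list(row) for row in zip(*batch)] for batch in x]


# Assumed illustrative shape: batch, sequence length, number of heads, head size.
random.seed(0)
B, S, N, H = 2, 3, 4, 8
x = [
    [[[random.uniform(-1.0, 1.0) for _ in range(H)] for _ in range(N)] for _ in range(S)]
    for _ in range(B)
]

norm_then_transpose = transpose_1_2(norm_last_axis(x))
transpose_then_norm = norm_last_axis(transpose_1_2(x))

# Each head vector is normalized independently, so moving whole vectors around
# before or after normalizing yields the same tensor.
orders_agree = norm_then_transpose == transpose_then_norm
```

Under these assumptions both orders produce identical values, which is why a fusion rule can accept either ordering and emit the same fused node.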

TODO: add test cases that cover and validate both extensions.

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
@codecov
codecov bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 68.42105% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.41%. Comparing base (45b5189) to head (a45c7e5).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines | Patch % | Lines
onnxscript/rewriter/ort_fusions/gqa.py | 68.42% | 2 Missing and 4 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2662   +/-   ##
=======================================
  Coverage   70.41%   70.41%           
=======================================
  Files         224      224           
  Lines       26600    26617   +17     
  Branches     2645     2647    +2     
=======================================
+ Hits        18730    18742   +12     
- Misses       6945     6948    +3     
- Partials      925      927    +2     

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
@justinchuby justinchuby enabled auto-merge (squash) October 29, 2025 17:49
@justinchuby justinchuby merged commit 647754f into main Oct 29, 2025
32 checks passed
@justinchuby justinchuby deleted the rama/qwen-gqa2 branch October 29, 2025 18:05