Extend GQA fusion for Qwen #2662

gramalingam · 2025-10-29T06:02:42Z

Two extensions to the GQA fusion pattern:

Support the case where there is no past key/value cache.
Support the Qwen pattern variation, in which Normalization and Transpose occur in the opposite order (the resulting behavior is the same).
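
To illustrate why the two orderings can be treated as one pattern, here is a minimal pure-Python sketch. It is not the code in gqa.py. It assumes the normalization is an RMS-style norm over the last (head-size) axis and that the Transpose swaps axes 1 and 2, that is (batch, seq, heads, head_size) to (batch, heads, seq, head_size). The helper names `rms_norm` and `transpose_1_2` are hypothetical, and the learned scale weight is omitted for brevity.

```python
import math


def rms_norm(x, eps=1e-6):
    """RMS-normalize a 4-D nested list over its last axis (assumed norm type)."""
    return [
        [
            [
                [v / math.sqrt(sum(u * u for u in vec) / len(vec) + eps) for v in vec]
                for vec in row
            ]
            for row in mat
        ]
        for mat in x
    ]


def transpose_1_2(x):
    """Swap axes 1 and 2 of a 4-D nested list: (B, S, H, D) -> (B, H, S, D)."""
    return [
        [[mat[s][h] for s in range(len(mat))] for h in range(len(mat[0]))]
        for mat in x
    ]


# Toy input of shape (B=1, S=2, H=3, D=4); the values are arbitrary.
x = [
    [
        [[float(1 + s * 12 + h * 4 + d) for d in range(4)] for h in range(3)]
        for s in range(2)
    ]
]

# Ordering 1: normalize, then transpose.
norm_then_transpose = transpose_1_2(rms_norm(x))
# Ordering 2: transpose, then normalize.
transpose_then_norm = rms_norm(transpose_1_2(x))

# The norm only touches the last axis, which the transpose leaves in place,
# so both orderings produce the same values.
print(norm_then_transpose == transpose_then_norm)
```

Because the normalization acts independently on each head-size vector and the transpose only rearranges which vector sits where, the two operations commute under these assumptions.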

TODO: add test cases to cover and validate both extensions.

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

onnxscript/rewriter/ort_fusions/gqa.py

codecov · 2025-10-29T06:10:11Z

Codecov Report

❌ Patch coverage is 68.42105% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.41%. Comparing base (45b5189) to head (a45c7e5).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
onnxscript/rewriter/ort_fusions/gqa.py	68.42%	2 Missing and 4 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2662   +/-   ##
=======================================
  Coverage   70.41%   70.41%           
=======================================
  Files         224      224           
  Lines       26600    26617   +17     
  Branches     2645     2647    +2     
=======================================
+ Hits        18730    18742   +12     
- Misses       6945     6948    +3     
- Partials      925      927    +2

gramalingam added 3 commits October 28, 2025 21:46

Extend GQA fusion for Qwen

7ee86ce

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

Add support for GQA without past

a16e700

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

Merge branch 'rama/qwen-gqa' into rama/qwen-gqa2

126c9fd

github-project-automation bot moved this to Todo in ONNX Script Review Board Oct 29, 2025

github-project-automation bot added this to ONNX Script Review Board Oct 29, 2025

github-advanced-security bot found potential problems Oct 29, 2025

onnxscript/rewriter/ort_fusions/gqa.py (Fixed)

onnxscript/rewriter/ort_fusions/gqa.py (Fixed)

justinchuby approved these changes Oct 29, 2025

github-project-automation bot moved this from Todo to Done in ONNX Script Review Board Oct 29, 2025

justinchuby added the module: rewriter label Oct 29, 2025

Run lint

a45c7e5

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>

justinchuby approved these changes Oct 29, 2025

justinchuby enabled auto-merge (squash) October 29, 2025 17:49

justinchuby merged commit 647754f into main Oct 29, 2025
32 checks passed

justinchuby deleted the rama/qwen-gqa2 branch October 29, 2025 18:05
