fix crash issue on triton `paged_pa_mqa` by ganyi1996ppo · Pull Request #1853 · ROCm/aiter

ganyi1996ppo · 2026-01-16T05:51:25Z

Motivation

Since vLLM's nightly Docker image is still on Triton 3.4.0, we may need to keep using this kernel until a newer Triton version is available. This PR fixes the overflow issue in deepgemm_fp8_paged_mqa_logits_stage1. Once Triton is upgraded to 3.5.0, vLLM will fully adopt the Gluon implementation.

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

This PR fixes an accuracy issue in the Triton paged_pa_mqa kernel by addressing an overflow problem in the deepgemm_fp8_paged_mqa_logits_stage1 function. The fix converts stride_out_heads to a 64-bit integer to prevent overflow in pointer arithmetic calculations.

Changes:

- Fixed overflow issue in pointer arithmetic by converting stride_out_heads to tl.int64
- Updated copyright year to 2026
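
To illustrate the kind of overflow described above, here is a minimal pure-Python sketch of how an index-times-stride offset can wrap around when it is held in a signed 32-bit integer, and why widening the stride to 64 bits avoids it. The helper `wrap_int32` and the sizes used are hypothetical and chosen only so that the product exceeds 2**31 - 1; they are not taken from the kernel itself.

```python
def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


# Hypothetical sizes (assumptions, not values from the PR).
stride_out_heads = 1_048_576  # elements between consecutive rows
row_index = 2_048             # row being addressed

# With 32-bit arithmetic the product 2**31 no longer fits and wraps negative,
# so a pointer computed from it would land outside the output buffer.
offset_32 = wrap_int32(row_index * stride_out_heads)

# With the stride widened to 64 bits the product is representable as-is.
offset_64 = row_index * stride_out_heads

print("32-bit offset:", offset_32)
print("64-bit offset:", offset_64)
```

A negative or otherwise wrapped offset of this kind would lead to out-of-bounds memory access, which is consistent with the PR being retitled from an accuracy issue to a crash issue.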

valarLip · 2026-01-16T06:16:29Z

aiter/ops/triton/_triton_kernels/attention/pa_mqa_logits.py

stride_out_heads : tl.int64,

Signed-off-by: ganyi <ygan@amd.com>

* fix accuracy issue on triton paged_pa_mqa

  Signed-off-by: ganyi <ygan@amd.com>

* add int64 annotation for input stride

  Signed-off-by: ganyi <ygan@amd.com>

---------

Signed-off-by: ganyi <ygan@amd.com>

fix accuracy issue on triton paged_pa_mqa

01a0daa

Signed-off-by: ganyi <ygan@amd.com>

ganyi1996ppo requested review from a team and Copilot January 16, 2026 05:51

Copilot AI reviewed Jan 16, 2026

ganyi1996ppo changed the title ~~fix accuracy issue on triton paged_pa_mqa~~ fix crash issue on triton paged_pa_mqa Jan 16, 2026

ganyi1996ppo mentioned this pull request Jan 16, 2026

[ROCm][Deepseekv3.2] Refactor Sparse Indexer as CustomOp vllm-project/vllm#29287

Merged

5 tasks

valarLip reviewed Jan 16, 2026

aiter/ops/triton/_triton_kernels/attention/pa_mqa_logits.py Outdated

stride_out_heads : tl.int64,

add int64 annotation for input stride

b061aef

Signed-off-by: ganyi <ygan@amd.com>

valarLip approved these changes Jan 16, 2026

valarLip merged commit 371a22f into main Jan 19, 2026
16 of 19 checks passed

valarLip deleted the ganyi/fix_triton_version_paged_pa_mqa_logits branch January 19, 2026 06:29

valarLip merged 2 commits into main from ganyi/fix_triton_version_paged_pa_mqa_logits
