Support profiler pa and reduce kernel and remove p99 to use test_common by ZhangLirong-amd · Pull Request #1787 · ROCm/aiter

ZhangLirong-amd · 2026-01-08T01:21:01Z

Motivation

To better break down the PA kernel time and the reduce kernel time, this PR uses the profiler to report a per-kernel breakdown. Run it with:
python op_tests/test_pa_ps.py --profile

PERFORMANCE COMPARISON
========================================================================================================================
Sequence Lengths: 90002
------------------------------------------------------------------------------------------------------------------------
Metadata (get_pa_metadata_v1)            167.64us
------------------------------------------------------------------------------------------------------------------------
Method                                 Time (us)            PA Kernel               Reduce
------------------------------------------------------------------------------------------------------------------------
PA with Persistent Scheduling           1091.04us   1074.32us ( 98.5%)     15.87us (  1.5%)
PA without Persistent Scheduling        1303.82us   1303.78us (100.0%)      0.00us (  0.0%)
------------------------------------------------------------------------------------------------------------------------
Speedup                                       1.20x
========================================================================================================================
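For readers who want to check the summary numbers, the arithmetic behind the Speedup row and the percentage columns can be sketched as below. This is a minimal sketch, not the test's actual code: the helper names `speedup` and `kernel_shares` are hypothetical, and it assumes each percentage is the kernel's share of the summed PA and reduce kernel time.

```python
def speedup(baseline_us: float, optimized_us: float) -> float:
    """Ratio of baseline time to optimized time (hypothetical helper)."""
    return baseline_us / optimized_us


def kernel_shares(pa_us: float, reduce_us: float) -> tuple[float, float]:
    """Percentage share of PA and reduce kernels.

    Assumption: shares are taken relative to pa_us + reduce_us.
    """
    total = pa_us + reduce_us
    if total == 0:
        return 0.0, 0.0
    return 100.0 * pa_us / total, 100.0 * reduce_us / total


# Values copied from the table above.
ratio = speedup(baseline_us=1303.82, optimized_us=1091.04)
pa_pct, reduce_pct = kernel_shares(pa_us=1074.32, reduce_us=15.87)
print(f"Speedup {ratio:.2f}x, PA {pa_pct:.1f}%, Reduce {reduce_pct:.1f}%")
```

With the table's values this reproduces the reported 1.20x speedup and the 98.5% / 1.5% split.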

Copilot

Pull request overview

This PR refactors the performance profiling functionality in the PA (Paged Attention) test suite. The changes replace a percentile-based benchmarking approach with PyTorch profiler-based kernel time breakdown, allowing for more detailed analysis of PA kernel and reduce kernel execution times.

Key changes:

Replaces benchmark_with_percentile with profile_kernel_breakdown to provide kernel-level timing breakdown
Renames --use_p99 flag to --profile for clarity
Updates performance comparison output to show kernel-specific timing ratios
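To illustrate what a kernel-level breakdown involves, here is a minimal sketch of the aggregation step only. It is not the PR's `profile_kernel_breakdown`: the function name `breakdown_kernel_times`, the kernel names, and the rule that a kernel counts as "reduce" when its name contains that word are all assumptions made for this example. It takes plain (name, duration) records so that it runs without a GPU; in a real run such records would come from the PyTorch profiler.

```python
from typing import Iterable


def breakdown_kernel_times(
    events: Iterable[tuple[str, float]],
    iterations: int,
    reduce_marker: str = "reduce",
) -> dict[str, float]:
    """Average per-iteration time (us) spent in PA vs. reduce kernels.

    events: (kernel_name, duration_us) pairs collected over all iterations.
    Assumption: a kernel belongs to the reduce bucket if its name
    contains `reduce_marker`; everything else is counted as PA.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    totals = {"pa": 0.0, "reduce": 0.0}
    for name, duration_us in events:
        bucket = "reduce" if reduce_marker in name.lower() else "pa"
        totals[bucket] += duration_us
    # Divide accumulated time by the number of profiled iterations.
    return {bucket: t / iterations for bucket, t in totals.items()}


# Hypothetical kernel names and durations for two profiled iterations.
sample_events = [
    ("pa_ps_kernel", 1074.0),
    ("pa_reduce_kernel", 16.0),
    ("pa_ps_kernel", 1076.0),
    ("pa_reduce_kernel", 15.0),
]
breakdown = breakdown_kernel_times(sample_events, iterations=2)
print(breakdown)
```

Matching on kernel names is fragile if kernels are renamed, so the marker is left as a parameter.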

op_tests/test_pa_ps.py

Signed-off-by: Double Young <yang.yang2@amd.com>

dbyoung18

LGTM

dbyoung18

LGTM

…on (#1787)
* Support profiler pa and reduce kernel and remove p99 method to use test_common
* format code
* add more info for pa
* reformat
* add bs info
* use torch schedule to wram up
* remove some feature in csv
* refactor
* test(paps): enhance paps ut
  Signed-off-by: Double Young <yang.yang2@amd.com>
* fix(paps): fix format
* fix(paps): fix format
* fix(paps): fix format
---------
Signed-off-by: Double Young <yang.yang2@amd.com>
Co-authored-by: Double Young <yang.yang2@amd.com>

ZhangLirong-amd requested review from a team and Copilot January 8, 2026 01:21

Copilot AI reviewed Jan 8, 2026

ZhangLirong-amd requested a review from dbyoung18 January 8, 2026 01:22

ZhangLirong-amd force-pushed the zlr/pa_pa_fix branch from 9a58227 to 959de59 Compare January 8, 2026 05:00

dbyoung18 reviewed Jan 8, 2026

valarLip previously approved these changes Jan 10, 2026

dbyoung18 dismissed valarLip’s stale review via 23e9507 January 12, 2026 03:17

ZhangLirong-amd and others added 9 commits January 12, 2026 11:18

Support profiler pa and reduce kernel and remove p99 method to use test_common

f2935b9

format code

453c7a3

add more info for pa

d2d21a0

reformat

270fb77

add bs info

b052841

use torch schedule to wram up

2253e73

remove some feature in csv

5b84414

refactor

ddc4200

test(paps): enhance paps ut

495624d

Signed-off-by: Double Young <yang.yang2@amd.com>

dbyoung18 force-pushed the zlr/pa_pa_fix branch from 23e9507 to 495624d Compare January 12, 2026 03:18

dbyoung18 previously approved these changes Jan 12, 2026

fix(paps): fix format

11e0bc7

dbyoung18 dismissed their stale review via 11e0bc7 January 12, 2026 03:20

dbyoung18 added 2 commits January 12, 2026 11:21

fix(paps): fix format

ab3b8c3

fix(paps): fix format

3423970

dbyoung18 approved these changes Jan 12, 2026

valarLip approved these changes Jan 12, 2026

ZhangLirong-amd merged commit 19724aa into main Jan 12, 2026
17 checks passed

ZhangLirong-amd deleted the zlr/pa_pa_fix branch January 12, 2026 14:52

ZhangLirong-amd merged 12 commits into main from zlr/pa_pa_fix

4 participants
