Fused rope_kv and bmm #1847

Merged
omuhamma merged 22 commits into main from omuhamma/fused_rope_kv_bmm
Jan 16, 2026
Conversation


@omuhamma omuhamma commented Jan 14, 2026

Test Result

This branch is called from ATOM in this PR: ROCm/ATOM#138

  • Traces for both fp4 and fp8 were inspected to verify that the correct kernels are invoked
  • Accuracy measured about 93-94% on both fp4 and fp8, which matches expectations
  • Performance improved by about 0-2% depending on concurrency
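For context, the fusion replaces an unfused sequence of rotary position embedding on the new K, a KV-cache append, and a batched matmul for attention scores. Below is a hedged NumPy sketch of that unfused reference sequence; the function names, shapes, and `rope` formulation here are illustrative assumptions for exposition, not AITER's actual API (the real fused kernel lives in `fused_bmm_rope_kv_cache.py`).

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to x of shape (batch, heads, dim).

    dim is assumed even; each pair (x[2i], x[2i+1]) is rotated by a
    position-dependent angle pos * base**(-2i/dim).
    """
    b, h, d = x.shape
    inv_freq = 1.0 / (base ** (np.arange(0, d, 2) / d))   # (d/2,)
    ang = pos[:, None] * inv_freq[None, :]                # (batch, d/2)
    cos = np.cos(ang)[:, None, :]                         # broadcast over heads
    sin = np.sin(ang)[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def unfused_rope_kv_bmm(q, k_new, v_new, k_cache, v_cache, pos):
    """Reference (unfused) pipeline the kernel fuses into one launch.

    q, k_new, v_new: (batch, heads, dim) for the current token
    k_cache, v_cache: (batch, heads, seq, dim)
    pos: (batch,) position of the current token per sequence
    """
    # 1. rotate the new Q and K by their positions
    q_rot = rope(q, pos)
    k_rot = rope(k_new, pos)
    # 2. append rotated K and raw V to the KV cache
    k_cache = np.concatenate([k_cache, k_rot[:, :, None, :]], axis=2)
    v_cache = np.concatenate([v_cache, v_new[:, :, None, :]], axis=2)
    # 3. batched matmul: attention scores Q @ K^T over the whole cache
    scores = np.einsum('bhd,bhsd->bhs', q_rot, k_cache)
    return scores, k_cache, v_cache
```

Fusing these steps avoids materializing the rotated K in global memory before the bmm reads it back, which is where the small concurrency-dependent gain comes from.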

AITER branch: omuhamma/fused_rope_kv_bmm

ATOM branches:
  • baseline: fix_moe_integration_tmp
  • perf: omuhamma/fused_rope_kv_bmm

Server command, FP4: (screenshot)

Server command, FP8: (screenshot)

Client command: (screenshot)

FP4 performance: (screenshot)

FP8 performance: (screenshot)

@omuhamma omuhamma self-assigned this Jan 14, 2026
omuhamma and others added 5 commits January 14, 2026 16:32
@omuhamma omuhamma marked this pull request as ready for review January 16, 2026 03:28
@omuhamma omuhamma requested a review from a team January 16, 2026 03:28
@azaidy azaidy left a comment

LGTM!

@azaidy azaidy requested a review from vgokhale January 16, 2026 23:01
@omuhamma omuhamma merged commit 9eea3df into main Jan 16, 2026
23 of 25 checks passed
@omuhamma omuhamma deleted the omuhamma/fused_rope_kv_bmm branch January 16, 2026 23:59
yzhou103 pushed a commit that referenced this pull request Jan 28, 2026
* Fused rope_kv and bmm

* Apply suggestion from @github-actions[bot]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Apply suggestion from @github-actions[bot]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update fused_bmm_rope_kv_cache.py

* Update fused_bmm_rope_kv_cache.py

* add test

* update

* update

* parse bmm config

* fp8 API and kernel change

* fp8 UT

* Apply suggestion from @github-actions[bot]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Apply suggestion from @github-actions[bot]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Formatting with black

* pytest skip if fp4/8 is not avail on device

* code format with black

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ShaoChunLee <Shao-Chun.Lee@amd.com>