
Conversation

@am17an (Collaborator) commented Oct 31, 2025

Based on #16769.

On a 4090:

| Model | Test | t/s master | t/s cuda-rope-fusion | Speedup |
| --- | --- | ---: | ---: | ---: |
| llama 8B Q4_K_M | tg32 | 134.90 | 136.07 | 1.01 |
| llama 8B Q4_K_M | tg64 | 131.41 | 132.84 | 1.01 |
| llama 8B Q4_K_M | tg128 | 130.54 | 131.87 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg32 | 167.18 | 168.23 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg64 | 161.00 | 161.90 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg128 | 158.84 | 159.83 | 1.01 |
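For context, the branch name `cuda-add-rope-fusion` suggests this fuses a preceding elementwise add with the rotary position embedding (RoPE) rotation into a single kernel pass, so the `add` result is never materialized as an intermediate tensor. The following NumPy sketch is purely illustrative: the function names, the interleaved pair layout, and the frequency formula are simplifying assumptions, not the actual ggml kernel.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Simplified rotary embedding: rotate each (even, odd) pair of x
    by a position-dependent angle. Illustrative, not the ggml kernel."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(d // 2, dtype=np.float64) * 2.0 / d)
    theta = pos * freqs
    c, s = np.cos(theta), np.sin(theta)
    x0, x1 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x0 * c - x1 * s
    out[..., 1::2] = x0 * s + x1 * c
    return out

def add_rope_fused(a, b, pos, base=10000.0):
    """One pass over the data: the add is folded into the same loop body
    as the rotation, so the (a + b) intermediate never hits memory."""
    d = a.shape[-1]
    out = np.empty_like(a)
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = np.cos(theta), np.sin(theta)
        e = a[2 * i] + b[2 * i]          # add fused into the kernel loop
        o = a[2 * i + 1] + b[2 * i + 1]
        out[2 * i] = e * c - o * s
        out[2 * i + 1] = e * s + o * c
    return out
```

The unfused path computes `rope(a + b, pos)` in two passes over memory; the fused loop produces the same values while reading each input element once, which is where the small token-generation speedup would come from.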

@am17an requested review from CISC and slaren as code owners, October 31, 2025 05:20
@github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels, Oct 31, 2025
@am17an force-pushed the cuda-add-rope-fusion branch from 406c867 to dc814b8, October 31, 2025 12:21
@ORippler (Contributor) left a comment

While the fusion itself is quite simple, I would still recommend adding a test for it to test-backend-ops nonetheless.

@am17an (Collaborator, Author) commented Oct 31, 2025

> While the fusion itself is quite simple, I would still recommend adding a test for it to test-backend-ops nonetheless.

A test for this was already added in the Vulkan PR #16769.
