
Conversation

@am17an (Collaborator) commented Oct 31, 2025

Based on #16769.

On a 4090:

| Model | Test | t/s master | t/s cuda-rope-fusion | Speedup |
| --- | --- | ---: | ---: | ---: |
| llama 8B Q4_K_M | tg32 | 134.90 | 136.07 | 1.01 |
| llama 8B Q4_K_M | tg64 | 131.41 | 132.84 | 1.01 |
| llama 8B Q4_K_M | tg128 | 130.54 | 131.87 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg32 | 167.18 | 168.23 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg64 | 161.00 | 161.90 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg128 | 158.84 | 159.83 | 1.01 |
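For context, the branch name `cuda-add-rope-fusion` suggests this fuses a preceding elementwise add with the rotary position embedding (RoPE) rotation into a single kernel pass, so the `add` result is never materialized as an intermediate tensor. The following NumPy sketch is purely illustrative: the function names, the interleaved pair layout, and the frequency formula are simplifying assumptions, not the actual ggml kernel.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Simplified rotary embedding: rotate each (even, odd) pair of x
    by a position-dependent angle. Illustrative, not the ggml kernel."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(d // 2, dtype=np.float64) * 2.0 / d)
    theta = pos * freqs
    c, s = np.cos(theta), np.sin(theta)
    x0, x1 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x0 * c - x1 * s
    out[..., 1::2] = x0 * s + x1 * c
    return out

def add_rope_fused(a, b, pos, base=10000.0):
    """One pass over the data: the add is folded into the same loop body
    as the rotation, so the (a + b) intermediate never hits memory."""
    d = a.shape[-1]
    out = np.empty_like(a)
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = np.cos(theta), np.sin(theta)
        e = a[2 * i] + b[2 * i]          # add fused into the kernel loop
        o = a[2 * i + 1] + b[2 * i + 1]
        out[2 * i] = e * c - o * s
        out[2 * i + 1] = e * s + o * c
    return out
```

The unfused path computes `rope(a + b, pos)` in two passes over memory; the fused loop produces the same values while reading each input element once, which is where the small token-generation speedup would come from.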

@am17an requested review from CISC and slaren as code owners, October 31, 2025 05:20
@github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels, Oct 31, 2025
@am17an force-pushed the cuda-add-rope-fusion branch from 406c867 to dc814b8, October 31, 2025 12:21
@ORippler (Contributor) left a comment

While the fusion itself is quite simple, I would still recommend adding a test for it to test-backend-ops nonetheless.

@am17an (Collaborator, Author) commented Oct 31, 2025

> While the fusion itself is quite simple, I would still recommend adding a test for it to test-backend-ops nonetheless.

A test for this was already added in the Vulkan PR #16769.
