
sycl: quantize and reorder the input to q8_1 when reorder is enabled #13826

Open
wants to merge 8 commits into master

Conversation

AD2605 (Contributor) commented May 27, 2025

Description

This PR adds a kernel that quantizes and reorders the src1 tensor when converting it to the q8_1 type, for the case where the reorder optimization is enabled.
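For illustration, here is a minimal host-side C++ sketch of the idea, assuming the reorder layout stores all int8 quants contiguously and keeps the per-block scale and sum in a separate region. The actual change is a SYCL device kernel; the function and buffer names below are illustrative, not the PR's.

```cpp
// Hypothetical reference of the quantize + reorder step for q8_1.
// Assumption (not confirmed by the PR text): quants for all blocks are packed
// contiguously, followed by the per-block (d, sum) metadata, instead of
// interleaving metadata and quants inside each block.
#include <cmath>
#include <cstdint>

constexpr int QK8_1 = 32; // values per q8_1 block (ggml convention)

// dst_q      : n_blocks * QK8_1 int8 quants, block-contiguous
// dst_d/dst_s: per-block scale and sum, stored separately (the "reorder")
void quantize_row_q8_1_reordered(const float * src, int8_t * dst_q,
                                 float * dst_d, float * dst_s, int64_t n) {
    const int64_t n_blocks = n / QK8_1;
    for (int64_t ib = 0; ib < n_blocks; ++ib) {
        const float * x = src + ib * QK8_1;

        // per-block absolute maximum and sum of the original values
        float amax = 0.0f, sum = 0.0f;
        for (int i = 0; i < QK8_1; ++i) {
            amax = std::fmax(amax, std::fabs(x[i]));
            sum += x[i];
        }

        const float d  = amax / 127.0f;               // q8_1 scale
        const float id = d != 0.0f ? 1.0f / d : 0.0f;

        // quants go into one contiguous region ...
        for (int i = 0; i < QK8_1; ++i) {
            dst_q[ib * QK8_1 + i] = (int8_t) std::round(x[i] * id);
        }
        // ... while the per-block scale and sum live in a separate region
        dst_d[ib] = d;
        dst_s[ib] = sum;
    }
}
```

In the device kernel each block would typically be handled by one work-group, but the layout transformation is the same as in this host sketch.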

All test cases pass when running with the environment variable GGML_SYCL_DISABLE_OPT set to 0.

Performance data: all numbers below were gathered with the above environment variable set to 0, with the 2025.1 toolkit, and with llama-bench run with -ngl 99 -t 8 -r 10.

Battlemage

| model | size | params | backend | ngl | test | t/s master (f9cd683) | t/s (this branch) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 7454.58 ± 27.22 | 7410.26 ± 24.63 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 137.00 ± 2.06 | 135.75 ± 1.88 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | pp512 | 5713.46 ± 16.98 | 5690.11 ± 14.11 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | tg128 | 89.00 ± 1.50 | 88.76 ± 1.57 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | pp512 | 3171.06 ± 5.99 | 3163.50 ± 5.43 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | tg128 | 69.67 ± 0.51 | 69.88 ± 0.72 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | pp512 | 2069.41 ± 2.58 | 2066.22 ± 2.69 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | tg128 | 47.36 ± 0.31 | 47.25 ± 0.26 |

Lunar Lake

| model | size | params | backend | ngl | test | t/s master (f9cd683) | t/s (this branch) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | pp512 | 1802.52 ± 75.95 | 1887.25 ± 11.63 |
| qwen2 1.5B Q4_0 | 1013.62 MiB | 1.78 B | SYCL | 99 | tg128 | 55.43 ± 0.17 | 57.56 ± 1.20 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | pp512 | 1095.61 ± 28.74 | 1386.69 ± 15.77 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | tg128 | 28.35 ± 0.11 | 29.42 ± 0.37 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | pp512 | 736.88 ± 2.24 | 710.65 ± 17.88 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | tg128 | 22.36 ± 0.27 | 24.75 ± 0.43 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | pp512 | 422.54 ± 15.98 | 447.95 ± 2.62 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | tg128 | 12.99 ± 0.05 | 14.11 ± 0.12 |

github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on May 27, 2025
Alcpz (Collaborator) left a comment


Minor comments. This is great work! Thanks

AD2605 and others added 2 commits May 29, 2025 10:46