sycl : Implemented reorder Q4_K mmvq #13109


Open · wants to merge 4 commits into base: master

Conversation

sgeor255
Contributor

This PR enables the reorder optimization for the Q4_K layout, similarly to #12858. This branch is based off of @Alcpz's; until that one is merged, the easiest way to review this PR is to look at the diff for 8cbe2c9.

Some performance numbers on Lunar Lake below:

  • Q4_K reorder with GGML_SYCL_DISABLE_OPT=0
| model | size | params | backend | ngl | threads | sm | test | t/s |
| ----- | ---- | ------ | ------- | --- | ------- | -- | ---- | --- |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1586.19 ± 69.35 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 41.23 ± 0.43 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 550.65 ± 1.35 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 17.67 ± 1.05 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 616.41 ± 12.21 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 28.57 ± 0.32 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 508.14 ± 1.50 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 13.75 ± 0.12 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 827.73 ± 26.59 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 21.45 ± 0.17 |

build: 52b1622 (5099)

  • Q4_K reorder with GGML_SYCL_DISABLE_OPT=1
| model | size | params | backend | ngl | threads | sm | test | t/s |
| ----- | ---- | ------ | ------- | --- | ------- | -- | ---- | --- |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1576.79 ± 80.93 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 36.27 ± 0.43 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 551.82 ± 1.63 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 12.24 ± 1.19 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 586.64 ± 1.65 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 24.04 ± 0.41 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 509.51 ± 0.87 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 10.18 ± 0.04 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 825.29 ± 26.93 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 17.83 ± 0.05 |

build: 52b1622 (5099)

  • TODO
    • Performance on BMG and ARC

Alcpz and others added 4 commits April 10, 2025 01:51
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
@Alcpz Alcpz changed the title sycl : Implemented reorder Q4_0 mmvq sycl : Implemented reorder Q4_K mmvq Apr 25, 2025
@github-actions github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Apr 25, 2025
```diff
@@ -3636,22 +3664,65 @@ static void reorder_qw(char *data_device, const int ncols, const int nrows,
     sycl::free(tmp_buf, *stream);
 }

-static void reorder_qw(ggml_tensor * src0, dpct::queue_ptr stream) {
+static void reorder_qw_q4_k(char * data_device, size_t size, size_t offset, dpct::queue_ptr stream) {
```
Collaborator
Question: Is there a specific reason data_device is declared as a char* instead of a uint8_t*, especially considering it's later cast to uint8_t* as qs_ptr anyway?
