sycl : Implemented reorder Q4_K mmvq #13109


Open · wants to merge 4 commits into base: master

Conversation

sgeor255
Contributor

This PR enables the reorder optimization for the Q4_K layout, similarly to #12858. This branch is based off of @Alcpz's; until that one is merged, the easiest way to review this PR is to look at the diff for 8cbe2c9.

Some performance numbers on Lunar Lake below:

  • Q4_K reorder with GGML_SYCL_DISABLE_OPT=0
| model | size | params | backend | ngl | threads | sm | test | t/s |
| ----- | ---- | ------ | ------- | --- | ------- | -- | ---- | --- |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1586.19 ± 69.35 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 41.23 ± 0.43 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 550.65 ± 1.35 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 17.67 ± 1.05 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 616.41 ± 12.21 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 28.57 ± 0.32 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 508.14 ± 1.50 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 13.75 ± 0.12 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 827.73 ± 26.59 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 21.45 ± 0.17 |

build: 52b1622 (5099)

  • Q4_K reorder with GGML_SYCL_DISABLE_OPT=1
| model | size | params | backend | ngl | threads | sm | test | t/s |
| ----- | ---- | ------ | ------- | --- | ------- | -- | ---- | --- |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | pp512 | 1576.79 ± 80.93 |
| qwen2 1.5B Q4_K - Medium | 1.04 GiB | 1.78 B | SYCL | 99 | 8 | none | tg128 | 36.27 ± 0.43 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | pp512 | 551.82 ± 1.63 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | SYCL | 99 | 8 | none | tg128 | 12.24 ± 1.19 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | pp512 | 586.64 ± 1.65 |
| gemma2 2B Q4_K - Medium | 1.59 GiB | 2.61 B | SYCL | 99 | 8 | none | tg128 | 24.04 ± 0.41 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | pp512 | 509.51 ± 0.87 |
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | SYCL | 99 | 8 | none | tg128 | 10.18 ± 0.04 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | pp512 | 825.29 ± 26.93 |
| phi3 3B Q4_K - Medium | 2.23 GiB | 3.82 B | SYCL | 99 | 8 | none | tg128 | 17.83 ± 0.05 |

build: 52b1622 (5099)

  • TODO
    • Performance on BMG and ARC

Alcpz and others added 4 commits April 10, 2025 01:51
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
@Alcpz Alcpz changed the title sycl : Implemented reorder Q4_0 mmvq sycl : Implemented reorder Q4_K mmvq Apr 25, 2025
@github-actions github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Apr 25, 2025
```diff
@@ -3636,22 +3664,65 @@ static void reorder_qw(char *data_device, const int ncols, const int nrows,
     sycl::free(tmp_buf, *stream);
 }

-static void reorder_qw(ggml_tensor * src0, dpct::queue_ptr stream) {
+static void reorder_qw_q4_k(char * data_device, size_t size, size_t offset, dpct::queue_ptr stream) {
```
Collaborator
Question: Is there a specific reason data_device is declared as a char* instead of a uint8_t*, especially considering it's later cast to uint8_t* as qs_ptr anyway?
