
[SYCL] Add oneDNN primitive support #9091

Merged · 9 commits · Aug 22, 2024
Conversation

@luoyu-intel (Contributor) commented Aug 19, 2024

oneMKL and oneDNN share almost the same GPU kernels, but oneDNN has better compatibility.
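For context, the operation being switched between the two libraries is an ordinary GEMM with a transposed first operand. A minimal plain-C++ reference of that product (no oneMKL/oneDNN calls; plain `float` here, whereas the backend runs it in FP16 on the GPU; the helper name is illustrative, not from the PR):

```cpp
#include <cstddef>
#include <vector>

// Reference GEMM: C[m][n] = sum_k A[k][m] * B[k][n] (A accessed transposed),
// mirroring the product that oneMKL's gemm and oneDNN's matmul primitive
// each compute in the SYCL backend. A is K x M, B is K x N, C is M x N,
// all row-major.
void gemm_at_b(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, size_t M, size_t N, size_t K) {
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k) {
                acc += A[k * M + m] * B[k * N + n];  // A read column-wise (transposed)
            }
            C[m * N + n] = acc;
        }
    }
}
```

Either library replaces this triple loop with a tuned device kernel; the PR only changes which library provides it.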

llama-bench on A750:

DNNL F16
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |        pp1024 |   1607.21 ± 5.46 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |          tg32 |     32.01 ± 0.09 |

MKL F16
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |        pp1024 |   1581.72 ± 4.02 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |          tg32 |     32.08 ± 0.13 |

DNNL F32
| model                          |       size |     params | backend    | ngl | threads |    sm |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ----: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |        pp1024 |    710.00 ± 0.42 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |          tg32 |     35.77 ± 0.02 |

MKL F32
| model                          |       size |     params | backend    | ngl | threads |    sm |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ----: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |        pp1024 |    709.28 ± 0.92 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |          tg32 |     35.74 ± 0.04 |

On A770M, the master branch crashes with GGML_SYCL_F16=ON; this branch, using oneDNN, runs it.

DNNL F16

| model                          |       size |     params | backend    | ngl | threads |    sm |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ----: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |        pp1024 |   1598.31 ± 1.73 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |         tg128 |     40.20 ± 0.09 |

On Ultra-155H, oneMKL throws this error:

```
Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)
Exception caught at file:C:/Users/luoyu/Documents/repo/llama-fork/ggml/src/ggml-sycl.cpp, line:2556, func:operator()
SYCL error: CHECK_TRY_ERROR(dpct::gemm( *stream, oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans, row_diff, src1_ncols, ne10, &alpha_f16, src0_ptr, dpct::library_data_t::real_half, ne00, src1_ptr, dpct::library_data_t::real_half, ne10, &beta_f16, dst_f16.get(), dpct::library_data_t::real_half, ldc, dpct::library_data_t::real_half)): Meet error in this line code!
```
DNNL F16
| model                          |       size |     params | backend    | ngl | threads |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |        pp1024 |   309.47 ± 39.50 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |          tg32 |     11.67 ± 1.07 |

@github-actions github-actions bot added build Compilation issues ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Aug 19, 2024
@NeoZhangJianyu (Collaborator) left a comment

It's great to involve oneDNN in the SYCL backend.

  1. If oneDNN is detected, oneMKL won't be used anywhere in the code. Is that right?

  2. Please update SYCL.md with the oneMKL and oneDNN dependency info.

@luoyu-intel (Contributor, Author) replied:

  1. Yes.
  2. SYCL.md has been updated.
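The answer above can be sketched as a compile-time switch: once the build detects oneDNN, the GEMM path compiles against oneDNN and the oneMKL path is never used. The macro and function names below are illustrative assumptions, not the PR's actual guards:

```cpp
#include <string>

// Hypothetical build-time dispatch. In a real build, the configure step
// would define the macro when oneDNN is found; here we default it on so
// the sketch is self-contained.
#ifndef LLAMA_SYCL_HAVE_DNNL
#define LLAMA_SYCL_HAVE_DNNL 1  // pretend oneDNN was detected
#endif

// Returns which library backs the GEMM path; exactly one branch is compiled.
std::string gemm_backend_name() {
#if LLAMA_SYCL_HAVE_DNNL
    return "oneDNN";  // oneDNN matmul-primitive path
#else
    return "oneMKL";  // oneMKL gemm path
#endif
}
```

Because the choice is made by the preprocessor, no oneMKL GEMM call is even compiled when oneDNN is selected, matching the "won't be used in the whole code" answer.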

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 22, 2024
@airMeng merged commit 1731d42 into ggerganov:master on Aug 22, 2024. 52 checks passed.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024 (same commit messages as above).