
[SYCL] Add oneDNN primitive support #9091

Merged · 9 commits · Aug 22, 2024
Conversation

@luoyu-intel (Contributor) commented Aug 19, 2024

oneMKL and oneDNN share almost the same GPU kernels, but oneDNN has better compatibility.
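For context, the operation being switched between the two libraries is an ordinary GEMM with a transposed first operand. A minimal plain-C++ reference of that product (no oneMKL/oneDNN calls; plain `float` here, whereas the backend runs it in FP16 on the GPU; the helper name is illustrative, not from the PR):

```cpp
#include <cstddef>
#include <vector>

// Reference GEMM: C[m][n] = sum_k A[k][m] * B[k][n] (A accessed transposed),
// mirroring the product that oneMKL's gemm and oneDNN's matmul primitive
// each compute in the SYCL backend. A is K x M, B is K x N, C is M x N,
// all row-major.
void gemm_at_b(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, size_t M, size_t N, size_t K) {
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k) {
                acc += A[k * M + m] * B[k * N + n];  // A read column-wise (transposed)
            }
            C[m * N + n] = acc;
        }
    }
}
```

Either library replaces this triple loop with a tuned device kernel; the PR only changes which library provides it.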

llama-bench on A750:

DNNL F16
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |        pp1024 |   1607.21 ± 5.46 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |          tg32 |     32.01 ± 0.09 |

MKL F16
| model                          |       size |     params | backend    | ngl |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |        pp1024 |   1581.72 ± 4.02 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |          tg32 |     32.08 ± 0.13 |

DNNL F32
| model                          |       size |     params | backend    | ngl | threads |    sm |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ----: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |        pp1024 |    710.00 ± 0.42 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |          tg32 |     35.77 ± 0.02 |

MKL F32
| model                          |       size |     params | backend    | ngl | threads |    sm |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ----: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |        pp1024 |    709.28 ± 0.92 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |          tg32 |     35.74 ± 0.04 |

On A770M, the master branch crashes with GGML_SYCL_F16=ON; this branch, using oneDNN, runs it.

DNNL F16

| model                          |       size |     params | backend    | ngl | threads |    sm |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ----: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |        pp1024 |   1598.31 ± 1.73 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |  none |         tg128 |     40.20 ± 0.09 |

On Ultra-155H, oneMKL throws this error:

```
Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)
Exception caught at file:C:/Users/luoyu/Documents/repo/llama-fork/ggml/src/ggml-sycl.cpp, line:2556, func:operator()
SYCL error: CHECK_TRY_ERROR(dpct::gemm( *stream, oneapi::mkl::transpose::trans, oneapi::mkl::transpose::nontrans, row_diff, src1_ncols, ne10, &alpha_f16, src0_ptr, dpct::library_data_t::real_half, ne00, src1_ptr, dpct::library_data_t::real_half, ne10, &beta_f16, dst_f16.get(), dpct::library_data_t::real_half, ldc, dpct::library_data_t::real_half)): Meet error in this line code!
```
DNNL F16
| model                          |       size |     params | backend    | ngl | threads |          test |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | ------------: | ---------------: |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |        pp1024 |   309.47 ± 39.50 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  33 |       8 |          tg32 |     11.67 ± 1.07 |

@github-actions github-actions bot added build Compilation issues ggml changes relating to the ggml tensor library for machine learning SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language labels Aug 19, 2024
@NeoZhangJianyu (Collaborator) left a comment

It's great to involve oneDNN in the SYCL backend.

  1. If oneDNN is detected, oneMKL won't be used anywhere in the code. Is that right?

  2. Please update SYCL.md with the oneMKL and oneDNN dependency info.

@luoyu-intel (Contributor, Author) replied:

  1. Yes.
  2. SYCL.md has been updated.
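The answer above can be sketched as a compile-time switch: once the build detects oneDNN, the GEMM path compiles against oneDNN and the oneMKL path is never used. The macro and function names below are illustrative assumptions, not the PR's actual guards:

```cpp
#include <string>

// Hypothetical build-time dispatch. In a real build, the configure step
// would define the macro when oneDNN is found; here we default it on so
// the sketch is self-contained.
#ifndef LLAMA_SYCL_HAVE_DNNL
#define LLAMA_SYCL_HAVE_DNNL 1  // pretend oneDNN was detected
#endif

// Returns which library backs the GEMM path; exactly one branch is compiled.
std::string gemm_backend_name() {
#if LLAMA_SYCL_HAVE_DNNL
    return "oneDNN";  // oneDNN matmul-primitive path
#else
    return "oneMKL";  // oneMKL gemm path
#endif
}
```

Because the choice is made by the preprocessor, no oneMKL GEMM call is even compiled when oneDNN is selected, matching the "won't be used in the whole code" answer.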

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 22, 2024
@airMeng merged commit 1731d42 into ggerganov:master on Aug 22, 2024. 52 checks passed.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024 (same commit messages as above).