Commit 264f1b5

zdnn: refactor codebase + add docs (#16178)
* zdnn: initial matmul refactor
* ggml-zdnn: rm static from funcs
* ggml-zdnn: update ggml-zdnn.h
* ggml-zdnn: change header files to hpp
* ggml-zdnn: switch to common.hpp
* ggml-zdnn: move mulmat forward around
* ggml-zdnn: rm inline from utils
* ggml-zdnn: code cleanup
* docs: add zDNN docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
1 parent 0bc7cc7 commit 264f1b5

File tree

11 files changed: +334 additions, −266 deletions


README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -274,6 +274,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 | [Vulkan](docs/build.md#vulkan) | GPU |
 | [CANN](docs/build.md#cann) | Ascend NPU |
 | [OpenCL](docs/backend/OPENCL.md) | Adreno GPU |
+| [IBM zDNN](docs/backend/zDNN.md) | IBM Z & LinuxONE |
 | [WebGPU [In Progress]](docs/build.md#webgpu) | All |
 | [RPC](https://github.com/ggml-org/llama.cpp/tree/master/tools/rpc) | All |
```

docs/backend/zDNN.md

Lines changed: 61 additions & 0 deletions (new file)

# llama.cpp for IBM zDNN Accelerator

## Background

IBM zDNN (Z Deep Neural Network) is a hardware acceleration library designed specifically to leverage the IBM NNPA (Neural Network Processor Assist) accelerator located within IBM Telum I and II processors. It provides significant performance improvements for neural network inference operations.

### Llama.cpp + IBM zDNN

The llama.cpp zDNN backend is designed to enable llama.cpp on IBM z17 and later systems via the IBM zDNN hardware acceleration library.

## Software & Hardware Support

| Hardware Level       | Status        | Verified                   |
| -------------------- | ------------- | -------------------------- |
| IBM z17 / LinuxONE 5 | Supported     | RHEL 9.6, IBM z17, 40 IFLs |
| IBM z16 / LinuxONE 4 | Not Supported |                            |

## Data Types Supported

| Data Type | Status    |
| --------- | --------- |
| F32       | Supported |
| F16       | Supported |
| BF16      | Supported |

## CMake Options

The IBM zDNN backend has the following CMake options that control the behaviour of the backend.

| CMake Option | Default Value | Description                         |
| ------------ | ------------- | ----------------------------------- |
| `GGML_ZDNN`  | `OFF`         | Compile llama.cpp with zDNN support |
| `ZDNN_ROOT`  | `""`          | Override zDNN library lookup        |

## 1. Install zDNN Library

Note: Using the zDNN library provided via `apt` or `yum` may not work correctly, as reported in [#15772](https://github.com/ggml-org/llama.cpp/issues/15772). It is preferred that you compile from source.

```sh
git clone --recurse-submodules https://github.com/IBM/zDNN
cd zDNN

autoreconf .
./configure --prefix=/opt/zdnn-libs

make build
sudo make install
```

## 2. Build llama.cpp

```sh
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

cmake -S . -G Ninja -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_ZDNN=ON \
    -DZDNN_ROOT=/opt/zdnn-libs
cmake --build build --config Release -j$(nproc)
```

ggml/include/ggml-zdnn.h

Lines changed: 3 additions & 0 deletions

```diff
@@ -7,6 +7,9 @@
 extern "C" {
 #endif

+// device buffer
+GGML_BACKEND_API ggml_backend_buffer_type_t ggml_backend_zdnn_buffer_type(void);
+
 GGML_BACKEND_API ggml_backend_reg_t ggml_backend_zdnn_reg(void);

 #ifdef __cplusplus
```

ggml/src/ggml-zdnn/.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
zdnn.h

ggml/src/ggml-zdnn/common.hpp

Lines changed: 59 additions & 0 deletions (new file)

```cpp
#ifndef GGML_ZDNN_COMMON_HPP
#define GGML_ZDNN_COMMON_HPP

#include "ggml.h"
#include "ggml-impl.h"

#include "zdnn.h"

#include <vector>
#include <memory>

#define GGML_ZDNN_NAME    "zDNN"
#define GGML_ZDNN_VERSION ZDNN_VERNUM

#define ZDNN_CHECK(stmt)                \
    do {                                \
        zdnn_status status = (stmt);    \
        GGML_ASSERT(status == ZDNN_OK); \
    } while (0);

struct ggml_backend_zdnn_device_context {
    int zdnn_device;
    int zdnn_device_ref_count;

    bool has_parmblkformat_0;
    bool has_parmblkformat_1;  // checks for z17

    size_t max_size;

    char name[128];
};

struct ggml_backend_zdnn_context {
    int device;
    ggml_cgraph * gf;
};

struct ggml_backend_zdnn_buffer {
    void * data;
    ggml_backend_zdnn_buffer * extra;  // for bias, etc.
    size_t size;

    zdnn_tensor_desc pre_tfm_desc;
    zdnn_tensor_desc tfm_desc;
    zdnn_ztensor     ztensor;

    char name[GGML_MAX_NAME];
};

struct ggml_backend_zdnn_buffer_context {
    void * all_data;
    size_t all_size;
    bool   owned;

    int n_buffers;
    std::vector<std::unique_ptr<ggml_backend_zdnn_buffer>> buffers;
};

#endif  // GGML_ZDNN_COMMON_HPP
```

ggml/src/ggml-zdnn/ggml-zdnn-impl.h

Lines changed: 0 additions & 98 deletions
This file was deleted.
