
CLBlast: byte offset / element count confusion #3307

Closed
shibe2 opened this issue Sep 22, 2023 · 4 comments · Fixed by #3447

Comments


shibe2 commented Sep 22, 2023

Expected Behavior

Correct uploading of contiguous 3D tensor data to GPU.

Current Behavior

ggml_cl_h2d_tensor_2d uses its offset argument as a byte offset in a call to clEnqueueWriteBuffer, but ggml_cl_transform_tensor passes an element count as that offset. The two agree only when the element size is exactly 1 byte.

Also, I don't understand why ggml_cl_mul_f32 passes a non-zero offset to ggml_cl_h2d_tensor_2d.

Environment and Context

AMD GPU
Linux

Steps to Reproduce

  1. Pass 3D tensor with contiguous GGML_TYPE_F16 or GGML_TYPE_F32 data to ggml_cl_transform_tensor.
  2. Read data back from GPU memory or perform ggml_cl_mul_mat on that tensor.
  3. Observe incorrect data or result.

Ping

@0cc4m
@JohannesGaessler
@SlyEcho


SlyEcho commented Sep 22, 2023

ggml_cl_h2d_tensor_2d uses byte offsets because that's how the ggml tensor object points to the data. But maybe things have changed in the meantime?


shibe2 commented Sep 22, 2023

I think it was wrong from the beginning (since 2e6cd4b). With 2D tensors the offset is always 0, so the bug does not manifest. With 3D tensors, later 2D slices partially overwrite earlier ones.


shibe2 commented Oct 2, 2023

I fixed this specific bug by passing a byte offset from ggml_cl_transform_tensor to ggml_cl_h2d_tensor_2d. I also fixed 2 related issues, so uploading now works properly in all cases. Tested by reading the data back from VRAM.

shibe2/llama.cpp@f58ebcb

However, this is not usable on its own, because that data is addressed incorrectly during computation. I don't know whether this fix is worth merging separately. Currently I have only a partial fix for the computation part.


SlyEcho commented Oct 2, 2023

You should make a draft PR.
