
CLBlast: byte offset / element count confusion #3307

Closed
shibe2 opened this issue Sep 22, 2023 · 4 comments · Fixed by #3447

Comments


shibe2 commented Sep 22, 2023

Expected Behavior

Correct uploading of contiguous 3D tensor data to GPU.

Current Behavior

ggml_cl_h2d_tensor_2d uses its offset argument as a byte offset in a call to clEnqueueWriteBuffer, but ggml_cl_transform_tensor passes an element count as that offset. The two agree only when the element size is exactly 1 byte.

Also, I don't understand why ggml_cl_mul_f32 passes a non-zero offset to ggml_cl_h2d_tensor_2d.

Environment and Context

AMD GPU
Linux

Steps to Reproduce

  1. Pass 3D tensor with contiguous GGML_TYPE_F16 or GGML_TYPE_F32 data to ggml_cl_transform_tensor.
  2. Read data back from GPU memory or perform ggml_cl_mul_mat on that tensor.
  3. Observe incorrect data or result.

Ping

@0cc4m
@JohannesGaessler
@SlyEcho


SlyEcho commented Sep 22, 2023

ggml_cl_h2d_tensor_2d uses byte offsets because that's how the ggml tensor object points to the data. But maybe things have changed in the meantime?


shibe2 commented Sep 22, 2023

I think it was wrong from the beginning (since 2e6cd4b). With 2D tensors the offset is always 0, so the bug does not manifest. With 3D tensors, later 2D slices partially overwrite earlier ones.


shibe2 commented Oct 2, 2023

I fixed this specific bug by passing a byte offset from ggml_cl_transform_tensor to ggml_cl_h2d_tensor_2d. I also fixed 2 related issues, so uploading now works properly in all cases. Tested by reading the data back from VRAM.

shibe2/llama.cpp@f58ebcb

However, this is not usable on its own, because that data is addressed incorrectly during computation. I don't know whether this fix is worth merging separately. Currently I have only a partial fix for the computation part.


SlyEcho commented Oct 2, 2023

You should make a draft PR.
