ggml : add ggml_fill() #13772


Open · wants to merge 3 commits into master

Conversation

@ngxson (Collaborator) commented May 25, 2025

Add ggml_fill(ctx0, tensor, value), which mimics the idea of PyTorch's full, full_like, zeros_like, ones_like.

It's not 100% equivalent to PyTorch, as this is an in-place operation. However, it allows much more flexibility. For example:

  • Create a new tensor with a constant value by new_tensor, then fill
  • Set part of an existing tensor to a constant value by doing a view, then fill
  • Mimic PyTorch's *_like behavior by doing a dup, then fill

For simplicity, this op is single-threaded and CPU-only for now.
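
A minimal usage sketch of the three patterns above, assuming the ggml_fill(ctx, tensor, value) signature from this PR (tensor names and sizes are illustrative, not from the PR):

// (1) new tensor with a constant value: new_tensor, then fill
struct ggml_tensor * a = ggml_new_tensor_2d(ctx0, GGML_TYPE_F32, 4, 4);
a = ggml_fill(ctx0, a, 1.5f);

// (2) zero out the first row of a via a view, then fill
struct ggml_tensor * row0 = ggml_view_1d(ctx0, a, 4, 0);
ggml_build_forward_expand(gf, ggml_fill(ctx0, row0, 0.0f));

// (3) mimic PyTorch's zeros_like: dup, then fill
struct ggml_tensor * z = ggml_fill(ctx0, ggml_dup(ctx0, a), 0.0f);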

@github-actions bot added the testing (Everything test related) and ggml (changes relating to the ggml tensor library for machine learning) labels May 25, 2025
@ngxson ngxson marked this pull request as ready for review May 25, 2025 09:43
@ngxson ngxson requested a review from ggerganov May 25, 2025 09:43
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@ggerganov (Member) commented

Inplace operations are a bit tricky (#12757 (comment)), so I am a bit hesitant. Wondering if there is some other way to support this.

> Create a new tensor with a constant value by new_tensor, then fill

Such tensors should always be marked as inputs and set via ggml_backend_tensor_set.
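
For illustration, a minimal sketch of that pattern (names and sizes are hypothetical; ggml_backend_tensor_set copies host data into the allocated tensor):

// mark the constant tensor as a graph input ...
struct ggml_tensor * c = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 16);
ggml_set_input(c);

// ... and after the graph is allocated, upload the constant data
float ones[16];
for (int i = 0; i < 16; i++) { ones[i] = 1.0f; }
ggml_backend_tensor_set(c, ones, 0, sizeof(ones));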

> Set part of an existing tensor to a constant value by doing a view, then fill

There isn't a convenient way to do it. Probably:

val = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1); // single-element input
ggml_set_input(val);

// ...

aux = ggml_repeat(ctx, val, /* tensor with the needed shape */);
cur = ggml_cpy(ctx, aux, ggml_view(...)); // copy the constant into the view
ggml_build_forward_expand(gf, cur);

@ngxson (Collaborator, Author) commented May 26, 2025

If we don't want to support inplace, we can internally create a new tensor, so ggml_fill would effectively become ggml_full_like.

Setting it via an input can be a bit annoying, especially when I want to use just a single number:

val = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1); // single-element input
ggml_set_input(val);
// then: allocate the graph, set the tensor data

Another way could be to provide a ggml_one which returns a tensor with one single element holding the value 1.0f. Then we'd have the ability to generate 0.0f and 1.0f, essentially a "linear basis" that allows constructing any possible vector 😂

(But of course, having something like PyTorch's full_like would make my life a lot easier.)

@ggerganov (Member) commented

> Inplace operations are a bit tricky (#12757 (comment)), so I am a bit hesitant.

Thinking more about it, my concern might not be very relevant here, because in this case we don't use the existing data as input - i.e. we always overwrite it with a specific value that does not depend on the input. So it's probably OK.

Maybe we should just improve the API a bit to make it more type-safe. For example, what happens if you call ggml_fill(x, 123.0f) when x is GGML_TYPE_I32? Probably we need overloads such as ggml_fill_f32(), ggml_fill_i32(), etc.

@ngxson (Collaborator, Author) commented May 26, 2025

> Maybe we should just improve the API a bit to make it more type-safe. For example, what happens if you call ggml_fill(x, 123.0f) when x is GGML_TYPE_I32? Probably we need overloads such as ggml_fill_f32(), ggml_fill_i32(), etc.

Hmm, I think it's currently the same concern as with many other ops like ggml_scale, ggml_norm, etc.

But the idea of supporting the I32 type is interesting. I don't know if this is asking too much, but for a long time now I've really wanted ggml_cast to support converting back and forth between float and int. I still haven't figured out how to do that because (1) the code of ggml_cast is a bit above my head, and (2) it can be tricky to reimplement on all backends. WDYT?

If ggml_cast can convert between float and int, then I think we can have a single ggml_fill accepting a float, then cast to I32 when needed (in general, I think this use case will be rare).
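
A sketch of what that combination could look like (hypothetical: it assumes the ggml_fill from this PR plus float-to-int support in ggml_cast, which does not exist yet):

// fill a float tensor, then cast to I32 - the cast is the missing piece
struct ggml_tensor * tf = ggml_fill(ctx, ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 8), 3.0f);
struct ggml_tensor * ti = ggml_cast(ctx, tf, GGML_TYPE_I32); // would require float -> int cast support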

@slaren (Member) commented May 26, 2025

The implementation of ggml_fill_i32 and ggml_fill_f32 would be the same: just store an int32 instead of a float32 in the op_params. You only need one implementation per type size, since it is only copying bits.
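
As an illustration, a minimal sketch of what those overloads could look like (this assumes a GGML_OP_FILL op enum and the internal ggml_set_op_params helper; it is not the PR's actual code):

// both variants pack 32 bits into op_params; the kernel just copies bits
struct ggml_tensor * ggml_fill_f32(struct ggml_context * ctx, struct ggml_tensor * a, float value) {
    struct ggml_tensor * result = ggml_view_tensor(ctx, a); // in-place: view of a
    ggml_set_op_params(result, &value, sizeof(value));
    result->op     = GGML_OP_FILL; // assumed op enum
    result->src[0] = a;
    return result;
}

struct ggml_tensor * ggml_fill_i32(struct ggml_context * ctx, struct ggml_tensor * a, int32_t value) {
    struct ggml_tensor * result = ggml_view_tensor(ctx, a);
    ggml_set_op_params(result, &value, sizeof(value)); // same 32-bit payload
    result->op     = GGML_OP_FILL;
    result->src[0] = a;
    return result;
}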

@slaren (Member) commented May 26, 2025

The issue with creating a new tensor in a graph with ggml_new_tensor is that ggml-alloc treats these as potential inputs and allocates all of them at the beginning of the compute buffer. This is so that the user can load data into them before evaluating the graph. If you are creating new tensors in a loop, e.g. per layer, this can quickly become a very large waste of memory.

Adding a version of this op that returns a new tensor instead of a view would be trivial, and would avoid this issue. Just make a ggml_fill_4d function or similar that creates a new tensor instead of a view.
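
For illustration, a hypothetical ggml_fill_4d along those lines (a sketch only, reusing the assumed GGML_OP_FILL from above). Because the result has an op, ggml-alloc would not treat it as a graph input:

struct ggml_tensor * ggml_fill_4d(
        struct ggml_context * ctx, enum ggml_type type,
        int64_t ne0, int64_t ne1, int64_t ne2, int64_t ne3,
        float value) {
    // fresh tensor, not a view - ggml-alloc can place it like any other node
    struct ggml_tensor * result = ggml_new_tensor_4d(ctx, type, ne0, ne1, ne2, ne3);
    ggml_set_op_params(result, &value, sizeof(value));
    result->op = GGML_OP_FILL; // assumed op enum
    return result;
}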
