struct ggml_tensor -> add a struct meta pointer into it (can replace padding) #1093

cmp-nct · 2023-04-20T21:37:19Z

Given that the tensor struct uses padding it's not nice to add any more information into it.
It currently has a static 8 byte padding at the end, that's perfect to be replaced by a pointer to a struct to store additional information.
For example a optional human readable tensor name (to be used in graph print) or a couple u_int8 to switch on or off features by tensor.
For example: use_cublas=0
This would allow to fine-control the usage of such a library instead of hard-coding it on through a define flag.
For example: performance_type=HIGH/LOW/MID
For example: threads_override=2

use_cublas could be initialized depending on the define as 1/0.

I'd also move the task scheduling n_task out and the performance stuff
The overhead to access a compact external struct should be zero.

I suppose all that stuff could also be added directly into the tensor struct but if we have to keep it aligned that's not nice.
Especially given 64/32 bit environments with different pointer sizes etc.

Just as an example, giving tensors a name can look like that in the debug output:

 - 842: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   0.000 /   0.000 ms, wall =   0.595 /   0.595 ms [wo x cur 22]
 - 843: [    4096,       1,       1]              ADD   (  1) cpu =   0.000 /   0.000 ms, wall =   0.008 /   0.008 ms [noname]
 - 844: [    4096,       1,       1]         RMS_NORM   (  1) cpu =   0.000 /   0.000 ms, wall =   0.015 /   0.015 ms [noname]
 - 845: [    4096,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.004 /   0.004 ms [noname]
 - 846: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.584 /   1.584 ms [w1 x cur 22] (Hot)
 - 847: [   11008,       1,       1]             SILU   (  1) cpu =   0.000 /   0.000 ms, wall =   0.219 /   0.219 ms [noname]
 - 848: [   11008,       1,       1]          MUL_MAT   (  1) cpu =   2.000 /   2.000 ms, wall =   1.643 /   1.643 ms [w3 x cur 22] (Hot)
 - 849: [   11008,       1,       1]              MUL   (  1) cpu =   0.000 /   0.000 ms, wall =   0.009 /   0.009 ms [noname]
 - 850: [    4096,       1,       1]          MUL_MAT   (  1) cpu =   3.000 /   3.000 ms, wall =   3.137 /   3.137 ms [w2 x cur 22] (Hot)

Without names it's unreadable, the image version looks ok but in text form it's just too many operations to keep track.

The text was updated successfully, but these errors were encountered:

ggerganov · 2023-04-21T18:46:48Z

Yes, good idea. I think it's better to just extend ggml_tensor instead of another level of indirection.
But anyway, this extension will happen soon as I also see various use cases

cmp-nct · 2023-04-21T20:42:11Z

Is the padding still relevant ? Can I just extend it without causing problems ?

Maybe a couple structs (non pointers) in that case, separating the tensorstruct into a few logical groups like performance, debug, scheduler would make it a bit more accessible while still keeping all together in one parent.

I am a bit confused on padding in general because in 32 bit pointer size is 4 bytes, so I am not sure if the static 8 byte padding is right for all hardware.

Currently I got that one in use. The name is set by an optional function call:

typedef struct
{
    struct ggml_tensor *tensor; // points back to it's tensor
    char *name;                 // a optional name

    uint8_t flag_high_performance; // if this tensor is high performance (relevant for hybrid core systems like Intel 12/13 gen)
    uint8_t flag_use_blas_cuda;    // defines if CUDA is enabled for this tensor (destination tensor)
} tensor_meta;

ggerganov · 2023-04-22T10:58:16Z

Is the padding still relevant ? Can I just extend it without causing problems ?

Padding is just to make sure these asserts hold:

https://github.com/ggerganov/llama.cpp/blob/5f939498d517b4dddbe904f202e895a3ecfb9dc4/ggml.c#L3403-L3404

Maybe a couple structs (non pointers) in that case

I think we want extra members without wrapping them in extra structures. Just streamline it as much as possible.
Also, no char * pointers. Names should have a predetermined max length (e.g. 16 or 32) so that memory size per tensor is fixed

sw · 2023-04-28T13:24:16Z

The static_assert(sizeof(... seems a bit dubious, it should probably be static_assert(_Alignof(... (which currently fails). If there was a clean and portable way to specify struct alignment, we could apply that to these two structs and remove the explicit padding bytes.

ggerganov · 2023-07-28T19:56:55Z

Decided to add void * extra; after all

ggerganov added enhancement New feature or request high priority Very important issue labels Apr 21, 2023

slaren mentioned this issue Apr 22, 2023

Performance issues with cuBLAS and a bug #1129

Closed

ggerganov closed this as completed Jul 28, 2023

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

struct ggml_tensor -> add a struct meta pointer into it (can replace padding) #1093

struct ggml_tensor -> add a struct meta pointer into it (can replace padding) #1093

cmp-nct commented Apr 20, 2023 •

edited

Loading

ggerganov commented Apr 21, 2023

cmp-nct commented Apr 21, 2023 •

edited

Loading

ggerganov commented Apr 22, 2023

sw commented Apr 28, 2023

ggerganov commented Jul 28, 2023

struct ggml_tensor -> add a struct meta pointer into it (can replace padding) #1093

struct ggml_tensor -> add a struct meta pointer into it (can replace padding) #1093

Comments

cmp-nct commented Apr 20, 2023 • edited Loading

ggerganov commented Apr 21, 2023

cmp-nct commented Apr 21, 2023 • edited Loading

ggerganov commented Apr 22, 2023

sw commented Apr 28, 2023

ggerganov commented Jul 28, 2023

cmp-nct commented Apr 20, 2023 •

edited

Loading

cmp-nct commented Apr 21, 2023 •

edited

Loading