# ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl #15928
## Conversation
`ggml/src/ggml-quants.c` (outdated):

```c
}
float best = 0;
float scale = max/(2*kMaxQ-1);
for (int k = 0; k < 8; ++k) is_on_grid[k] = false;
```
---
If the goal is only to silence a compiler warning, then I would just zero-initialize the variable where it is declared. If you suspect that this is actually a bug that is leading to wrong results, then I think it needs more explanation.
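A minimal sketch of that alternative, assuming the kernel's `bool is_on_grid[8];` declaration (the placement here is hypothetical):

```c
#include <stdbool.h>

// Zero-initialization at the declaration, as suggested above; in C,
// `= { false }` sets the first element explicitly and value-initializes
// the remaining elements to false.
bool is_on_grid[8] = { false };
```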
---
It's fixing an actual bug. Depending on the condition below, the code will potentially use uninitialized data (or stale values left over from a previous loop iteration) if this initialization is not done:
`llama.cpp/ggml/src/ggml-quants.c`, lines 3750 to 3754 in `f50c60d`:

```c
if (sumq2 > 0 && sumqx*sumqx > best*sumq2) {
    scale = sumqx/sumq2; best = scale*sumqx;
    for (int i = 0; i < 32; ++i) L[i] = Laux[i];
    for (int k = 0; k < 8; ++k) is_on_grid[k] = is_on_grid_aux[k];
}
```
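To make the failure mode concrete, here is a minimal, hypothetical repro sketch (not the real kernel; the zero values model the all-zero-weights case):

```c
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    bool is_on_grid[8];  // uninitialized, as in the kernel before the fix

    // With all-zero weights, sumq2 stays 0 and the guarded branch that
    // would write is_on_grid never executes.
    float best = 0, sumqx = 0, sumq2 = 0;
    if (sumq2 > 0 && sumqx*sumqx > best*sumq2) {
        for (int k = 0; k < 8; ++k) is_on_grid[k] = true;  // never reached
    }

    // The kernel later branches on is_on_grid[k]; here that reads
    // indeterminate values, which is undefined behavior in C.
    for (int k = 0; k < 8; ++k) {
        if (!is_on_grid[k]) printf("group %d would be re-projected\n", k);
    }
    return 0;
}
```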
See the other quants for reference:
`llama.cpp/ggml/src/ggml-quants.c`, line 3283 in `f50c60d`:

```c
is_on_grid[0] = is_on_grid[1] = true;
```

`llama.cpp/ggml/src/ggml-quants.c`, line 3931 in `f50c60d`:

```c
for (int k = 0; k < bs4; ++k) is_on_grid[k] = false;
```

`llama.cpp/ggml/src/ggml-quants.c`, line 4883 in `f50c60d`:

```c
is_on_grid[0] = is_on_grid[1] = true;
```
---
To me, it is not obvious that this will change the results, or, if it does, that it needs to be initialized to `false` instead of `true` (or some other value).
---
It is indeed not obvious; it should perhaps be set to `true`, or maybe even do as `quantize_row_iq3_s_impl` does, where it is set to `false`:

`llama.cpp/ggml/src/ggml-quants.c`, line 3968 in `f50c60d`:

```c
//if (is_on_grid[k]) continue;
```
---
Either way, it is potentially using uninitialized data right now. The safest choice AFAICT is initializing it to ~~`false`~~ `true`.
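For reference, the change that was eventually merged (per the squashed commit message quoted at the end of this thread, "change initialization to true") amounts to this initialization, shown here as a sketch:

```c
// Initialize the flags before the scale search, mirroring the other
// quant kernels referenced above.
for (int k = 0; k < 8; ++k) is_on_grid[k] = true;
```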
---
Again, what is it fixing? Is the change meaningful, or is it just adding more noise?
---
It's fixing uninitialized data, nothing more or less, exactly like in all the other quants.
---
...and yes, this will only happen when the weights are zero, but that has happened enough times that we have added several checks against it:
`llama.cpp/ggml/src/ggml-quants.c`, line 493 in `f50c60d`:

```c
float scale = suml2 ? sumlx/suml2 : 0.0f;
```

`llama.cpp/ggml/src/ggml-quants.c`, line 569 in `f50c60d`:

```c
return suml2 > 0.0f ? sumlx / suml2 : 0.0f;
```

`llama.cpp/ggml/src/ggml-quants.c`, line 969 in `f50c60d`:

```c
return suml2 > 0.0f ? sumlx / suml2 : 0.0f;
```
---
It seems that if all weights are zero, `scale` will also be zero, and the branch that uses `is_on_grid` will be ignored.
---
> It seems that if all weights are zero, `scale` will also be zero, and the branch that uses `is_on_grid` will be ignored.

Not necessarily; that holds only if the whole block is zero, which may not be the case. Also, `scale` is based on the original weights, while the weights in question are the ones after the imatrix is applied (i.e., the imatrix may cause non-zero parts to go to zero).
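A hedged sketch of the weighting pattern being described (the helper name, signature, and exact formula are illustrative, following the general shape used by the quant kernels in `ggml-quants.c`, not the literal code):

```c
#include <math.h>

// Hypothetical helper: the imatrix row `qw` multiplies into the
// per-value error weights. If qw is all zero for a group, weight[]
// (and hence sumq2) is zero even when the source values x[], and
// therefore scale, are not; the guarded branch above then never
// writes is_on_grid.
static void make_weights(const float *qw, const float *x, float sigma2,
                         float *weight, int n) {
    for (int i = 0; i < n; ++i) {
        weight[i] = qw ? qw[i] * sqrtf(sigma2 + x[i] * x[i])
                       : x[i] * x[i];
    }
}
```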
---
`L` with all zeros is on grid for `IQ3_XXS`, so initializing `is_on_grid` to `true` makes sense.

(EDIT: but `L` doesn't seem to be initialized either...)
---

> If the original weights are all zero (or close) …

---

Hi! Any update on this? It's one of the items preventing #15925 from passing CI tests due to …

---

I will double check.

---

Since imatrix weights are unlikely to be just partially zero, the whole block will be on grid in the event of the weights being zeroed by the imatrix, and then …
---

* origin/master: (39 commits)
  * ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200)
  * ci : enable Vulkan workflow on Mac (ggml-org#16194)
  * ggml-cpu: Respect cpumask settings (ggml-org#16164)
  * ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)
  * zdnn: refactor codebase + add docs (ggml-org#16178)
  * codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190)
  * devops: add s390x containers (ggml-org#15915)
  * ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189)
  * feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177)
  * clang-tidy : disable warning about performance enum size (ggml-org#16127)
  * ggml : implement set_rows with i32 index (ggml-org#16159)
  * codeowners : update + cleanup (ggml-org#16174)
  * common : enable `--offline` mode without curl support (ggml-org#16137)
  * webui : fix handling incomplete chunks (ggml-org#16107)
  * embedding : fix typos in README (ggml-org#16171)
  * common : remove unused local variables (ggml-org#16140)
  * ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123)
  * ggml : add ggml_op_is_empty (ggml-org#16122)
  * codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128)
  * Vulkan: add conv_transpose_2d operation (ggml-org#16022)
  * ...
---

ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)

* fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl
* change initialization to true
See https://github.com/ggml-org/llama.cpp/actions/runs/17615331967/job/50046823487#step:5:87