
Conversation

@CISC (Collaborator) commented Sep 10, 2025

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Sep 10, 2025
@CISC CISC mentioned this pull request Sep 10, 2025
@CISC CISC requested a review from slaren September 10, 2025 18:37
The review thread below is attached to the changed lines in quantize_row_iq3_xxs_impl:

}
float best = 0;
float scale = max/(2*kMaxQ-1);
for (int k = 0; k < 8; ++k) is_on_grid[k] = false;
Member commented:

If the goal is only to silence a compiler warning, then I would just zero-initialize the variable where it is declared. If you suspect that this is actually a bug that is leading to wrong results, then I think it needs more explanation.

@CISC (Collaborator, Author) commented:

It's fixing an actual bug: depending on whether the condition below is ever satisfied, the code will potentially use uninitialized data (or data left over from a previous loop iteration) if is_on_grid is not reset here:

if (sumq2 > 0 && sumqx*sumqx > best*sumq2) {
    scale = sumqx/sumq2; best = scale*sumqx;
    for (int i = 0; i < 32; ++i) L[i] = Laux[i];
    for (int k = 0; k < 8; ++k) is_on_grid[k] = is_on_grid_aux[k];
}

See the other quants for reference:

is_on_grid[0] = is_on_grid[1] = true;

for (int k = 0; k < bs4; ++k) is_on_grid[k] = false;

is_on_grid[0] = is_on_grid[1] = true;
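
For illustration, a simplified, hypothetical C sketch of the pattern (names and structure are assumptions, not the actual ggml code): if the weights of a block are all zero, the best-scale branch never runs and the flags are read uninitialized afterwards.

#include <stdbool.h>

/* Hypothetical sketch of the pattern under discussion; not the actual
 * quantize_row_iq3_xxs_impl code. is_on_grid is only written inside the
 * best-scale branch, but read unconditionally afterwards. */
static int count_off_grid(const float xb[32]) {
    bool is_on_grid[8];                  /* indeterminate without the fix */
    bool is_on_grid_aux[8];
    float best = 0.0f;

    /* The fix: reset the flags before the candidate search, e.g.
     * for (int k = 0; k < 8; ++k) is_on_grid[k] = true; */

    for (int step = 0; step < 2; ++step) {          /* stand-in for the scale search */
        float sumqx = 0.0f, sumq2 = 0.0f;
        for (int k = 0; k < 8; ++k) is_on_grid_aux[k] = true;
        for (int i = 0; i < 32; ++i) { sumqx += xb[i]; sumq2 += xb[i]*xb[i]; }

        /* If the (imatrix-scaled) weights are all zero, sumq2 == 0 and this
         * branch never runs, so is_on_grid is never written. */
        if (sumq2 > 0 && sumqx*sumqx > best*sumq2) {
            best = sumqx*sumqx/sumq2;
            for (int k = 0; k < 8; ++k) is_on_grid[k] = is_on_grid_aux[k];
        }
    }

    int n_not_on_grid = 0;
    for (int k = 0; k < 8; ++k)          /* unconditional read of the flags */
        if (!is_on_grid[k]) ++n_not_on_grid;
    return n_not_on_grid;
}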

@slaren (Member) commented Sep 11, 2025:

To me, it is not obvious that this will change the results, or if it does, that it needs to be initialized to false instead of true (or other values).

@CISC (Collaborator, Author) commented:

It is indeed not obvious; it should perhaps be set to true, or maybe even done as in quantize_row_iq3_s_impl, where it is set to false and the check is commented out:

//if (is_on_grid[k]) continue;
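
For illustration, a hedged sketch (assumed structure and placeholder logic, not the actual ggml code) of the downstream use: initializing to true means "trust the current values and skip the re-snap", while false (or commenting out the continue, as in quantize_row_iq3_s_impl) forces every group to be re-projected onto the grid.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not the actual quantize_row_iq3_xxs_impl code: the
 * initial value of is_on_grid decides whether the re-snap step runs when the
 * best-scale branch never wrote the flags. */
static void resnap_groups(uint8_t L[32], const bool is_on_grid[8]) {
    for (int k = 0; k < 8; ++k) {
        if (is_on_grid[k]) continue;      /* true  -> keep the current L values  */
        /* false -> re-project this group of 4 onto the grid (details omitted) */
        for (int i = 0; i < 4; ++i) L[4*k + i] = 0;   /* placeholder for the snap */
    }
}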

@CISC (Collaborator, Author) commented Sep 11, 2025:

Either way it is potentially using uninitialized data right now. The safest choice AFAICT is initializing it to ~~false~~ true.

Member commented:

Again, what is it fixing? Is the change meaningful, or is it just adding more noise?

@CISC (Collaborator, Author) commented:

It's fixing uninitialized data, nothing more or less, exactly like in all the other quants.

@CISC (Collaborator, Author) commented:

...and yes, this will only happen when the weights are zero, but this has happened enough times that we have added several checks against it:

float scale = suml2 ? sumlx/suml2 : 0.0f;

return suml2 > 0.0f ? sumlx / suml2 : 0.0f;

return suml2 > 0.0f ? sumlx / suml2 : 0.0f;
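
For context, a minimal example (my own, not from the ggml source) of why these guards exist: with an all-zero block the unguarded division is 0/0, which produces NaN and would poison the scale.

#include <stdio.h>

int main(void) {
    float sumlx = 0.0f, suml2 = 0.0f;                      /* all-zero block */
    float unguarded = sumlx / suml2;                       /* 0/0 -> NaN     */
    float guarded   = suml2 > 0.0f ? sumlx / suml2 : 0.0f; /* guarded scale  */
    printf("unguarded = %f, guarded = %f\n", unguarded, guarded);
    return 0;
}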

@slaren (Member) commented Sep 11, 2025:

It seems that if all weights are zero, scale will also be zero, and the branch that uses is_on_grid will be ignored.

@CISC (Collaborator, Author) commented Sep 11, 2025:

> It seems that if all weights are zero, scale will also be zero, and the branch that uses is_on_grid will be ignored.

Not necessarily; that only holds if the whole block is zero, which may not be the case. Also, scale is based on the original weights, while the weights in question are the ones after the imatrix is applied (i.e., the imatrix may cause non-zero parts to go to zero).
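
As a rough illustration (the exact weighting formula here is an assumption, not a quote of the ggml source): the effective quantization weight combines the model value with the imatrix entry, so a zero imatrix entry can zero an otherwise non-zero position.

#include <math.h>

/* Assumed form of the effective quantization weight, for illustration only:
 * a zero imatrix entry zeroes the weight even when the model value x is non-zero. */
static float effective_weight(float x, float imatrix_entry, float sigma2) {
    return imatrix_entry * sqrtf(sigma2 + x*x);
}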

@compilade (Collaborator) left a comment:

L with all zeros is on grid for IQ3_XXS, so initializing is_on_grid to true makes sense.

(EDIT: but L doesn't seem to be initialized either...)

@CISC (Collaborator, Author) commented Sep 13, 2025:

> (EDIT: but L doesn't seem to be initialized either...)

If the original weights are all zero (or close), L will get cleared; however, that does not happen if the imatrixed weights are zero. :(

@taronaeo (Collaborator) commented:

Hi! Any update on this? It's one of the items preventing #15925 from passing CI tests due to LLAMA_FATAL_WARNINGS=ON

@CISC (Collaborator, Author) commented Sep 22, 2025:

> Hi! Any update on this? It's one of the items preventing #15925 from passing CI tests due to LLAMA_FATAL_WARNINGS=ON

I will double-check L for the other quants first, later today.

@CISC (Collaborator, Author) commented Sep 23, 2025:

Since imatrix weights are unlikely to be only partially zero, the whole block will be on grid in the event of weights being zeroed by the imatrix, and then L will not be used.

@CISC CISC merged commit f6b4af3 into master Sep 23, 2025
47 of 48 checks passed
@CISC CISC deleted the cisc/iq3-xxs-uninitialized-is-on-grid branch September 23, 2025 08:25
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 23, 2025
* origin/master: (39 commits)
ci : disable AMD workflows + update NVIDIA workflows (ggml-org#16200)
ci : enable Vulkan workflow on Mac (ggml-org#16194)
ggml-cpu: Respect cpumask settings (ggml-org#16164)
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)
zdnn: refactor codebase + add docs (ggml-org#16178)
codeowners : add @danbev to model-conversion example [no ci] (ggml-org#16190)
devops: add s390x containers (ggml-org#15915)
ggml-cpu : fix typo in gemm comments [no ci] (ggml-org#16189)
feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) (ggml-org#16177)
clang-tidy : disable warning about performance enum size (ggml-org#16127)
ggml : implement set_rows with i32 index (ggml-org#16159)
codeowners : update + cleanup (ggml-org#16174)
common : enable `--offline` mode without curl support (ggml-org#16137)
webui : fix handling incomplete chunks (ggml-org#16107)
embedding : fix typos in README (ggml-org#16171)
common : remove unused local variables (ggml-org#16140)
ggml : extend ggml_can_fuse to work with non-sequential nodes (ggml-org#16123)
ggml : add ggml_op_is_empty (ggml-org#16122)
codeowners : update ownership for @ngxson and @allozuar (ggml-org#16128)
Vulkan: add conv_transpose_2d operation (ggml-org#16022)
...
struct pushed a commit to struct/llama.cpp that referenced this pull request Sep 26, 2025
ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (ggml-org#15928)

* fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl

* change initialization to true