
ggml : quantization refactoring #3833

Merged: 3 commits into master on Oct 29, 2023

Conversation

ggerganov
Owner

Moving all CPU quantization code into separate source files.
This is a refactoring change - there should be no functional difference.

  • Rename k_quants.h/.c -> ggml-quants.h/.c
  • Move quantization code from ggml.c into ggml-quants.c
  • Remove GGML_USE_K_QUANTS ifdefs (i.e. always build with K-quants support)

In the future, ggml-quants.h should probably be moved into a ggml-impl.h private API header as discussed in ggerganov/ggml#549

@ggerganov changed the title from "Ggml quants" to "ggml : quantization refactoring" on Oct 28, 2023
@cebtenzzre
Collaborator

Building without GGML_USE_K_QUANTS is currently the only way to quantize a pure Q4_0 model. This has been helpful while developing the Nomic Vulkan backend: we run the whole model on the GPU but did not initially have a Q6_K matmul shader.

Maybe we should add a flag to quantize to disable the k-quants logic?

@ggerganov
Owner Author

Maybe we should add a flag to quantize to disable the k-quants logic?

Yes. Do you want to add it to this PR?

@cebtenzzre
Collaborator

Yes. Do you want to add it to this PR?

Done.

@ggerganov ggerganov merged commit d69d777 into master Oct 29, 2023
33 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Oct 30, 2023
* ggml : factor all quantization code in ggml-quants

ggml-ci

* ggml-quants : fix Zig and Swift builds + quantize tool

ggml-ci

* quantize : --pure option for disabling k-quant mixtures

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023 (same commit message as above).