
ggml : quantization refactoring #3833

Merged: 3 commits into master on Oct 29, 2023

Conversation

ggerganov
Owner

Moving all CPU quantization code into separate source files.
This is a refactoring change - there should be no functional difference.

  • Rename k_quants.h/.c -> ggml-quants.h/.c
  • Move quantization code from ggml.c into ggml-quants.c
  • Remove GGML_USE_K_QUANTS ifdefs (i.e. always build with K-quants support)

In the future, ggml-quants.h should probably be moved into a ggml-impl.h private API header as discussed in ggerganov/ggml#549

@ggerganov changed the title from "Ggml quants" to "ggml : quantization refactoring" on Oct 28, 2023
@cebtenzzre
Collaborator

Building without GGML_USE_K_QUANTS is currently the only way to quantize a pure Q4_0 model. This has been helpful while developing the Nomic Vulkan backend: we run the whole model on the GPU but did not initially have a Q6_K matmul shader.

Maybe we should add a flag to quantize to disable the k-quants logic?

@ggerganov
Owner Author

Maybe we should add a flag to quantize to disable the k-quants logic?

Yes. Do you want to add it to this PR?

@cebtenzzre
Collaborator

Yes. Do you want to add it to this PR?

Done.

@ggerganov ggerganov merged commit d69d777 into master Oct 29, 2023
33 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Oct 30, 2023
* ggml : factor all quantization code in ggml-quants

ggml-ci

* ggml-quants : fix Zig and Swift builds + quantize tool

ggml-ci

* quantize : --pure option for disabling k-quant mixtures

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023 (same commit message as above).