quantize: add imatrix and dataset metadata in GGUF #6658
Conversation
quantize: factorize KV Overrides parsing between common #6656
…pile on some toolchain
We might also add the number of chunks the imatrix was computed with.

@ggerganov, is this general approach relevant?
common: free kv override if used after model loading
…ntize/imatrix-metadata
@slaren, can you please take a second look and merge it if approved?
slaren
left a comment
I also realized that llama_model_quantize_params::kv_overrides is a pointer to a std::vector for no reason whatsoever. It would be great if that could be fixed as well.
…ed from a pair of iterators. Co-authored-by: slaren <slarengh@gmail.com>
slaren
left a comment
We still need to change llama_model_quantize_params::kv_overrides to be a pointer to llama_model_kv_override rather than a std::vector, but it can be done in another PR.
While I appreciate adding this metadata, I think there is a privacy concern here: how about only storing the filename and not the complete path (which might leak sensitive data such as the username)?

Good point. Meanwhile, you can use KV overrides.
Context
In the context of:
quantize: factorize KV Overrides parsing between common #6656

Add imatrix related metadata in quantum models.
Changes
Tests
```
./gguf-py/scripts/gguf-dump.py models/phi-2-q4_k_m.gguf
23: UINT32 |  1 | general.quantization_version = 2
24: STRING |  1 | my_metadata = 'best-quantum-model-ever'
25: STRING |  1 | quantize.imatrix.file = 'imatrix-f16.imatrix'
26: STRING |  1 | quantize.imatrix.dataset = 'wikitext-2-raw/wiki.train.raw'
27: INT32  |  1 | quantize.imatrix.entries_count = 192
28: INT32  |  1 | quantize.imatrix.chunks_count = 20
```

```
./build/bin/main --model .models/phi-2-q4_k_m.gguf \
    -ngl 33 \
    --random-prompt \
    --override-kv my_metadata_2=str:best-quantum-model-ever-2

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
...
llama_model_loader: - kv  19: general.quantization_version   u32 = 2
llama_model_loader: - kv  20: my_metadata                    str = best-quantum-model-ever
llama_model_loader: - kv  21: quantize.imatrix.file          str = imatrix-f16.imatrix
llama_model_loader: - kv  22: quantize.imatrix.dataset       str = wikitext-2-raw/wiki.train.raw
llama_model_loader: - kv  23: quantize.imatrix.entries_count i32 = 192
llama_model_loader: - kv  24: quantize.imatrix.chunks_count  i32 = 20
```

Closes #6656